Artificial Intelligence on Pirated Books: Here's How MetaAI Trained Itself

The Greatest Art Theft in Human History, or How Meta Trained Its AI on Pirated Books. Legal Options Were Called “Unreasonably Expensive” and an “Extremely Slow” Process

photo: Dado Ruvic / / FORUM

Artificial intelligence has been talked about for years, although only now do we have the opportunity to interact with it more and more often. Training it, however, requires huge amounts of high-quality text, and meeting that need has turned out to be a big problem.

Books are one of the most easily accessible sources of such text. Unlike websites filled with content optimized for search engines (SEO), they offer high-quality writing. This matters because AI trained on AI-generated output degrades in quality, and a lot of content on the web has already been generated by AI. Books from before 2020 therefore come with a guarantee that they contain no AI-generated content.

As revealed by The Atlantic, Mark Zuckerberg, CEO of Meta, approved the use of LibGen (short for “Library Genesis”), an illegal file repository, as a source of content for training Meta's AI model. LibGen was created around 2008 in Russia and contains over 7.5 million book files and 81 million research paper files. It is one of the largest online pirate libraries in the world, and one that many students (and, as it turns out, not only they) find useful.

According to The Atlantic, Meta employees talked to a number of companies about licensing books for use in AI development, but the results were disappointing. “It seems unreasonably expensive to me,” one employee wrote in an internal company chat. A senior manager on the Llama 3 team added that it would also be an “extremely slow” process.

After the case came to light, authors' groups around the world signaled that they want to sue Meta. French publishers and authors announced such a plan in March 2025. Vincent Montagne, president of the National Association of Publishers, accused Meta of “non-respect of copyright and parasitism” during a press conference. In Poland, the Literary Union (an association defending authors' rights) is also encouraging authors to check The Atlantic's database.

“We are dealing with the greatest theft of works in the history of mankind. Theft that governments allow. Even the European Union, despite the efforts of organizations representing creators, such as the European Writers Council, is responding less forcefully than expected. The power of so-called big tech is overwhelming for now. And yet piracy is theft that should be punished,” says Grażyna Plebanek, a writer involved in the activities of the Literary Union, quoted by “Gazeta Wyborcza”.

Prepared by JM
