Major Publishers Sue Meta Over Llama AI Training Data

Five major publishers - Macmillan, McGraw-Hill, Elsevier, Hachette, and Cengage - have filed a class action lawsuit against Meta, alleging the company used millions of copyrighted books to train its Llama AI models without permission or payment. One author has joined the suit. The plaintiffs describe the alleged conduct as "one of the most massive infringements of copyrighted materials in history."

Meta's Llama models are among the most widely used open-source AI systems in the industry. Training an AI model involves feeding it enormous amounts of text so it can learn language patterns and generate coherent responses. The publishers claim that their books were included in that training data without authorization.

This case stands out from other AI copyright suits because of who is filing. These aren't individual authors acting on principle. Macmillan, McGraw-Hill, Elsevier, Hachette, and Cengage collectively represent a major portion of global book publishing, and they have the legal resources to pursue this through a full trial.

Books also present a harder target for fair use arguments than scraped web text. Each book is a distinct, individually authored work with a named copyright holder - making it far easier to demonstrate which specific works were taken and who owns them. The New York Times filed similar claims against OpenAI, maker of ChatGPT, though that case remains unresolved. If Meta's defense relies on fair use doctrine - the legal provision that allows limited use of copyrighted material under certain conditions - it will need to convince a court that building a commercial AI model on millions of copyrighted books qualifies. No court has definitively ruled on that question yet.

A ruling against Meta would reach well beyond this lawsuit. Every major AI lab would need to rethink where its training data comes from.