Meta has won a pivotal legal battle over training data for artificial intelligence models. A federal judge ruled that Meta’s use of copyrighted books to train its LLaMA models qualifies as fair use, to the dismay of the authors who sued the company.
However, US District Judge Vince Chhabria clarified that the ruling "does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful." It applies only to the 13 authors in this case, who lost because they did not provide "meaningful evidence" that Meta's actions caused "market harm."
This judgment comes just a few days after Anthropic scored a similar win in court, where a federal judge ruled that the company’s use of copyrighted books to train its AI models was protected under fair use.
Meta trained its AI using books from piracy site LibGen
The controversy began when unsealed court documents revealed that the company had used LibGen datasets to train its AI language models, including LLaMA 3. Established by Russian activists, LibGen is a notorious online repository offering free access to millions of books and academic papers, many of which are copyrighted.
Internal communications indicated that Meta employees were aware of the pirated nature of these materials. Although one employee expressed ethical concerns, CEO Mark Zuckerberg reportedly approved their use.
In response to these revelations, a group of American authors that included Sarah Silverman, Ta-Nehisi Coates, and Richard Kadrey filed a lawsuit against Meta alleging copyright infringement. The plaintiffs argued that Meta's use of pirated books, including 666 of their own, to train its AI models violated their intellectual property rights and undermined their livelihoods. The lawsuit sought damages and an injunction to prevent Meta from further using unauthorised materials.
Court sides with Meta on fair use grounds
According to the summary judgment decision filed on Wednesday, the court found that Meta's use of the authors' books was "highly transformative," meaning the books were used as training data for AI rather than to be read or distributed. Copying the full works was also "reasonable," given that the purpose was to train the AI to be as useful as possible.
Therefore, while it is “generally illegal to copy protected works without permission,” Chhabria said that the fair use doctrine protected Meta’s actions.
“We appreciate today’s decision from the Court,” a Meta spokesperson told TechRepublic in an email. “Open-source AI models are powering transformative innovations, productivity and creativity for individuals and companies, and fair use of copyright material is a vital legal framework for building this transformative technology.”
Attorneys representing the plaintiffs did not respond to CNBC’s request for comment.
Judge said Meta’s logic was ‘nonsense’ and that future creatives may win with stronger evidence
While Meta ultimately came out on top, the judge did pick some holes in the company's defence. He dismissed as "nonsense" Meta's claim that the "public interest" would be "badly disserved" if the company were prevented from using copyrighted text to train its models for free.
A ruling against Meta wouldn't prevent the company from training its models on copyrighted works altogether; it would simply require Meta to obtain permission, potentially by negotiating licensing fees directly with authors.
“It may be that LLM companies move somewhat more slowly or make somewhat less money,” Chhabria said. “But the suggestion that the growth of LLM technology would come to a halt (or anything close) doesn’t pass the straight face test.”
He also said that Meta’s win “may be in significant tension with reality,” acknowledging that future plaintiffs could succeed if they present stronger evidence.
“No matter how transformative LLM training may be, it’s hard to imagine that it can be fair use to use copyrighted books to develop a tool to make billions or trillions of dollars while enabling the creation of a potentially endless stream of competing works that could significantly harm the market for those books,” Chhabria added.
A claim against Meta is still pending
One issue still pending is whether Meta unlawfully distributed copyrighted works by reuploading them during the torrenting process. A case management hearing to address this claim is scheduled for July 11. Anthropic also faces potential liability for allegedly downloading millions of books from pirate sites to create a “central library.”