A federal judge has ruled that Anthropic’s use of authors’ books to train its AI models falls under fair-use protection; however, the company still faces potential liability for allegedly sourcing those titles from pirate sites.
Monday’s split ruling by U.S. District Judge William Alsup is among the first to test how copyright law applies to large-scale AI training on protected works. Although Alsup agreed that training itself was lawful, he set a separate trial to determine how much Anthropic might owe for acquiring the books through unauthorized downloads.
Training AI on books is fair use, judge says
Authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson sued Anthropic, alleging the company fed pirated copies of their works into its AI models without permission.
Judge Alsup ultimately ruled that training AI models on copyrighted text is “exceedingly transformative” and therefore qualifies as fair use under Section 107 of the Copyright Act.
In his decision, Alsup compared Anthropic’s AI training to how a human writer might read others’ work to develop their own ideas. “Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different,” Alsup wrote.
He also noted that the authors never showed that Claude, Anthropic’s flagship model, reproduced “any infringing copy of their works” for users, weakening the case for copyright infringement arising from the training process itself.
Pirated books remain in dispute
While the judge cleared the training-related complaint, the question of how Anthropic acquired the data is a different matter and will be decided in a future trial. Court filings say Anthropic harvested roughly seven million books from pirate sites, storing them in what it called a “central library.”
Alsup stated: “The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use.”
He continued, “That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages.”
While Anthropic argued that the source of the books should not affect fair use, Alsup rejected this reasoning. “This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites… was itself reasonably necessary to any subsequent fair use,” he wrote.
As a result, the court has scheduled a trial to assess damages related to the unauthorized downloads and the creation of the pirated library. Depending on the outcome, Anthropic could face statutory damages of up to $150,000 per work.
AI companies score win, but uncertainty remains
Anthropic responded to the ruling with measured optimism. CNBC quoted a spokesperson saying it was “pleased” the court recognized its training practices as transformative. The spokesperson added that the decision was “consistent with copyright’s purpose in enabling creativity and fostering scientific progress.”
This case is one of many lawsuits filed by authors, artists, and media outlets against AI firms including OpenAI. Courts are being asked to decide whether mass scraping of content — especially from copyrighted or pirated sources — violates legal boundaries.
While Judge Alsup’s decision offers early legal clarity that AI training on copyrighted works can be protected under fair use, it stops short of providing a blanket shield. Companies that rely on unauthorized datasets or pirated material may still be held accountable.
Read eWeek’s coverage of Reddit’s lawsuit against Anthropic, as the platform pushes back against AI firms scraping user-generated content.