Report Alleges that Zuckerberg Approved Theft of Copyrighted Work to Train Meta’s AI


Written by Kolawole Samuel Adebayo

Jan 13, 2025

2-minute read

eWeek content and product recommendations are editorially independent. We may make money when you click on links to our partners.

Meta, the parent company of Facebook, Instagram, and WhatsApp, is facing a legal challenge from a group of writers who claim it used pirated copies of their work to train its AI models. Author Ta-Nehisi Coates and comedian Sarah Silverman are among the writers who filed the complaint, which alleges that Meta knowingly trained its AI language model, LLaMA, on the LibGen dataset, a repository reportedly based in Russia and widely criticized for hosting pirated content.

Internal Ethics Debate Preceded Decision

According to internal Meta messages cited in the filing, members of the company's AI leadership team raised concerns about using LibGen and warned that adding it to the model's training data could damage Meta's standing with regulators. The messages suggest that although the team was eager to use the data, it wrestled with the ethical and practical consequences of accessing the LibGen dataset from Meta computers.

One message specifically singled out "torrenting," the peer-to-peer file-sharing method through which LibGen's content is distributed, as a source of unease. However, a memo in the documents, which allegedly refers to Mark Zuckerberg by his initials, noted that Meta's AI team "has been cleared to employ LibGen."

Complaint Still Faces Legal Challenges

Although the original complaint was filed in 2023, the case is back in the news after U.S. District Judge Vince Chhabria allowed the authors to file an amended complaint last week, reviving their copyright infringement claims and adding a new computer fraud allegation. Chhabria had previously dismissed some of the authors' claims, but the new evidence may be enough to turn the case around.

"Meta's CEO, Mark Zuckerberg, approved Meta's use of the LibGen dataset notwithstanding concerns within Meta's AI executive team (and others at Meta) that LibGen is 'a dataset we know to be pirated,'" the plaintiffs' lawyers wrote. Meta did not respond to requests for comment.

The use of copyrighted material to train AI models has generated controversy in the tech and creative sectors, with creators arguing that unauthorized use of their work threatens their income and intellectual property. Last year, a federal court in New York ordered LibGen's anonymous operators to pay $30 million in damages for copyright infringement. The case is part of a larger, ongoing debate about the role of ethics in AI.

Read our guide to the ethical challenges facing generative AI tools like ChatGPT to learn more about the issues at stake.


Kolawole Samuel Adebayo