OpenAI’s Shocking Blunder: Key Evidence Vanishes in NY Times Lawsuit


Written By

Dec 3, 2024


OpenAI is under fire after engineers accidentally erased key data in a high-stakes lawsuit brought by The New York Times and Daily News. The lawsuit accuses OpenAI of using copyrighted articles to train its AI models without permission, potentially violating copyright law. According to court documents, the data deletion has forced the plaintiffs to redo weeks of work at a significant cost.

The erased data came from one of two virtual machines OpenAI had provided to the plaintiffs to search its AI training datasets for their copyrighted content. The virtual machines, software-based computers running on OpenAI’s infrastructure, allowed the plaintiffs’ legal team to comb through the training data. However, on November 14, OpenAI engineers erased the contents of one machine. The data that was recovered was incomplete and unusable for tracing how the plaintiffs’ content was incorporated into OpenAI’s models.

Dispute Over Responsibility for Data Loss

OpenAI attributed the issue to a “system misconfiguration” resulting from a change the plaintiffs had requested, claiming the change inadvertently removed file names and folder structures on a temporary cache drive. OpenAI’s attorneys denied that any files were permanently lost and maintained that the deletion was unintentional.

However, lawyers for the publishers argue the incident underscores OpenAI’s superior ability to search its own datasets. “The news plaintiffs have been forced to recreate their work from scratch using significant person-hours and computer processing time,” they wrote, adding that the loss delayed their case and increased costs.

Broader Implications for Copyright and AI

The lawsuit highlights the growing tension between content creators and AI developers. The New York Times and Daily News claim OpenAI’s use of their articles goes beyond “fair use,” arguing it provides an unfair advantage in developing commercial AI models. The plaintiffs are seeking billions in damages for OpenAI’s alleged unauthorized use of their works.

OpenAI, which has struck licensing deals with other major publishers such as Axel Springer and Dotdash Meredith, has not disclosed whether its AI models were specifically trained on the plaintiffs’ content. OpenAI maintains that training models on publicly available data, including news articles, falls under fair use.

What’s Next?

As the case proceeds, the data loss could prove a significant hurdle for the plaintiffs. While OpenAI works to file its response, the broader legal battle over how AI companies use copyrighted content remains unresolved. With potential damages in the billions, the outcome could set a critical precedent for the future of AI development and intellectual property rights.

Sunny Yadav
