Reddit Sues Perplexity for Data Scraping | eWeek

Reddit Sues Perplexity for Scraping Its Data


Written by
David Curry
Oct 24, 2025
3 minute read

Social media platform Reddit filed a copyright lawsuit against artificial intelligence startup Perplexity, accusing it of illegally scraping posts and comments from millions of Reddit users.

In the lawsuit, Reddit claims that Perplexity relies on content from its forums to power its generative AI model. Most major chatbots have scraped Reddit in some form, given the platform’s vast library of niche discussions and conversational data across thousands of communities.

It is not Reddit’s first lawsuit against an AI company. In June, it filed a similar case against Anthropic; that case is still ongoing.

“AI companies are locked in an arms race for quality human content, and that pressure has fuelled an industrial-scale data laundering economy,” said Ben Lee, Reddit’s chief legal officer. “Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

Reddit also sued three companies that provide scraping services to clients, including AI developers. Those named are Lithuania-based Oxylabs, Russia-based AWMProxy, and Texas-based SerpApi, which lists Perplexity as a customer on its website.

Reddit’s stance appears justified, as several AI firms have already signed copyright licensing agreements with the platform. These include Google and OpenAI, which reached deals in February and May 2024, respectively, to access Reddit data for their AI models.

However, as Perplexity noted in a statement posted on Reddit, it is unclear whether those companies will continue paying for data licenses now that Reddit content from 2005 to 2024 has already been used to train their models.

“Why sue Perplexity? Our guess: it’s about a show of force in Reddit’s training data negotiations with Google and OpenAI,” the company said. “Here’s where we push back. Reddit told the press we ignored them when they asked about licensing. Untrue. Whenever anyone asks us about content licensing, we explain that Perplexity, as an application-layer company, does not train AI models on content. A year ago, after explaining this, Reddit insisted we pay anyway, despite lawfully accessing Reddit data. Bowing to strong-arm tactics just isn’t how we do business.”

It wouldn’t be the first time Perplexity has been accused of underhanded tactics to scrape data. In August, web security firm Cloudflare accused the startup of using stealthy, undeclared scraping tools to get around websites’ no-crawl policies.

With the lawsuits against Perplexity and Anthropic, Reddit could help set a precedent for how AI companies source data from the web to train their models. Because Perplexity operates more as a search engine than a chatbot like ChatGPT or Gemini, the question in its case may be whether such services can provide links, snippets, and content overviews from sites like Reddit without the content owner’s prior consent.

Until the rise of generative AI, Reddit didn’t appear to fully recognize the value of its user-generated content. In recent years, however, it has worked to make data licensing a key part of its business alongside advertising. To this end, in August it blocked the Internet Archive, creator of the Wayback Machine, from archiving its content.

The dispute remains an interesting test case for content ownership and distribution, particularly because, unlike Hollywood studios, the RIAA, or news publishers that have sued AI companies, Reddit gets almost all of its content from its users.

Research on AI-generated content indicates that, as of November 2024, 50.3% of new web articles were generated primarily by AI. Before the launch of ChatGPT, that figure was just 5%.

David Curry

David Curry is a tech journalist and analyst with over a decade of experience writing for established outlets. He holds a BA from the University of Lincoln and a master’s degree in International Journalism from the University of Leeds, and has covered the technology sector since the early 2010s. His work focuses on B2B technology, data journalism, mobile apps and app markets, artificial intelligence, digital platforms, and emerging technologies.
