Reddit Sues Perplexity for Scraping Its Data


Written by David Curry

Oct 24, 2025



Social media platform Reddit filed a copyright lawsuit against artificial intelligence startup Perplexity, accusing it of illegally scraping posts and comments from millions of Reddit users.

In the lawsuit, Reddit claims that Perplexity relies on content from its forums to power its generative AI model. Most major chatbots have scraped Reddit in some form, given the platform’s vast library of niche discussions and conversational data across thousands of communities.

It is not the first lawsuit Reddit has filed against an AI company. In June, it launched a similar case against Anthropic, which is still ongoing.

“AI companies are locked in an arms race for quality human content, and that pressure has fuelled an industrial-scale data laundering economy,” said Ben Lee, Reddit’s chief legal officer. “Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

Reddit also sued three companies that provide scraping services to clients, including AI developers. Those named are Lithuania-based Oxylabs, Russia-based AWMProxy, and Texas-based SerpApi, which lists Perplexity as a customer on its website.

Copyright agreements seen as source of income

Reddit’s stance appears justified, as several AI firms have already signed copyright licensing agreements with the platform. These include Google and OpenAI, which reached deals in February and May 2024, respectively, to access Reddit data for their AI models.

However, as Perplexity noted in a statement posted on Reddit, it is unclear whether those companies will continue paying for data licenses now that Reddit content from 2005 to 2024 has already been used to train their models.

“Why sue Perplexity? Our guess: it’s about a show of force in Reddit’s training data negotiations with Google and OpenAI,” the company said. “Here’s where we push back. Reddit told the press we ignored them when they asked about licensing. Untrue. Whenever anyone asks us about content licensing, we explain that Perplexity, as an application-layer company, does not train AI models on content. A year ago, after explaining this, Reddit insisted we pay anyway, despite lawfully accessing Reddit data. Bowing to strong-arm tactics just isn’t how we do business.”

It wouldn’t be the first time Perplexity has been accused of underhanded tactics to scrape data. In August, web security firm Cloudflare accused the startup of using stealth, undeclared scraping tools to evade websites’ no-crawl policies.

With the lawsuits against Perplexity and Anthropic, Reddit could help set a precedent for how AI companies source data from the web to train their models. In the case of Perplexity, which operates more as a search engine than a chatbot like ChatGPT or Gemini, the question may be whether such services can provide links, snippets, and content overviews from sites like Reddit without the content owner’s prior consent.

Until the rise of generative AI, Reddit didn’t appear to fully recognize the value of its user-generated content. In recent years, however, it has worked to make data licensing a key part of its business alongside advertising. To this end, in August it blocked the Internet Archive, creator of the Wayback Machine, from archiving its content.

The dispute remains an interesting test case for content ownership and distribution, particularly because, unlike Hollywood, the RIAA, or news publishers that have sued AI companies, Reddit gets almost all of its content from its users.

Research on AI and content shows that as of November 2024, 50.3% of new web articles were generated primarily by AI. Before the launch of ChatGPT, that figure was just 5%.

David Curry

David Curry is a tech journalist and analyst with over a decade of experience writing for established outlets. He holds a BA from the University of Lincoln and a master’s degree in International Journalism from the University of Leeds, and has covered the technology sector since the early 2010s. His work focuses on B2B technology, data journalism, mobile apps and app markets, artificial intelligence, digital platforms, and emerging technologies.
