Reddit Sues Perplexity for Data Scraping | eWeek

Reddit Sues Perplexity for Scraping Its Data


Written By
David Curry
Oct 24, 2025

Social media platform Reddit filed a copyright lawsuit against artificial intelligence startup Perplexity, accusing it of illegally scraping posts and comments from millions of Reddit users.

In the lawsuit, Reddit claims that Perplexity relies on content from its forums to power its generative AI model. Most major chatbots have scraped Reddit in some form, given the platform’s vast library of niche discussions and conversational data across thousands of communities.

It is not the first lawsuit Reddit has filed against an AI company. In June, it filed a similar case against Anthropic; that case is still ongoing.

“AI companies are locked in an arms race for quality human content, and that pressure has fuelled an industrial-scale data laundering economy,” said Ben Lee, Reddit’s chief legal officer. “Reddit is a prime target because it’s one of the largest and most dynamic collections of human conversation ever created.”

Reddit also sued three companies that provide scraping services to clients, including AI developers. Those named are Lithuania-based Oxylabs, Russia-based AWMProxy, and Texas-based SerpApi, which lists Perplexity as a customer on its website.

Reddit’s stance appears justified, as several AI firms have already signed licensing agreements with the platform. These include Google and OpenAI, which reached deals in February and May 2024, respectively, to access Reddit data for their AI models.

However, as Perplexity noted in a statement posted on Reddit, it is unclear whether those companies will continue paying for data licenses now that Reddit content from 2005 to 2024 has already been used to train their models.

“Why sue Perplexity? Our guess: it’s about a show of force in Reddit’s training data negotiations with Google and OpenAI,” the company said. “Here’s where we push back. Reddit told the press we ignored them when they asked about licensing. Untrue. Whenever anyone asks us about content licensing, we explain that Perplexity, as an application-layer company, does not train AI models on content. A year ago, after explaining this, Reddit insisted we pay anyway, despite lawfully accessing Reddit data. Bowing to strong-arm tactics just isn’t how we do business.”

It wouldn’t be the first time Perplexity has been accused of underhanded tactics to scrape data. In August, web security firm Cloudflare accused the startup of using stealthy, undeclared scraping tools to get around websites’ no-crawl policies.

With the lawsuits against Perplexity and Anthropic, Reddit could help set a precedent for how AI companies source data from the web to train their models. In the case of Perplexity, which operates more as a search engine than as a chatbot like ChatGPT or Gemini, the question may be whether such services can provide links, snippets, and content overviews from sites like Reddit without the content owner’s prior consent.

Until the rise of generative AI, Reddit didn’t appear to fully recognize the value of its user-generated content. In recent years, however, it has worked to make data licensing a key part of its business alongside advertising. To this end, in August it blocked the Internet Archive, creator of the Wayback Machine, from archiving its content.

The dispute remains an interesting test case for content ownership and distribution, particularly because, unlike Hollywood, the RIAA, or news publishers that have sued AI companies, Reddit gets almost all of its content from its users.

Research on AI and content shows that as of November 2024, 50.3% of new web articles were generated primarily by AI. Before the launch of ChatGPT, that figure was just 5%.

David Curry

David is a tech journalist and analyst with over a decade’s experience writing for established outlets. He has covered the full spectrum of the tech landscape—mobiles, apps, AI, and everything in-between—delivering news, features, and data-led stories.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved