
      AI Controversy: O’Reilly Accuses OpenAI of Training AI on Paywalled Content

      Written by

      J.R. Johnivan
      Published April 7, 2025
        OpenAI CEO Sam Altman.
        Image: James Tamim/Creative Commons

        eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners.

        Meta recently faced accusations of training its AI models on pirated content; now OpenAI finds itself entangled in a similar controversy. A new study claims that one of OpenAI’s latest large language models (LLMs) was trained on non-public, copyright-protected material from O’Reilly Media. Specifically, the study’s authors suggest that OpenAI’s development teams may have trained one of their most advanced models on restricted content without authorization.

        The study’s authors wrote, in part: “Although the evidence present here on model access violations is specific to OpenAI and O’Reilly Media books, this is likely a systematic issue.”

        Examining the accusations

        The study was written by a team that includes O’Reilly Media CEO Tim O’Reilly. It explicitly claims that OpenAI, one of today’s top AI companies, trained one of its most recent AI models on content that is available only behind a paywall through O’Reilly Media’s official channels.

        The authors of the study, titled “Beyond Public Access in LLM Pre-Training Data,” started with 34 copyrighted O’Reilly Media books covering both publicly available and paywalled content. Next, they applied the DE-COP membership inference attack, a technique for testing whether an AI model has seen, and likely memorized, a specific text during training, to several of OpenAI’s models.
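        The core idea behind a DE-COP-style test can be pictured as a multiple-choice quiz: the model is shown a verbatim passage alongside paraphrases of it and asked which one is the original. A model that consistently picks the verbatim text has plausibly seen it in training. The sketch below is a simplified illustration under stated assumptions, not the study’s or DE-COP’s actual code: `ask_model` is a hypothetical stand-in for a call to the LLM under test, and rotating the verbatim passage through every answer slot is one simple way to offset a model’s bias toward a particular option.

```python
from typing import Callable, Sequence

# Hypothetical interface: the model sees a list of options and returns the index it picks.
ChooseFn = Callable[[Sequence[str]], int]


def decop_style_score(verbatim: str, paraphrases: Sequence[str], ask_model: ChooseFn) -> float:
    """Fraction of quizzes in which the model picks the verbatim passage.

    The verbatim passage is placed in each answer position exactly once,
    so a model that always favors one slot scores no better than chance.
    """
    if not paraphrases:
        raise ValueError("need at least one paraphrase")
    n_options = len(paraphrases) + 1
    correct = 0
    for pos in range(n_options):
        options = list(paraphrases)
        options.insert(pos, verbatim)  # verbatim text now sits at index `pos`
        if ask_model(options) == pos:
            correct += 1
    return correct / n_options


# Toy stand-ins for an LLM (assumptions for illustration only).
original = "verbatim excerpt from a book"
variants = ["paraphrase one", "paraphrase two", "paraphrase three"]
print(decop_style_score(original, variants, lambda opts: 0))                    # 0.25
print(decop_style_score(original, variants, lambda opts: opts.index(original)))  # 1.0
```

        With three paraphrases, a model that always answers with the first option scores 0.25, the same as random guessing, while one that reliably recognizes the original scores 1.0.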

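        The team also computed an Area Under the Receiver Operating Characteristic (AUROC) score for each LLM. Rather than a direct probability, the score reflects how reliably the test separates passages a model may have seen during training from passages it could not have seen: 50% is no better than chance, and higher scores suggest the model recognizes the content, in this case material from one or more of the 34 O’Reilly Media books.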

        • GPT-4o: Demonstrates stronger recognition of non-public content from O’Reilly Media (AUROC score: 82%) than public content (AUROC score: 64%).
        • GPT-3.5 Turbo: Demonstrates slightly stronger recognition of public content from O’Reilly Media (AUROC score: 64%) than non-public content (AUROC score: 54%).
        • GPT-4o Mini: No indication the model was trained on public or non-public content from O’Reilly Media.
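        For readers curious how an AUROC figure like 82% arises, the sketch below uses the rank-based formulation: the AUROC equals the chance that a randomly chosen passage from the “possibly seen” group receives a higher detection score than a randomly chosen passage from the “unseen” group, with ties counting as half. The function name `auroc` and the toy scores are illustrative assumptions, not the study’s code or data.

```python
from itertools import product


def auroc(seen_scores, unseen_scores):
    """Probability that a random 'seen' passage outscores a random 'unseen' one.

    Ties count as half a win. This pairwise (Mann-Whitney) form gives the
    same value as the area under the ROC curve.
    """
    if not seen_scores or not unseen_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for s, u in product(seen_scores, unseen_scores):
        if s > u:
            wins += 1.0
        elif s == u:
            wins += 0.5
    return wins / (len(seen_scores) * len(unseen_scores))


# Toy numbers for illustration only; these are not figures from the study.
seen = [0.90, 0.80, 0.70]
unseen = [0.60, 0.75, 0.30]
print(round(auroc(seen, unseen), 3))  # 0.889
```

        Here 8 of the 9 seen/unseen pairs are ranked correctly, giving roughly 89%; a detector that ranked pairs at random would hover around 50%.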

        Reading the fine print

        Although the study found no indication that GPT-4o Mini was trained on O’Reilly Media content, the authors note that this result could reflect the model’s smaller scale, which may limit how much text it can memorize compared with GPT-4o and other generative AI tools. The study also treats its AUROC scores with some caution, noting that they should be taken as estimates.

        The study concludes by suggesting that current AI training methods may soon lead to an “extractive dead end.” The authors argue that if AI developers fail to compensate copyright owners and content creators, the quality, accuracy, and diversity of available content will ultimately decline.

        J.R. Johnivan
        J.R. Johnivan is a writer with 17 years of experience covering innovation and technology, including IT, computer networking, security, cloud computing, staffing, human resources, real estate, sports, entertainment, and more.
