HarperCollins Book Catalog To Train Microsoft AI Models For The Next 3 Years | eWeek

Written By
Kara Sherrer
Dec 7, 2024

Microsoft signed a three-year deal with HarperCollins to train an as-yet-unnamed AI model on the major publisher’s catalog. According to Bloomberg, the deal offers $5,000 per nonfiction book, split evenly between the author and HarperCollins. The payment is separate from other publishing agreements and is not counted against existing advances. The deal applies only to select, previously published nonfiction books, not to fiction.

404 Media broke the news but did not reveal the name of the tech company involved. Bloomberg published a follow-up article with more details, including the fact that Microsoft is developing the AI model.

Terms of the Microsoft-HarperCollins AI Deal

HarperCollins authors must opt into the AI training program and allow their nonfiction books to be used. Authors who decline the offer will not have their books included in the training dataset and will not receive the payout. Not all HarperCollins authors will be offered the deal. Microsoft is selecting the books it wants to include in the training set.

The deal reportedly includes terms meant to mitigate authors’ concerns about generative AI and how it might plagiarize content or reduce the demand for human writers. For instance, the deal reportedly limits the model’s output to “no more than 200 consecutive words and/or five percent of a book’s text.” It also includes a pledge that Microsoft will not scrape text from illegal piracy websites.

Why the Microsoft-HarperCollins AI Deal Matters

Large language models (LLMs) and other AI model types require vast datasets to train. Only a finite amount of content is available in the public domain. By purchasing access to HarperCollins’ nonfiction backlist, Microsoft is significantly increasing the pool of data it can use to train its AI model.

While various tech companies have previously struck deals with publishers to train artificial intelligence models on past content, this is the first time that the specific terms of such a deal have been made public. The HarperCollins deal gives a monetary benchmark of what Microsoft, and by extension other AI companies, is willing to spend to train its models.

A source also told Bloomberg that the AI model will not be used to generate books. The purpose of the new Microsoft AI model has not yet been announced.
