OpenAI inks deal with Axel Springer on licensing news for model training

December 13, 2023 ndowd

Many, if not most, generative AI vendors argue that fair use entitles them to train AI models on copyrighted material scraped from the internet, even if they don’t get permission from the rightsholders. But some vendors, such as OpenAI, are hedging their bets, perhaps wary of the outcome of relevant pending lawsuits.

OpenAI today announced that it’s reached an agreement with Axel Springer, the Berlin-based owner of publications including Business Insider and Politico, to train its generative AI models on the publisher’s content and add recent Axel Springer-published articles to OpenAI’s viral AI-powered chatbot ChatGPT.

It’s OpenAI’s second such arrangement with a news organization, after the startup said that it would license some of The Associated Press’ archives for model training.

Going forward, ChatGPT users will get summaries of “selected” articles from Axel Springer’s publications, including stories normally gated behind a paywall. The snippets will be accompanied by both attribution and links to the full articles.

In return, Axel Springer will receive payments of an unspecified size and frequency from OpenAI. The deal is valid for several years and, while it doesn’t commit either side to exclusivity, Axel Springer says that it will support the publisher’s existing AI-driven ventures “that build upon OpenAI’s technology.”

“We’re excited to have shaped this global partnership between Axel Springer and OpenAI — the first of its kind,” Axel Springer CEO Mathias Döpfner said in a canned statement. “We want to explore the opportunities of AI-empowered journalism — to bring quality, societal relevance and the business model of journalism to the next level.”

Apart from those publishers tapping generative AI for questionable content strategies, publishers and generative AI vendors have a testy relationship, with the former alleging copyright infringement and growing increasingly concerned about generative models cannibalizing their traffic. For example, Google’s new generative AI-powered search experience, called SGE, has pushed links that appear in traditional search further down search results pages, potentially reducing traffic to those links by as much as 40%.

Publishers also object to vendors training their models on content without compensation agreements in place — particularly in light of reports that tech giants including Google are experimenting with AI tools to summarize news. According to one recent survey, hundreds of news organizations are now using code to prevent OpenAI, Google and others from scanning their websites for training data.
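The survey cited above does not spell out what that code looks like. One common approach, assumed here for illustration, is a robots.txt file that disallows the crawler tokens OpenAI and Google have published for AI training (GPTBot and Google-Extended). The following is a minimal Python sketch of how one might check which crawlers such a file turns away; the sample robots.txt, the example.com URL and the blocked_agents helper are all hypothetical, not taken from any publisher’s site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve. GPTBot and Google-Extended
# are the crawler tokens published by OpenAI and Google; the rules themselves
# are an illustrative assumption.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/news/story"):
    """Return the crawlers in `agents` that the rules forbid from fetching `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print(blocked_agents(ROBOTS_TXT, ["GPTBot", "Google-Extended", "Googlebot"]))
```

Note that robots.txt is a voluntary convention: it only works if the crawler chooses to honor it, which is part of why publishers are also pushing for licensing deals and regulation.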

In August, several media organizations including Getty Images, The Associated Press, the National Press Photographers Association and The Authors Guild published an open letter calling for more transparency and copyright protection in AI. In the letter, the signatories urged policymakers to consider regulations that require transparency into training data sets and allow media companies to negotiate with AI model operators, among other suggestions.

“[Current] practices undermine the media industry’s core business models, which are predicated on readership and viewership (such as subscriptions), licensing, and advertising,” the letter reads. “In addition to violating copyright law, the resulting impact is to meaningfully reduce media diversity and undermine the financial viability of companies to invest in media coverage, further reducing the public’s access to high-quality and trustworthy information.”
