OpenAI Responds to New York Times Copyright Infringement Lawsuit

Ryan J. Farrick — January 10, 2024

The New York Times building in Manhattan. Image via Flickr/user:Ajay_Suresh. (CCA-BY-2.0). (source:https://www.flickr.com/photos/ajay_suresh/48193462432).

OpenAI recently shared a long blog post, in which it suggested that The New York Times had “intentionally manipulated” ChatGPT prompts to make it seem as if the A.I. program could replicate and regenerate copyright-protected content.

OpenAI has responded to a copyright infringement lawsuit filed against it by the New York Times, saying that the complaint is “without merit” and overlooks the opportunities artificial intelligence provides news organizations and media outlets.

As LegalReader.com has reported before, the New York Times recently filed claims against OpenAI—the company behind ChatGPT—and Microsoft, both of which the paper accuses of infringing on its copyrights by using “millions” of articles to train language models.

In its complaint, the Times alleged that—at least in some cases—media organizations are being forced to compete with chatbots as sources of reliable information.

Programs like ChatGPT are language models, which depend on vast amounts of “training” material. In creating ChatGPT, OpenAI used information sourced from across the internet, including copyright-protected works. By analyzing and adapting these works, ChatGPT can retrieve content and emulate distinctive writing styles.

In its complaint, attorneys for the New York Times state that ChatGPT can “generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style.”

But, on Monday, OpenAI began pushing back against these allegations. In a 1,000-word update posted to its website, OpenAI said that the New York Times is “not telling the full story,” claiming that the Times “intentionally manipulated” prompts to make it appear as if ChatGPT can access and replicate copyright-protected content.

“Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts,” the post claims.

“Memorization is a rare failure of the learning process that we are continually making progress on, but it’s more common when particular content appears more than once in training data, like if pieces of it appear on lots of different public websites,” OpenAI wrote. “So we have measures in place to limit inadvertent memorization and prevent regurgitation in model outputs. We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use.”

OpenAI also indicated that, before it was served with a lawsuit, it had been engaged in constructive communications with the Times regarding copyright infringement and protection.

However, and perhaps somewhat tellingly, OpenAI’s statement suggests that the company’s relationship with the New York Times may have been sales-oriented—and, by extension, not necessarily centered on the paper’s copyright concerns.

“Our discussions with The New York Times had appeared to be progressing constructively through our last communication on December 19,” OpenAI said. “The negotiations focused on a high-value partnership around real-time display with attribution in ChatGPT, in which The New York Times would gain a new way to connect with their existing and new readers, and our users would gain access to their reporting.”

OpenAI also stressed, in its communications, that New York Times “content [does not] meaningfully contribute to the training of our existing models and also wouldn’t be sufficiently impactful for future training.”

“Their lawsuit on December 27—which we learned about by reading The New York Times—came as a surprise and disappointment to us,” OpenAI said.

“We regard The New York Times’ lawsuit to be without merit,” it added. “Still, we are hopeful for a constructive partnership with The New York Times and respect its long history, which includes reporting the first working neural network over 60 years ago and championing First Amendment freedoms.”