
OpenAI has accused The New York Times of overreaching in its ongoing copyright lawsuit, warning that the newspaper’s latest legal demand seeks access to 20 million private ChatGPT user conversations, potentially compromising the privacy of millions of individuals.
OpenAI is asking the court to reject the request, citing user trust, data sensitivity, and lack of relevance to the core claims in the case.
The dispute is part of a broader legal battle initiated by The New York Times and other media outlets over alleged copyright infringement by OpenAI’s large language models. Filed under multi-district litigation MDL No. 25-md-3143 in the Southern District of New York, the lawsuit argues that OpenAI’s models were trained on copyrighted news content and may enable users to bypass paywalls or reproduce protected material. To support this claim, the Times has asked for extensive internal records, including user-generated conversations that may demonstrate such misuse.
The current flashpoint is a discovery request from the Times that targets a randomized sample of 20 million ChatGPT conversations collected between December 2022 and November 2024. The newspaper claims these chats may contain attempts by users to reproduce or access paywalled New York Times content, but OpenAI argues that the dataset includes highly personal user interactions unrelated to the lawsuit.
Initially, the Times sought access to 1.4 billion conversations and attempted to prevent users from deleting their chat histories; OpenAI contested both demands, and they were narrowed through negotiation. The company also offered the Times several privacy-preserving alternatives, such as limited, targeted searches and aggregate usage statistics, but those offers were reportedly rejected.
The New York Times Company, a media business valued at $10.35 billion that depends heavily on digital subscriptions and advertising, has staked much of its future on keeping premium content behind its paywalls. Its lawsuit against OpenAI is part of a broader strategy to assert control over how that content is accessed and reproduced in the age of generative AI.
Tensions escalated in June 2025 when US Magistrate Judge Ona T. Wang ordered OpenAI to indefinitely preserve all user-generated ChatGPT content, deleted or not, as part of the discovery process.
OpenAI has strongly objected to both the preservation order and the Times’ demand for sampled user chats. Chief Information Security Officer Dane Stuckey stated that turning over such a large volume of data, even de-identified, could jeopardize the privacy of people who have no connection to the litigation.
The company also emphasized the security risks of such data being handled by third-party legal teams and consultants. While any data shared would be governed by court protocols, OpenAI warns that collecting and disclosing it at all would set a dangerous precedent.
To mitigate potential exposure, OpenAI says all affected conversations are being run through a de-identification process to scrub personal and sensitive information. The company is pushing for any review of the data to occur in a tightly controlled legal environment.
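OpenAI has not published the details of that pipeline. As a rough illustration of what de-identification typically involves, the hypothetical Python sketch below redacts a few common identifier patterns; a real system would also rely on trained entity-recognition models and human spot checks, not regexes alone.

```python
import re

# Hypothetical redaction patterns for illustration only; these are not
# OpenAI's actual rules, and regexes alone would miss many identifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def deidentify(text: str) -> str:
    """Replace matched identifiers with typed placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(deidentify("Reach me at jane.doe@example.com or 212-555-0142."))
# -> "Reach me at [EMAIL] or [PHONE]."
```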
OpenAI has committed to accelerating its roadmap for privacy and security protections. Key upcoming features include client-side encryption for ChatGPT messages and fully automated misuse-detection systems that limit human review to the most serious safety concerns.
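OpenAI has not described how its client-side encryption will be built. The minimal Python sketch below (using the third-party cryptography package) only illustrates the general principle: the message is encrypted with a key that stays on the user's device, so the server holds ciphertext it cannot read.

```python
from cryptography.fernet import Fernet

# Hypothetical illustration of the general idea, not OpenAI's design:
# the key lives only on the user's device and never reaches the server.
client_key = Fernet.generate_key()   # generated and kept client-side
cipher = Fernet(client_key)

message = b"Summarize my medical test results."
ciphertext = cipher.encrypt(message)  # what a server would store

# Only the key holder can recover the plaintext.
assert cipher.decrypt(ciphertext) == message
```

The practical consequence, under this model, is that data handed over in discovery would be unreadable without keys held by individual users.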
For users wondering whether their data could be included, OpenAI clarified that only ChatGPT consumer chats from December 2022 through November 2024 are affected. ChatGPT Enterprise and Edu customers, as well as API customers using Zero Data Retention (ZDR), are not impacted.