OpenAI may soon have to explain why it removed two disputed collections of books, and that explanation could be crucial in a class action by authors. The authors claim ChatGPT was trained on their works without permission, and the deleted files could affect the court’s view of the case.
Both sides agree that OpenAI deleted the datasets, called “Books 1” and “Books 2,” before ChatGPT launched in 2022. The collections were built in 2021 by former employees from material scraped from the public web, with large portions taken from a shadow library known as Library Genesis, or LibGen. OpenAI says the datasets were not used after 2021 and were deleted for that reason.
The authors, however, say OpenAI first attributed the deletion to “non-use,” then backtracked and tried to treat any reason for the deletion as confidential under attorney-client privilege. They see that as a sign OpenAI may be hiding why the files were removed, especially after a court allowed them to seek internal messages about the claimed “non-use.”
Last week, US magistrate judge Ona Wang ordered OpenAI to hand over all communications with in-house lawyers about deleting the datasets. She also ordered disclosure of “all internal references to LibGen that OpenAI has redacted or withheld on the basis of attorney-client privilege.”
Judge Wang pointed out that OpenAI argued both that “non-use” was not a reason for deletion and that it was a reason protected by privilege. She rejected that approach and said OpenAI cannot first state a “reason” and then claim the same “reason” is privileged to block discovery.
“OpenAI has gone back-and-forth on whether ‘non-use’ as a ‘reason’ for the deletion of Books1 and Books2 is privileged at all,” Wang wrote. “OpenAI cannot state a ‘reason’ (which implies it is not privileged) and then later assert that the ‘reason’ is privileged to avoid discovery.”
She also said OpenAI’s claim that every reason for deletion is privileged “strains credulity.” The judge ordered OpenAI to produce a wide range of internal messages by December 8 and to make its in-house lawyers available for deposition by December 19.
OpenAI has denied flipping its position. The company says unclear wording caused confusion over which deletion reasons were meant to be privileged. Judge Wang did not accept that defense, finding that OpenAI had effectively waived privilege by changing its statements over time.
The court’s orders could force OpenAI to reveal internal discussions about the datasets and about LibGen. If the authors can show the files were used or that key facts were hidden, the deleted datasets may weigh heavily in the outcome of the lawsuit.