The law is largely (entirely?) reactive, and particularly so when it comes to new technology. We are watching this play out in the artificial intelligence (AI) context, specifically as it relates to copyright law. There are several copycat cases over OpenAI's use of copyrighted material in the training of its large language models (LLMs), which operate the ChatGPT software. Copyright owners are suing because, according to them, their works were copied in the training process and are reproduced (either wholesale or in part) in response to ChatGPT user prompts, and they haven't been paid for either. OpenAI has moved to dismiss these cases for, among other things, failure to state a claim, and has publicly taken the position that its use is "fair use"--an affirmative defense to copyright infringement. The outcomes of these chatbot battles could have serious business and legal consequences. Here's an update on what's going on in the ever-evolving chatbot copyright litigation.
Overview
Generally, these cases are focused on the ChatGPT inputs and outputs.
Infringement pt. 1: The Inputs. The plaintiffs generally allege that OpenAI committed copyright infringement by copying plaintiffs' copyrighted work in training the models. The models are created by "copying massive amounts of text from various sources and feeding these copies into the model." This is called the "training dataset." From there, the models are trained on the training dataset to extract expressive information. Over time, through this training, the model learns how to respond to prompts. The copying that occurs in the training is one instance of the alleged infringement.
Infringement pt. 2: The Outputs. The plaintiffs generally allege that ChatGPT commits copyright infringement when it responds to user prompts. In some instances, ChatGPT either generates a direct copy of the work (e.g. a verbatim reproduction of the copyrighted work) or a derivative work (e.g. a summary with copyrighted portions of the work)--both of which could be considered infringement. In the New York Times case, discussed below, the complaint contains illustrations of ChatGPT responding to a user prompt by generating near-verbatim copies of NYT articles.
With this understanding as background, we turn to the lawsuits.
Tremblay/Silverman v. OpenAI
California saw the first lawsuits alleging copyright infringement against OpenAI in cases styled Tremblay et al v. OpenAI, Inc. et al, 23-cv-3223 and Silverman et al v. OpenAI, Inc. et al, 23-cv-3416. Plaintiffs are authors who accuse OpenAI of infringing their copyrighted works to train the LLMs that operate ChatGPT. Plaintiffs filed these cases as putative class actions, seeking to represent a class of all people in the U.S. who own copyrights in works that OpenAI used to train its models. The complaints allege six causes of action: 1) direct copyright infringement; 2) vicarious copyright infringement; 3) violation of the Digital Millennium Copyright Act (DMCA); 4) unfair competition; 5) negligence; and 6) unjust enrichment. OpenAI moved to dismiss Counts 2 through 6 (note that OpenAI did not seek dismissal of the claim for direct copyright infringement, which focuses on the alleged copying involved in the training of the LLMs). Two weeks ago, the district court in California largely sided with OpenAI in granting its motion to dismiss. Here's how the Court arrived at its decision.
1) Vicarious Copyright Infringement
Under the Copyright Act, the copyright holder has the exclusive right to (1) “reproduce the copyrighted work in copies,” (2) “prepare derivative works,” and (3) “distribute copies . . . of the copyrighted work to the public.” 17 U.S.C. § 106(1)-(3). Copyright protection does not, however, extend to any idea, concept, or process underlying the work. 17 U.S.C. § 102(b). To prevail on a claim for direct copyright infringement, a plaintiff must show that 1) they own a copyright and that 2) the defendant copied the work. The second prong requires two showings: copying and unlawful appropriation. Copying can be shown either through direct evidence of copying or by showing the defendant had access to the copyrighted work and the two works have similarities that are probative of copying. See Skidmore v. Led Zeppelin, 952 F.3d 1051, 1064 (9th Cir. 2020) (en banc). Unlawful appropriation requires the works to share substantial similarities. Id. For vicarious liability, in addition to showing direct copyright infringement, a plaintiff must prove that “the defendant has (1) the right and ability to supervise the infringing conduct and (2) a direct financial interest in the infringing activity.” Perfect 10, Inc. v. Giganews, Inc., 847 F.3d 657, 673 (9th Cir. 2017) (citation omitted).
Defendants argued that Plaintiffs failed to allege that direct infringement occurred with respect to the ChatGPT outputs (as opposed to the inputs at play in Count I), and the Court agreed. Plaintiffs alleged that "every output of the OpenAI Language Models is an infringing derivative work" and that "every output from the OpenAI Language Models constitutes an act of vicarious copyright infringement." The Court found these conclusory allegations were insufficient: "Plaintiffs here have not alleged that the ChatGPT outputs contain direct copies of the copyrighted books. Because they fail to allege direct copying, they must show a substantial similarity between the outputs and the copyrighted materials. See Skidmore, 952 F.3d at 1064; Corbello, 974 F.3d at 973-74. Plaintiffs’ allegation that 'every output of the OpenAI Language Models is an infringing derivative work' is insufficient. Tremblay Compl. ¶ 59; Silverman Compl. ¶ 60. Plaintiffs fail to explain what the outputs entail or allege that any particular output is substantially similar – or similar at all – to their books." Op. at 5. The Court dismissed the claim for vicarious copyright infringement with leave to amend.
2) Section 1202(b) of the DMCA
In addition to prohibiting infringement, copyright law prohibits the removal or alteration of what is known as "copyright management information." CMI is "information such as the title, the author, the copyright owner, the terms and conditions for use of the work, and other identifying information set forth in a copyright notice or conveyed in connection with the work.” Stevens v. Corelogic, Inc., 899 F.3d 666, 671 (9th Cir. 2018). The Digital Millennium Copyright Act (DMCA) states that no person shall (without the approval of the copyright owner) (1) “intentionally remove or alter any” CMI, (2) “distribute . . . [CMI] knowing that the [CMI] has been removed or altered,” or (3) “distribute . . . copies of works . . . knowing that [CMI] has been removed or altered.” 17 U.S.C. § 1202(b). Section 1202(b) requires knowledge or “reasonable grounds to know” that the CMI removal would “induce, enable, facilitate, or conceal an infringement.” Id.
Defendants argued that Plaintiffs failed to allege that "OpenAI intentionally removed CMI during the training process or intended to conceal or induce infringement." Op. at 6. The Court agreed. Plaintiffs' allegation that "by design," Defendants removed CMI during the training process failed to allege sufficient facts supporting their DMCA claim. The Court noted that, to the contrary, the complaints contained examples of ChatGPT outputs that referenced Plaintiffs’ names, leading to the inference that the CMI was not entirely deleted. The Court noted that even if Plaintiffs had pled additional facts showing Defendants' intentional removal of CMI during the training process, "Plaintiffs have not shown how omitting CMI in the copies used in the training set gave Defendants reasonable grounds to know that ChatGPT’s output would induce, enable, facilitate, or conceal infringement." Op. at 7. The Court dismissed the DMCA claim.
3) Remaining Common Law Counts
The Court allowed a portion of Plaintiffs' unfair competition law claim to go forward, noting that "Assuming the truth of Plaintiffs’ allegations - that Defendants used Plaintiffs’ copyrighted works to train their language models for commercial profit - the Court concludes that Defendants’ conduct may constitute an unfair practice." Op. at 10. The Court dismissed Plaintiffs' negligence claim on the basis that Plaintiffs failed to allege that Defendants owed them a legal duty. Id. at 11. The Court also dismissed Plaintiffs' unjust enrichment claim on the basis that Plaintiffs failed to allege that Defendants obtained a benefit through "fraud, mistake, coercion, or request." Id. at 12.
Plaintiffs have until March 13, 2024, to file their amended complaint. The California cases have been consolidated into one case moving forward.
Authors Guild/NYT v. OpenAI & Microsoft
On the other coast, OpenAI and Microsoft are defending similar lawsuits, including ones filed by the Authors Guild and The New York Times.
The New York Times complaint lets you know exactly why OpenAI is such a concern: in some instances, ChatGPT will actually spit out near-verbatim copies of The Times' works. Look at paragraphs 99, 100, 104, and 106 of the Complaint. According to the complaint, this allows ChatGPT users to circumvent the NYT paywall. While charging its own user subscription fees, OpenAI reaps the benefits of using the NYT content without paying NYT. The Times is also concerned over "hallucinations"--AI's misattribution of content to The Times.
The NYT complaint alleges seven claims: Copyright Infringement, Vicarious Copyright Infringement, Contributory Copyright Infringement (two counts), Violation of the DMCA, Unfair Competition by Misappropriation, and Trademark Dilution. Publicly, OpenAI responded to The Times' lawsuit by stating that it tried to negotiate a deal with The Times, that it seeks to avoid a "rare bug" where the model simply "regurgitates" material, and that OpenAI's use would be considered "fair use" under the law. According to The Times' lawyer, OpenAI's response conceded that OpenAI copied The Times' works without payment or permission. Two days ago, OpenAI moved to dismiss on statute of limitations, failure to state a claim, and preemption grounds.
CA Plaintiffs Move to Intervene in NY
The same day that the California District Court issued its order on OpenAI's motion to dismiss, the California Plaintiffs moved to intervene in and dismiss, or alternatively stay or transfer, the New York cases against OpenAI and Microsoft. As asserted in the motion, Tremblay was the first-filed lawsuit alleging that OpenAI committed copyright infringement. After that, there were a number of copycat cases filed, including the ones filed by Authors Guild and the NYT. On that basis, and for other reasons, the motion asks the New York District Court to allow intervention and to dismiss or alternatively stay or transfer the New York actions to California under the first-filed doctrine.
Microsoft has conditionally opposed the motion to intervene unless all Plaintiffs (NY and CA) agree to dismiss it with prejudice (good luck!). OpenAI filed a "position statement" letting the Court know that it took no position on the motion.