Courts must not let copyright law strangle AI

The Trump administration has unveiled an ambitious blueprint for U.S. dominance of artificial intelligence, outlining actions to accelerate innovation, build infrastructure and lead internationally. The plan envisions AI ushering in “an industrial revolution, an information revolution, and a renaissance — all at once,” positioning AI development as a national security imperative for maintaining America’s technological leadership.

Yet none of these carefully crafted policies can succeed if restrictive copyright interpretations strangle AI development in its infancy. The administration’s vision of American AI setting “global standards” and powering a “new golden age of human flourishing” depends entirely on companies being able to train AI systems on vast datasets.

If courts embrace interpretations that prevent AI models from being trained on existing texts, America’s comprehensive AI leadership strategy will be impossible to execute. This would effectively cede technological dominance to nations with fewer legal constraints.

To properly apply copyright law to this technology, courts must first distinguish between AI training and content generation — two fundamentally different processes with distinct legal implications.

AI training involves analyzing vast text datasets to identify statistical patterns in language, converting words into numerical representations from which a statistical model is built. This process does not store the original texts any more than human readers memorize every word they’ve read.
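To make that concrete, here is a deliberately simplified sketch in Python, assuming nothing about any particular company’s system: it “trains” on a couple of sentences by counting which words tend to follow which, and what it ends up holding is a table of probabilities rather than a stored copy of the text. The tiny corpus and variable names are hypothetical, and real large language models learn far richer patterns through neural networks, but the basic point carries over: what is retained are statistics, not prose.

```python
from collections import Counter

# Toy illustration only: "training" here means tallying which words tend to
# follow which, i.e. extracting statistical patterns rather than storing text.
corpus = "the cat sat on the mat the cat slept on the rug".split()

# Count every adjacent word pair (bigram) in the corpus.
bigram_counts = Counter(zip(corpus, corpus[1:]))

# Turn raw counts into conditional probabilities: P(next word | current word).
context_totals = Counter(corpus[:-1])
model = {
    (prev, nxt): count / context_totals[prev]
    for (prev, nxt), count in bigram_counts.items()
}

# The resulting "model" is a table of numbers; the original sentences cannot
# be read back out of it.
print(model[("the", "cat")])  # 0.5 -- a statistic about the text, not a quotation
```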

Content generation occurs when users interact with trained models through prompts. This user-directed process is entirely separate from training. And because these models are overwhelmingly used for content and tasks that in no way run afoul of copyright protections, AI companies are clearly shielded from contributory copyright infringement lawsuits under the “substantial non-infringing use” doctrine.

The framework for determining what constitutes fair use for AI training is more complex, necessitating a much deeper analysis of the four statutory factors, none of which is singularly determinative.

Courts must first look at the purpose and character of the use, with more transformative uses being more likely to receive fair use protection. Clearly, AI training is inherently and radically transformative, as it converts expressive text into mathematical models that serve an entirely different purpose from the reading of those works for their creative content. Given how transformative that use is, the commercial nature of the resulting systems likely carries little weight under Supreme Court precedent.

The second factor to consider is the nature of the copyrighted work, with creative works receiving the most protection. While AI training datasets do indeed pull from creative works, they do so primarily to extract unprotectable elements such as facts, ideas and functional aspects of expression from those works.

The third factor examines the amount of the copyrighted work that is used. Although AI training absorbs complete works, this is necessary for legitimate technological purposes, as AI systems need complete texts to effectively understand context and linguistic relationships. Courts have recognized this level of use as legitimate in cases that were inarguably less transformative, such as the Google Books litigation, where scanning entire books to build a searchable index was held to be fair use. Precedent on this factor again strongly supports a finding of fair use.

The final factor is the effect the use has on the potential market. Specifically, the Supreme Court has emphasized that market harm must stem from the use serving as a substitute for the original work. Problematic market harm occurs “when a commercial use amounts to mere duplication of the entirety of an original” that “supersedes the objects” of the original work.

To that end, recall that AI training doesn’t create “copies” in any meaningful sense: it transforms text into mathematical weights and parameters that are incomprehensible to humans and cannot be “read” to access the original expression. Since nobody reads statistical models instead of purchasing books or newspapers, training an AI model creates no true market substitution.

Perhaps most importantly, empirical evidence showing market harm from AI deployment is severely lacking. For example, despite AI systems being widely available for years now, book sales are rising. If AI truly functioned as a market substitute, we would surely see measurable economic damage to the creative industry that would be most easily displaced by a large language model.

The history of technological innovation is filled with examples of copyright holders initially opposing new technologies only to later benefit enormously from them. We saw movie studios fight VCRs tooth and nail and the music industry rage against MP3s. In both cases, the technology in question ultimately created vastly larger markets for those industries that had been fighting the future rather than seeking to adapt to it.

AI is poised to follow this same pattern. Rather than threatening creative industries, it will likely create new markets and revenue streams that we can barely imagine today.

Nicholas Creel is an associate professor of business law at Georgia College and State University. The views expressed here do not necessarily reflect those of his employer or any other organization.