The Copyright Issues in Generative AI Training: Can the Concept of Fair Use Suffice?

Introduction

It has become second nature for most people to interact with Generative AI (GenAI) on a regular basis. With a little context and the right prompts, these models create new content almost like magic. What is less talked about, though, is how these systems are actually trained and where the ideas behind the content they generate really come from. There is evidence that these systems are trained on materials that already exist: content created by humans and protected as products of the mind. GenAI training therefore relies heavily on existing copyrighted works. This has sparked both academic debate and ongoing legal disputes about whether such use amounts to copyright infringement. While some argue that training AI on copyrighted material can be defended under the concept of fair use or fair dealing, others are of the opinion that it remains an infringement of copyright holders’ rights, especially when the outputs generated by the AI closely resemble the original works. This question has become the subject of litigation in jurisdictions like the United States. Beyond that, there is growing concern that AI mimics human creativity so closely that it has begun to blur the line between original and replicated work.

This paper considers the copyright infringement issues that currently arise from AI training. It further argues that while fair use or fair dealing provides an initial analytical framework, it is insufficient on its own to address the complex copyright implications of GenAI training.

 

Understanding the Debate around Generative AI and Its Reliance on Copyrighted Works 

In simple terms, GenAI is a type of artificial intelligence that creates new content such as text, images, music, or videos by learning from existing data.[1] Unlike traditional AI, which typically analyzes information or makes predictions, GenAI is designed to produce original outputs through deep learning on similar materials. These AI systems are trained on large amounts of data to recognize patterns and then generate content that resembles what they have learned. As a result, AI can write a new song or generate an image from scratch, and it can be difficult to determine whether the work was created by a human or generated by an AI from the data it was fed.

While GenAI models are capable of creating original content, the training of these systems is usually based on datasets made up of existing copyrighted works. Courts around the world are increasingly faced with lawsuits concerning copyright and artificial intelligence. These cases generally fall into two categories: disputes over the ownership of copyright in AI-generated works and disputes over alleged copyright infringement by GenAI systems.

Who Owns the Copyright in AI-Generated Works?

Copyright generally protects original works of authorship, and in most jurisdictions, including Nigeria, this typically requires human authorship. However, the question of authorship becomes complex when it comes to AI-generated content, as the AI itself is not a legal person. If AI is used to produce copyrightable material, who owns the copyright: the AI system, the user, the developer of the system, or no one at all?

There are two key positions on this. The most prevalent is that only works created by humans qualify for copyright protection. A federal appeals court in the U.S. denied copyright to a purely AI-generated artwork, stating that works must be “authored in the first instance by a human being”.[2] Therefore, an artwork created entirely by AI, regardless of how sophisticated the prompting was, did not qualify for copyright protection. This is the stance in most jurisdictions: only actual human labor and creativity qualify for protection under copyright law.

However, in jurisdictions like China, the stance is different. The Beijing Internet Court has ruled that AI-generated images can be protected under copyright where there is substantial human intellectual involvement, particularly in refining the prompts. The author must “demonstrate that they have exerted creative effort in their AI-generated creations, reflecting personalized expression”.[3] Even on this approach, a work created entirely by AI without any human involvement does not qualify for copyright protection. As a result, no individual or entity can claim ownership of such a work, and it automatically falls into the public domain.

Does Training on Copyrighted Work Count as Infringement?

Whether training AI on copyrighted data counts as infringement is one of the biggest legal debates surrounding GenAI today. The growing wave of lawsuits against companies that train GenAI systems centers on the alleged use of copyrighted works without proper authorization.

As GenAI becomes more and more sophisticated, courts are being asked to decide whether these systems cross the line into copyright infringement, and whether massive data scraping and style replication are innovation or simply infringement.

Under copyright law, copying a substantial part of a work without authorization is infringement, even if the copy is not distributed to the public. Training GenAI often requires making copies of vast datasets so that the model can “learn” from them in order to produce similar outputs. Since training necessarily involves making copies of works in order to feed them into the AI model, some see this as unauthorized reproduction, which violates the copyright owner’s exclusive right of reproduction. In addition, AI outputs often mimic an author’s style, which may reduce the market value of the original works. The opposing argument is that training AI on these works is transformative and does not harm copyright owners’ rights. Some early court rulings have suggested that training on legally obtained works might be considered “transformative” for the purposes of the fair use analysis under Section 107 of the U.S. Copyright Act, and can therefore be defended under the exception of fair use/fair dealing.

However, the use of pirated materials and the generation of near-identical outputs keep this a legally precarious area.

The Possibility of Fair Use or Fair Dealing as a Defence

While GenAI models are trained on vast datasets of copyrighted material, the concept of fair use, which allows limited use of copyrighted works for acceptable purposes, has been advanced as a possible defense to claims of copyright infringement in AI training.

Although they are fairly similar, there is a difference between fair use and fair dealing.

Fair use is mainly recognized in the United States. It permits the use of copyrighted material without permission in certain situations, provided the use is fair and does not harm the copyright owner’s rights.[4]

Fair dealing is narrower and more rigid, and is used mostly in Commonwealth countries like the UK and Nigeria.[5] In these jurisdictions, the specific purposes for which copyrighted material can be used without permission are clearly set out. These usually include research or private study, criticism or review, news reporting, and judicial proceedings. If the use does not fall into one of these listed categories, it cannot be defended as fair dealing, no matter how “transformative” the use may seem.

Under the U.S. Copyright Act, authors hold the exclusive right to reproduce and distribute their works, with only limited exceptions. One key and often relied-upon exception is the “fair use” doctrine, which allows certain unlicensed uses if they serve a public interest and do not unfairly compete with or replace the original in the market. Generally, courts consider factors such as the purpose of the use, the nature of the copyrighted work, the amount taken, and the effect on the market.[6]

The defense of fair use is the most relied upon in AI training infringement cases. The usual argument is that training does not constitute infringement because the process is transformative, that is, the copyrighted material is repurposed into something meaningfully new. It is also based on the notion that AI does not store or redistribute the works in their original form; rather, it extracts patterns and relationships to generate new outputs. This has been compared to how a student reads books and later writes an essay: the student does not copy the text word for word but internalizes ideas and then creates something new. In one notable case, Suno, an AI-powered music tool, acknowledged using copyrighted songs in its training process but argued that this did not amount to infringement because the AI was merely “learning.”[7]

In a way, this view is supported by the U.S. concept of fair use. In the landmark Google Books case, digitizing millions of books to enable text search was considered fair use because it was transformative and did not substitute for the books themselves in the market. However, courts are still working out how to apply fair use to GenAI, particularly since AI outputs are often highly derivative of the copyrighted training data, and several cases on this subject are currently pending.

Perhaps the most notable dispute of all is the litigation brought by the Authors Guild and The New York Times against OpenAI and Microsoft.[8] The Authors Guild, representing well-known writers like George R. R. Martin, claimed that ChatGPT was trained on copyrighted books without permission. Similarly, The New York Times accused OpenAI and Microsoft of using millions of its articles, including paywalled content, to train AI models that now compete directly with its journalism.

In the U.K., Getty Images also took action against Stability AI, the creator of Stable Diffusion, for allegedly scraping over 12 million copyrighted photos to build its image generator.[9] Some AI outputs even included distorted versions of Getty’s watermarked photos, which lends weight to the argument that the system relies excessively on copyrighted material. The outcome of these cases remains a watch-and-wait situation, as there has yet to be a full decision on the merits.

Among more recent AI training disputes, in June 2025 Judge William Alsup of the U.S. District Court for the Northern District of California ruled in Bartz v. Anthropic[10] that Anthropic’s use of copyrighted books to train its Claude AI was “fair use”, likening the training process to an aspiring writer reading many works to learn style and form rather than copying or substituting for them. However, the court was careful to note that the ruling did not absolve Anthropic of claims related to how it acquired certain books. In particular, the judge found that Anthropic might have infringed rights by copying from pirated sources or maintaining a “central library” of pirated works that were not strictly used in the model’s final training. Thus, even though the “transformative fair use” defense was accepted with respect to the training use, the court preserved a pathway for infringement claims tied to the unlawful acquisition of those training materials.

The Possibility of Fair Dealing as a Defence for AI Training under Nigerian Copyright Law

The Nigerian Copyright Act 2022 (the “Act”) expressly prohibits the unauthorized reproduction of copyrighted material. The Act defines reproduction as the making of one or more copies of a literary, musical, or artistic work, audiovisual work, or sound recording and further reiterates that “copy” includes digital copies.

In a jurisdiction like Nigeria, which follows the fair dealing system, AI training may be harder to justify because the exceptions are limited to very specific purposes. Fair dealing under the Nigerian Copyright Act 2022 explicitly allows for the following purposes:[11]

 

“▪ private use.

▪ parody, satire, pastiche, or caricature.

▪ non-commercial research and private study.

▪ criticism, review or the reporting of current events subject to the condition that, if the use is public, it shall, where practicable, be accompanied by an acknowledgment of the title of the work and its author except where the work is incidentally included in a broadcast”

The current provisions of the Act do not contemplate the use of copyrighted materials to train GenAI systems. Because AI training often involves wholesale reproduction of works and does not fit neatly into these categories of permissible fair dealing, it may be difficult to justify under Nigerian law. Although the Nigerian courts have yet to be confronted with the question of whether the defense of fair dealing is adequate for GenAI training, there is clearly a gap that courts and regulators will eventually need to address.

Can the Defense of Fair Use Alone Suffice?

At present, most legal systems, including Nigeria’s, have no express provision dealing with AI training. This leaves AI training in a gray and continually debated area, especially as it relates to copyright. And so the debate remains: where do we draw the line between innovation and the protection of intellectual property rights? It is important to find a balance that encourages innovation while protecting the foundation of copyright owners’ rights.

Many authors acknowledge that machine learning processes often require the ingestion of entire works rather than excerpts, and they have cautioned that such wholesale taking would ordinarily weigh against fair use. Against this background, it is clear that the fair use/fair dealing doctrine alone cannot fully address the tensions between GenAI developers and copyright holders. There is therefore a need to implement measures that complement existing legal frameworks. These include enhancing transparency and implementing robust data provenance systems, which can help build trust between copyright holders and AI developers by clearly showing where training data comes from and how it is used.

 

 

 

 


[1] Adam Zewe, ‘Explained: Generative AI’ (MIT Schwarzman College of Computing, 9 November 2023). Available at https://computing.mit.edu/news/explained-generative-ai/

 

[2] Financial Times, “Who owns the copyright for AI work?” https://www.ft.com/content/74b1841f-bf57-4934-a06a-3611d61e4319

 

[3] Dr Stanley Lai, SC, David Lim, Linda Shi and Justin Tay (Allen & Gledhill LLP), “Legal implications – Beijing Internet Court grants copyright protection to AI-generated artwork” https://law.nus.edu.sg/trail/legal-implications-beijing-internetcourt-copyright/

 

[4] Achyut Kulkarni, ‘Comparison Between Fair Use and Fair Dealing’ https://www.theipmatters.com/post/comparison-between-fair-use-and-fair-dealing

[5] Johnson Bryant, “The Defence Of Fair Dealing In Nigerian Copyright Law: Tradeoffs Between Owner And User” https://www.mondaq.com/nigeria/copyright/754060/the-defence-of-fair-dealing-in-nigerian-copyright-law-tradeoffs-between-owner-and-user

[6] Stanford Libraries, “Measuring Fair Use: The Four Factors” https://fairuse.stanford.edu/overview/fair-use/four-factors/

[8] The Authors Guild, “The New York Times Sues OpenAI and Microsoft for Copyright Infringement” https://authorsguild.org/news/new-york-times-sues-openai-and-microsoft-for-copyright-infringement/

 

[9] Brodies LLP, “Getty Images v Stability AI – permission for a representative claim refused” https://www.lexology.com/library/detail.aspx?g=3594c721-eedb-465c-9ddc-5d4612a486ce

 

[10] The Authors Guild, “Bartz v. Anthropic Settlement: What Authors Need to Know” https://authorsguild.org/advocacy/artificial-intelligence/what-authors-need-to-know-about-the-anthropic-settlement/

 

[11] Section 20(1)(a) of the Copyright Act 2022