AI Initiative Speaker Series: Generative AI and Copyright Law
Why It Matters
Understanding how copyright law applies to AI training and output is crucial for companies to mitigate litigation risk and design compliant generative‑AI products.
Key Takeaways
- Two Northern District of California rulings have treated training AI on copyrighted data as fair use.
- Those courts viewed AI training as transformative, not direct copying.
- Model size influences memorization; larger models may reproduce copyrighted text.
- Licensing markets for training data are emerging, challenging fair‑use assumptions.
- Extracting verbatim excerpts from models can trigger copyright infringement claims.
Summary
Mark Lemley, a leading law professor, opened the AI Initiative’s lunch workshop by dissecting the intersection of generative AI and copyright law. He outlined three core legal questions: whether training AI on existing works infringes copyright, whether AI‑generated outputs can infringe, and who owns AI‑created content.
Lemley highlighted two Northern District of California rulings that treated AI training as fair use, emphasizing the transformative nature of creating a new model, the temporary and non‑public nature of copied data, and the lack of demonstrable market harm. He warned, however, that a nascent licensing market for training data could erode these defenses, especially as companies begin to negotiate bulk licenses.
Empirical research presented by Lemley and co‑author Cooper showed that model size matters: Llama 3.1 memorizes and can reproduce large passages of Harry Potter, while smaller models like Pythia do not. Techniques such as the “poem‑poem‑poem” attack, in which a model is prompted to repeat a single word such as “poem” over and over, can coax models into emitting copyrighted text or personal information. The results underscore how much memorization varies across models and datasets.
The discussion signals that firms deploying generative AI must evaluate data‑licensing strategies, implement robust output‑filtering safeguards, and monitor evolving case law. As courts refine fair‑use doctrine and licensing ecosystems mature, the legal risk profile for AI products is likely to shift.
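One simple form an output‑filtering safeguard could take is a check for long verbatim word sequences shared between a model's output and a protected reference text. The sketch below is illustrative only and was not presented in the talk: the function names, the 8‑word window, and the 0.2 threshold are all assumptions, and a production filter would need a far larger reference corpus and more careful text normalization.

```python
def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output, protected, n=8):
    """Fraction of the output's word n-grams that also appear in the protected text."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        # Output shorter than n words: nothing long enough to count as a verbatim run.
        return 0.0
    return len(out_grams & word_ngrams(protected, n)) / len(out_grams)


def should_block(output, protected, n=8, threshold=0.2):
    """Hypothetical policy: block when the overlap fraction reaches the threshold."""
    return verbatim_overlap(output, protected, n) >= threshold


# Placeholder reference text (invented for this example, not a real protected work).
protected = ("the quick brown fox jumps over the lazy dog "
             "near the quiet river bank at dawn")
original = ("a completely different sentence that shares no long run "
            "of words with the reference text at all")

print(should_block(protected, protected))  # True: the output is a verbatim copy
print(should_block(original, protected))   # False: no shared 8-word sequence
```

A word‑level window is a deliberate simplification: it catches exact reproduction but not paraphrase, so it addresses only the verbatim‑extraction risk described above.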