FT Business School

AI’s ‘memorisation’ problem: the novels it can’t forget

Research that shows LLMs memorise more training data than previously thought raises questions about copyright infringement

The world’s top AI models can be prompted to generate near-verbatim copies of bestselling novels, raising fresh questions about the industry’s claim that its systems do not store copyrighted works.

A series of recent studies has shown that large language models from OpenAI, Google, Meta, Anthropic and xAI memorise far more of their training data than previously thought.

AI and legal experts told the FT this “memorisation” ability could have serious ramifications for AI groups’ battle against dozens of copyright lawsuits around the world, as it undermines their core defence that LLMs “learn” from copyrighted works but do not store copies.
