
AI can learn a lot from its biological predecessors

Synthetic data risks ‘model collapse’ but may spur renewed interest in robotics
Like the mythical ancient serpent Ouroboros eating its own tail, AI models will struggle to improve if they are forced to learn from their own flawed data

First, we learn that generative AI models can “hallucinate”, an elegant way of saying that large language models make stuff up. As ChatGPT itself informed me (in this case reliably), LLMs can generate fake historical events, non-existent people, false scientific theories and imaginary books and articles. Now, researchers tell us that some LLMs might collapse under the weight of their own imperfections. Is this really the wonder technology of our age on which hundreds of billions of dollars have been spent?

In a paper published in Nature last week, a team of researchers explored the dangers of “data pollution” in training AI systems and the risks of model collapse. Having already ingested most of the trillions of human-generated words on the internet, the latest generative AI models are now increasingly reliant on synthetic data created by AI models themselves. However, this bot-generated data can compromise the integrity of the training sets because of the loss of variance and the replication of errors. “We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models,” the authors concluded.
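To make the mechanism concrete, here is a minimal illustrative sketch in Python (my own toy example, not the researchers' experiment): a simple statistical "model" is repeatedly refitted to data produced by its own previous generation, and because estimation error and sampling noise compound rather than average out, the spread of the data it generates tends to drift toward a collapsed, low-variance state.

```python
# A minimal, illustrative sketch (not the Nature paper's setup): fit a simple
# Gaussian "model" to data, then train each new generation only on samples
# drawn from the previous generation's model. Errors compound, so the
# estimated spread typically shrinks over generations -- a toy version of the
# variance loss behind "model collapse".
import random
import statistics

random.seed(0)

SAMPLES_PER_GENERATION = 30   # deliberately small so errors accumulate quickly
GENERATIONS = 300

# Generation 0: "human" data drawn from a distribution with mean 0, std dev 1.
data = [random.gauss(0.0, 1.0) for _ in range(SAMPLES_PER_GENERATION)]

for generation in range(GENERATIONS + 1):
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    if generation % 50 == 0:
        print(f"generation {generation:3d}: mean={mu:+.3f}, std={sigma:.3f}")
    # The next generation trains only on synthetic data from the fitted model.
    data = [random.gauss(mu, sigma) for _ in range(SAMPLES_PER_GENERATION)]
```

The same logic, the researchers argue, scales up: when generative models are trained mainly on their own output, rare patterns in the original human data are the first casualties.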
