
First, we learn that generative AI models can “hallucinate”, an elegant way of saying that large language models make stuff up. As ChatGPT itself informed me (in this case reliably), LLMs can generate fake historical events, non-existent people, false scientific theories and imaginary books and articles. Now, researchers tell us that some LLMs might collapse under the weight of their own imperfections. Is this really the wonder technology of our age on which hundreds of billions of dollars have been spent?
In a paper published in Nature last week, a team of researchers explored the dangers of “data pollution” in training AI systems and the risks of model collapse. Having already ingested most of the trillions of human-generated words on the internet, the latest generative AI models are now increasingly reliant on synthetic data created by AI models themselves. However, this bot-generated data can compromise the integrity of the training sets because of the loss of variance and the replication of errors. “We find that indiscriminate use of model-generated content in training causes irreversible defects in the resulting models,” the authors concluded.
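The variance-loss mechanism the researchers describe can be illustrated with a toy simulation. This is not the paper's experimental setup (the authors worked with language models and other generative architectures); it is a minimal sketch under a strong simplifying assumption: each "model generation" merely fits a Gaussian to its training data, then produces the synthetic data the next generation trains on. Because each fit is estimated from a finite sample, small errors compound, and the spread of the data tends to collapse over generations.

```python
import random
import statistics

def train_generation(samples):
    # "Train" a toy model: fit a Gaussian (mean, stdev) to the data.
    return statistics.fmean(samples), statistics.stdev(samples)

def generate(mu, sigma, n, rng):
    # Produce synthetic data by sampling from the fitted model.
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(0)
n = 10  # deliberately small sample, so estimation error is visible
data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # "human-generated" data

stdevs = []
for generation in range(200):
    mu, sigma = train_generation(data)
    stdevs.append(sigma)
    # Each new generation trains ONLY on the previous model's output.
    data = generate(mu, sigma, n, rng)

# The spread of the data shrinks dramatically across generations:
print(f"first generation stdev: {stdevs[0]:.3f}")
print(f"last generation stdev:  {stdevs[-1]:.3g}")
```

With real models the dynamics are far richer, but the sketch captures the core point: a model trained on another model's output inherits its estimation errors, and the tails of the original distribution are progressively lost.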