The writer is chair of SpaceTime Labs
Human intellect rests on three pillars: seeing (observing the world), doing (intervening in it) and imagining (simulating what might happen under different choices). Right now, artificial intelligence rests on only one of these pillars.
Expanding existing frontier AI models will not address this problem. The breakthrough that set off today’s frenzy was the transformer architecture, developed at Google and scaled up into large language models trained on much of the public internet and used to write text and code. Then came agents that stitch these models together into automated workflows. Now the focus is on “world models”, which try to capture the physical environment from vast streams of video and other inputs.