The new generation of artificial intelligence systems, going beyond chatbots, can now understand what they see and predict the next steps. “World models” are opening the door to one of the greatest leaps in the history of technology.
Until recently, artificial intelligence was associated with a familiar scene: ChatGPT-style chatbots, systems producing text in seconds, or algorithms generating images. Now, the scene is shifting. Cutting-edge research is taking AI beyond being simply a “text producer,” turning it into systems capable of perceiving the physical world and functioning within it. This new paradigm is called world models.
Models such as Meta’s V-JEPA or the video-based models developed at Google DeepMind not only identify what is in front of them, but also address the question, “What happens next?” This is what allows AI to anticipate what is about to happen and act on it. For instance, when analyzing a video feed, the model doesn’t stop at “this is a truck,” but goes further to predict, “in the next moment, this truck will start turning.”
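To make the idea of “What happens next?” concrete, here is a deliberately tiny sketch in Python. It is not how V-JEPA or DeepMind’s models work: those learn their predictions from large amounts of video, while this toy simply assumes an object keeps moving at constant velocity. The function names `predict_next` and `is_turning` and the sample track are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position of a tracked object in one frame


def predict_next(track: List[Point]) -> Point:
    """Guess the next position by assuming the last observed step repeats.

    This is plain constant-velocity extrapolation, a stand-in for the
    learned prediction a real world model would make.
    """
    if len(track) < 2:
        raise ValueError("need at least two observed positions")
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def is_turning(track: List[Point], tolerance: float = 1e-6) -> bool:
    """Report whether the direction of travel changed over the last two steps.

    Uses the 2D cross product of consecutive displacement vectors:
    zero means the object kept a straight line, non-zero means it turned.
    """
    if len(track) < 3:
        return False
    (ax, ay), (bx, by), (cx, cy) = track[-3:]
    cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
    return abs(cross) > tolerance


# Hypothetical positions of a truck across three video frames.
truck_track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)]

print(predict_next(truck_track))  # (3.0, 1.0)
print(is_turning(truck_track))    # True
```

The gap between this toy and a real world model is exactly the point: a hand-written rule only covers the cases its author thought of, whereas a learned model is meant to pick up such regularities from observation.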
Contact with the physical world has always created major turning points in the history of technology. When computers moved from the office to our pockets and then into our homes, life changed radically. A similar leap is now happening in artificial intelligence. World models are transforming robots from “pre-programmed machines” into “beings that learn by seeing.” It won’t be long before we see “learning machines” on factory lines, in warehouses, and even in daily life.
Of course, this transformation brings more than opportunities. Once AI systems make decisions in the real world, many new questions arise, from security to ethical responsibility. The consequences of a machine’s “wrong decision” are no longer just an email error; they can have direct effects in the physical world.
Today, world models are still largely being tested in laboratories. But as we have seen in the history of technology, when such breakthroughs reach the field, a whole new era begins. Artificial intelligence no longer just speaks; it sees, thinks and acts.