Google DeepMind has revealed Genie 3, its latest foundation world model, which can be used to train general-purpose AI agents, a capability the AI lab says marks a significant step toward artificial general intelligence (AGI), or AI with human-like intelligence.
"Genie 3 is the first real-time interactive general-purpose world model," Shlomi Fruchter, a research director at DeepMind, said during a press briefing. "It goes beyond the narrow world models that existed before. It's not specific to any particular environment. It can generate both photo-realistic and imaginary worlds, and everything in between."
Still in research preview and not publicly available, Genie 3 builds on both its predecessor Genie 2 (which can generate new environments for agents) and DeepMind's latest video generation model, Veo 3 (noted for its deeper understanding of physics).

From a simple text prompt, Genie 3 can generate several minutes of interactive 3D environments at 720p resolution, a significant leap from the 10 to 20 seconds Genie 2 could produce. The model also supports "promptable world events," text prompts that change the generated world.
Perhaps most importantly, Genie 3's simulations stay physically consistent over time because the model can remember what it previously generated, a capability DeepMind says its researchers did not explicitly program into the model.
Fruchter said that while Genie 3 has implications for educational experiences, gaming, or prototyping creative concepts, its real unlock will come from training agents for general-purpose tasks, which he said is necessary to reach AGI.
"We think world models are key on the path to AGI, especially for embodied agents, where it is particularly challenging to train on real-world scenarios," said Jack Parker-Holder, a research scientist on DeepMind's Open-Endedness team, during the briefing.
Genie 3 is designed to solve that bottleneck. Like Veo, it doesn't rely on a hard-coded physics engine; instead, DeepMind says, the model teaches itself how the world works (how objects move, fall, and interact) by remembering what it has generated and reasoning over long horizons.
"The model is auto-regressive, which means it generates one frame at a time," Fruchter told TechCrunch in an interview. "It has to look back at what was generated before to decide what's going to happen next. That's an essential part of the architecture."
This memory, the company says, lends stability to Genie 3's simulated worlds, which in turn allows the model to develop an understanding of physics, much the way a person understands that a glass teetering on the edge of a table is about to fall, or that they must duck to avoid a falling object.
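To make the autoregressive idea concrete, here is a minimal, purely illustrative Python sketch of a frame-by-frame rollout loop. Genie 3's actual architecture and API are not public, so every name here (`ToyWorldModel`, `predict_next_frame`, `rollout`) is invented; the point is only the pattern Fruchter describes: each new frame is conditioned on everything generated so far.

```python
# Hypothetical sketch of an autoregressive world-model rollout.
# All names are invented for illustration; Genie 3's real
# architecture and interfaces have not been published.
from typing import List

class ToyWorldModel:
    """Stands in for a learned frame predictor."""
    def predict_next_frame(self, history: List[str], action: str) -> str:
        # A real model would condition on pixels and a latent state;
        # here we just record the history length so the "memory"
        # dependence is visible in the output.
        return f"frame_{len(history)}(after={action})"

def rollout(model: ToyWorldModel, actions: List[str]) -> List[str]:
    frames: List[str] = ["frame_0(initial)"]
    for action in actions:
        # Autoregressive step: the next frame depends on the full
        # history of frames generated so far, which is what keeps
        # the simulated world consistent over time.
        frames.append(model.predict_next_frame(frames, action))
    return frames

frames = rollout(ToyWorldModel(), ["walk", "turn_left", "walk"])
print(frames[-1])  # prints "frame_3(after=walk)"
```

The design choice being illustrated: because each step reads the whole history rather than only the latest frame, the generator can stay consistent with things it produced minutes earlier, which is the "memory" the article describes.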
Notably, DeepMind says the model can also push AI agents to their limits, forcing them to learn from their own experience, much as humans learn in the real world.
As an example, DeepMind shared a test of Genie 3 with a recent version of its generalist Scalable Instructable Multiworld Agent (SIMA), instructing it to pursue a set of goals. In a warehouse setting, it asked the agent to perform tasks such as "approach the bright green trash compactor" or "walk to the packed red forklift."
"In all three cases, the SIMA agent was able to achieve the goal," Parker-Holder said. "Genie 3 just receives the actions from the agent. So the agent takes the goal, sees the simulated world, and then takes actions in that world. Genie 3 simulates forward, and the fact that it's consistent is what makes the agent capable of achieving the goal."
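The loop Parker-Holder describes (agent receives a goal, observes the simulated world, acts; the world model steps forward) can be sketched as follows. This is a toy illustration under invented names (`run_episode`, the string-based "frames"), not DeepMind's SIMA or Genie interface, neither of which is publicly documented.

```python
# Hypothetical sketch of the agent/world-model loop described above.
# All names and the success check are invented for illustration.
def run_episode(goal: str, max_steps: int = 5) -> bool:
    frame = "warehouse_start"  # initial simulated observation
    for _ in range(max_steps):
        # Agent policy: a real system maps (goal, observation) to an
        # action with a learned policy; this stub just heads for the goal.
        action = f"move_toward({goal})"
        # World model steps the simulation forward one frame,
        # conditioned on the agent's chosen action.
        frame = f"{frame} -> {action}"
        # Toy success check: did the trajectory reach the goal?
        if goal in frame:
            return True
    return False

print(run_episode("bright green trash compactor"))  # prints True
```

The key structural point is the division of labor: the agent only emits actions, and the world model alone decides what the agent then sees, which is why the world model's consistency determines whether goals are achievable at all.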

That said, Genie 3 has its limits. For example, while researchers claim it may understand physics, a demo showing a skier barreling down a mountain doesn't show how the snow moves in relation to the skier.
Additionally, the range of actions an agent can take is limited. Promptable world events allow for a broad range of environmental interventions, for instance, but those aren't necessarily performed by the agent itself. And it's difficult to accurately model complex interactions between multiple independent agents in a shared environment.
Genie 3 can also support only a few minutes of continuous interaction, when hours would be required for proper training.
Nevertheless, the model represents a compelling step toward teaching agents to go beyond reacting to inputs: to plan, explore, seek out uncertainty, and improve through trial and error, the kind of self-directed learning that many say is key to progress toward general intelligence.
"We haven't really had a Move 37 moment for embodied agents, where they can actually take novel actions in the real world," Parker-Holder said.
“But now, we can potentially enter a new era,” he said.