Highlights
- Google DeepMind has launched SIMA 2, an AI agent that learns, reasons, and evolves inside 3D virtual worlds.
- It learns through trial and error, interprets visuals, simulates actions, and engages users like a gaming companion.
- SIMA 2 can now explain its intentions, plan tasks, and adapt to entirely new environments without prior exposure.

Caption: Google DeepMind unveils SIMA 2.
Google DeepMind has officially unveiled SIMA 2, the next-generation version of its AI agent built to operate, learn, and reason inside 3D virtual environments. Powered by Gemini, the upgraded agent marks a major leap from its predecessor and moves closer to human-level task performance in complex digital worlds.
SIMA is short for Scalable Instructable Multiworld Agent and was introduced last year as a generalist AI capable of following simple instructions across a variety of virtual game environments. According to Google, SIMA represented a major advancement in teaching AI to convert natural language into meaningful in-world actions.
With SIMA 2, Google DeepMind is now aiming at a broader goal: building an interactive, self-improving AI agent that can think, communicate, and learn like a companion inside games.
The company describes SIMA 2 as a major milestone in its journey toward building generally capable and helpful AI systems, and potentially even a step toward Artificial General Intelligence (AGI).
SIMA 2 integrates tightly with Gemini, enabling it to understand human-language instructions, plan tasks, hold conversations with users, and improve autonomously. Thanks to advanced reasoning capabilities, the agent can now analyse goals, respond intelligently, and refine its skills over time.
In tests involving games it had never encountered before, such as ASKA and MineDojo, SIMA 2 reportedly completed 45–75% of tasks, a substantial improvement over SIMA 1’s 15–30% task completion rate in similar scenarios.
After its initial training, the agent can keep improving through self-directed trial and error without additional human-generated data. DeepMind explains that SIMA 2 uses Gemini models to generate new tasks, evaluate attempts, and learn from mistakes. It navigates virtual worlds by interpreting on-screen visuals, simulating keyboard and mouse actions, and engaging users like a real gaming partner.
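The generate, attempt, evaluate, and learn cycle described above can be pictured with a toy sketch. This is purely illustrative and is not DeepMind's implementation: every name here (`propose_task`, `attempt_task`, `score_attempt`, `self_improve`) is hypothetical, the task list and action strings are made up, and the numeric "skill" update is a stand-in for whatever learning procedure SIMA 2 actually uses.

```python
import random
from dataclasses import dataclass


@dataclass
class Attempt:
    task: str
    actions: list
    score: float


def propose_task(rng):
    # Hypothetical stand-in for a model proposing a new task to practise.
    return rng.choice(["chop a tree", "light a campfire", "find water"])


def attempt_task(task, skill, rng):
    # Hypothetical stand-in for the agent acting through simulated
    # keyboard and mouse input; success chance grows with practice.
    actions = [rng.choice(["key:W", "key:E", "mouse:click"]) for _ in range(5)]
    success = rng.random() < skill.get(task, 0.2)
    return actions, success


def score_attempt(success):
    # Hypothetical stand-in for a model-based evaluator of the attempt.
    return 1.0 if success else 0.0


def self_improve(rounds, seed=0):
    rng = random.Random(seed)
    skill = {}        # toy per-task success probability
    experience = []   # log of every attempt, successful or not
    for _ in range(rounds):
        task = propose_task(rng)
        actions, success = attempt_task(task, skill, rng)
        experience.append(Attempt(task, actions, score_attempt(success)))
        # Toy update (an assumption): successes help more than failures.
        current = skill.get(task, 0.2)
        skill[task] = min(1.0, current + (0.1 if success else 0.02))
    return skill, experience


if __name__ == "__main__":
    final_skill, log = self_improve(rounds=50)
    print(final_skill)
    print(len(log), "attempts logged")
```

The point of the sketch is only the shape of the loop: no human supplies tasks or grades during these rounds, and every attempt, including the failed ones, feeds back into later behaviour.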
Google also tested SIMA 2 in environments generated via its Genie 3 model, where the agent adapted successfully to newly generated worlds it had never been trained on.
DeepMind highlights that SIMA 2’s architecture, backed by Gemini’s strong reasoning abilities, allows it to comprehend high-level objectives and carry out complex, goal-oriented actions.
The company further detailed its hybrid training approach in an official blog post. “We trained SIMA 2 using a mixture of human demonstration videos with language labels as well as Gemini-generated labels. As a result, SIMA 2 can now describe to the user what it intends to do and detail the steps it’s taking to accomplish its goals,” the post reads.
FAQs
Q1. What is SIMA 2 and how is it different from SIMA 1?
Answer. SIMA 2 is a Gemini-powered AI agent designed to learn, reason, and evolve inside 3D virtual worlds. It significantly improves task performance, completing 45–75% of tasks in unfamiliar games, compared to SIMA 1’s 15–30%.
Q2. How does SIMA 2 learn and interact in virtual environments?
Answer. SIMA 2 learns autonomously through trial and error, interprets on-screen visuals, simulates keyboard/mouse actions, and uses Gemini to generate tasks, evaluate outcomes, and refine its skills.
Q3. Can SIMA 2 explain its actions to users?
Answer. Yes, thanks to hybrid training with human demo videos and Gemini-generated labels, SIMA 2 can describe its intentions and explain the steps it’s taking to achieve its goals.
