Apparently not content with its grip on this world, Google is in the process of staffing up its DeepMind research lab to build generative models that are capable of simulating the physical world. The project—which will be headed up by Tim Brooks, one of the leads who helped build OpenAI’s video generator, Sora—will be a critical part of the company’s attempt to achieve artificial general intelligence, according to job listings related to the new team.
Brooks, who joined DeepMind after fleeing from OpenAI back in October, and his team have “ambitious plans to make massive generative models that simulate the world.” According to the role descriptions, the effort to build world models will “power numerous domains, such as visual reasoning and simulation, planning for embodied agents, and real-time interactive entertainment.” If you’re willing to take on one of these roles, maybe you can figure out what that vague language means and get back to us.
A world model, put as simply as possible, typically seeks to simulate how the world actually works. Generative models like Sora can replicate things they have seen in their training data, but they don’t have any real understanding of why those things happen. So Sora can successfully generate a video of a person throwing a baseball, but it doesn’t understand the physics of what is happening. World models aim to give the machine enough information to work out how an action unfolds and what its likely outcome will be.
Meta’s chief AI scientist Yann LeCun described world models this way during a speech at Hudson Forum earlier this year: “A world model is your mental model of how the world behaves…You can imagine a sequence of actions you might take, and your world model will allow you to predict what the effect of the sequence of action will be on the world.”
World models are difficult to build for a number of reasons, including the massive amount of compute needed to run one and the lack of sufficient training data to make one accurate. As a result, most world models work only in limited, specific contexts.
DeepMind’s team seems intent on making world models far more general. The plan is to build “real-time interactive generation” tools on top of the models and potentially look into how the team could integrate its world model into Gemini, Google’s large language model.
One likely area that DeepMind will try to tackle is video games. The job description for the new team notes that it will collaborate with the Veo and Genie teams at Google. Veo is Google’s Sora-like video generator, and Genie is an existing world model that can simulate 3D environments in real time. The video game industry is already eager to adopt AI tools, a shift that is displacing thousands of workers. A CVL Economics survey found that more than 86% of gaming firms have already adopted generative AI tools and that nearly 15% of all gaming jobs could be disrupted by 2026.
Maybe improving this world would be a better use of time than modeling it.