Learning Like Humans: The Industrial Potential of World Models

World models promise to overcome the limitations of LLMs. How? And what are the implications for industry and businesses? We take stock with two AI experts: Gilles Babinet and Laurence Devillers. (iStock)

by Camille Rustici

March 25, 2026 | Reading time: 27 mins

Since 2024, artificial intelligence has undergone a quieter revolution than the arrival of ChatGPT in 2022, but one that may be even more significant than generative AI itself: the emergence of world models.

This approach, radically different from the LLMs (Large Language Models) that currently dominate the AI landscape, is championed by figures such as Yann LeCun in France and Fei-Fei Li in the United States. It aims to simulate and understand the physical world, opening the door to unprecedented industrial applications, from robotics and autonomous vehicles to healthcare and video analytics.

But what are the real stakes behind this technology? What impact can companies expect? What challenges remain? We spoke with Gilles Babinet, the French entrepreneur in charge of the Café IA mission, and Laurence Devillers, teacher-researcher in computer science applied to the social sciences at Paris-Sorbonne University and author of “Savoir-vivre avec l’IA” (Knowing how to live with AI).

Understanding the World

To understand world models, we must first understand the limits of LLMs.

LLMs excel at predicting sequences of text, essentially predicting the next word based on probabilities. They are extremely useful for writing assistance, summarization, and translation, tools we all use daily. However, they lack a true understanding of context, space, and time.
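To make "predicting the next word based on probabilities" concrete, here is a toy sketch in Python. It is not how a production LLM is built (those use neural networks over tokens, not word counts). The tiny corpus and the function names `train_bigrams` and `next_word_probs` are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def next_word_probs(counts, word):
    """Turn raw follow-counts into a probability for each candidate next word."""
    followers = counts.get(word, Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

# Hypothetical toy corpus, for illustration only.
corpus = "the ball falls down . the ball rolls away . the fire is hot ."
model = train_bigrams(corpus)

probs = next_word_probs(model, "the")
print(probs)                      # "ball" follows "the" twice, "fire" once
print(max(probs, key=probs.get))  # the most probable continuation
```

The model has counted that "ball" follows "the" more often than "fire" does, but nothing in those counts encodes that balls fall or that fire burns. That is the gap in understanding described above.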

World models, by contrast, aim to model the physical and causal dynamics of the real world in 3D. They could simulate, plan, and control complex environments because they are designed to understand the world.

As Laurence Devillers explains:

“World models integrate multimodal, perceptual, real-world data. This is completely different from what LLMs do. It’s another way of building systems capable of learning and reasoning—not artificial consciousness, but a form of metacognition based on what the system perceives and knows about the world.”

Yann LeCun believes this approach is necessary to overcome the limitations of LLMs, in particular their inability to achieve artificial general intelligence (AGI) for lack of human-like reasoning.

This is why this major figure in AI (2018 Turing Award winner) and former Meta executive (he left the company at the end of 2025) came back to France at the end of last year to launch AMI Labs (Advanced Machine Intelligence), his Paris-based startup dedicated to world models. AMI Labs raised $1.03 billion this month, at a valuation of $3.5 billion.

The Competitive Landscape

AMI Labs is not alone in the world models space. Several major players are positioning themselves on this frontier, with comparable approaches and comparable war chests.

Known for its video generation tools, Runway is investing heavily in “General World Models” (GWM), which can simulate reality in real time. Its goal is to create interactive and predictive environments, with applications in filmmaking, robotics, and industrial design.

Covariant is also using world models to enable machines to adaptively learn and perform complex manipulation tasks in real-world industrial environments like warehouses and factories.

NVIDIA is leveraging its high-performance GPU infrastructure to develop world models that simulate and optimize autonomous systems, from self-driving cars to advanced robotics, bridging virtual training with real-world deployment.

But AMI Labs’ most direct competitor is World Labs, an American startup founded by AI pioneer Fei-Fei Li. The company also raised $1 billion in 2026 and is valued at $5 billion. World Labs develops models capable of generating physically coherent 3D worlds. Its Marble project is a notable example, and the company is targeting robotics, industrial simulation, and virtual environments. While the ambition is similar to AMI Labs’, the approach differs: World Labs focuses on generating simulated worlds rather than on a JEPA-style architecture.

For Babinet, Fei-Fei Li’s move into world models was a decisive signal:

“What really struck me was seeing Fei-Fei Li commit to world models. Because she has been present at every major revolution in AI. Back in the early 2000s she created ImageNet. She watched every wave of deep learning unfold from the inside. When someone like her bets on world models, that’s a very significant signal.”

Devillers agrees that the direction is now clear to anyone paying attention:

“Everyone who is seriously engaged in this field knows this is where we need to go. There is a growing consensus that grounding AI in perception, action and time improves robustness. Systems that can simulate outcomes (a core idea in world models) tend to reason better in dynamic environments (robotics, games, planning).”

AMI Labs: Learning Like a Child

So, what does Yann LeCun’s model actually propose? And why does it matter? AMI Labs is built on JEPA (Joint Embedding Predictive Architecture), an architecture LeCun developed during his years at Meta. The core bet is simple but radical: models shouldn’t learn about the world through text. They should learn the way children do, through direct observation of reality, from raw sensory data: images, video and sound.
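The central idea of a joint-embedding predictive architecture is to predict the representation of a hidden part of the input from the representation of the visible part, and to measure the error in that embedding space rather than pixel by pixel. The following is a deliberately tiny sketch of that objective, not AMI Labs’ or Meta’s implementation. The fixed weights, the four-number "frames", and the names `encode`, `predict` and `embedding_loss` are all assumptions chosen for illustration, and no training loop is shown.

```python
def encode(frame, weights):
    """Map raw input values to a small embedding (one dot product per row)."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]

def predict(context_embedding, weights):
    """Guess the target's embedding from the context's embedding."""
    return [sum(w * e for w, e in zip(row, context_embedding)) for row in weights]

def embedding_loss(predicted, target):
    """Mean squared error measured between embeddings, not between pixels."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# Hypothetical fixed weights; a real system would learn these from data.
ENCODER = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
PREDICTOR = [[1.0, 0.0],
             [0.0, 1.0]]

# Two consecutive "frames" of a toy video: four brightness values each.
visible_frame = [0.2, 0.4, 0.6, 0.8]   # what the model is shown
hidden_frame = [0.3, 0.5, 0.7, 0.9]    # what it must anticipate

context = encode(visible_frame, ENCODER)
target = encode(hidden_frame, ENCODER)
loss = embedding_loss(predict(context, PREDICTOR), target)
print(f"loss in embedding space: {loss:.4f}")
```

Training would adjust the encoder and predictor to drive this loss down across many pairs of frames (real systems also need safeguards against the trivial solution of mapping every input to the same embedding). The point of the sketch is only that the comparison happens between compact representations, not between raw pixels.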

As digital entrepreneur Gilles Babinet puts it:

“Text is a rather flat representation of the world.”

Indeed, text can tell a model that “fire is hot,” but it gives it no physical intuition of what heat is, how it spreads, or how to avoid it. The same goes for a ball that falls.

He adds:

“World models, on the other hand, are built from stereoscopic vision. Let’s take an example. The phone rings. In an autoregressive model [the kind that powers today’s LLMs] you reconstruct the entire decision process statistically, token by token, and you might end up somewhere between ‘pick up’ and ‘don’t pick up,’ without certainty. In a world model, you go directly to a pre-built action loop: pick up, or don’t. Because the model has learned to identify that pattern as a pattern, and to trigger it directly. That’s something an LLM fundamentally cannot do.”

This is both more efficient and more deterministic. These pre-built action loops are closer to muscle memory than to reasoning. They bypass the need to recompute everything from scratch, the way a driver doesn’t consciously rethink how to brake at a red light. World models aim to replicate exactly this kind of embedded, efficient intelligence.

Stereoscopic vision is central to making it work. Two-eyed depth perception is what gives biological learners a 3D model of the world from 2D inputs — the foundation for understanding that objects persist when hidden, that causes precede effects, that space is continuous and navigable.

This is a point Yann LeCun has made, Gilles Babinet goes on:

“You need a representation of the world that is built from what we’ve learned. He says that every day, we absorb a petabyte of data, and we never stop learning because we construct our reality from it. This is why, to achieve this, we need stereoscopic vision.”

A child who has never read a single word already understands gravity, facial expressions, and cause-and-effect. That is the benchmark world models are chasing.

Understanding Humans

The goal of world models isn’t just to understand the physical world. It is to understand us, because the ultimate objective is to interact with humans in a genuinely human way. And you cannot interact well with something you don’t deeply understand. Here again, LLMs fall short.

Laurence Devillers frames the stakes clearly:

“A new market is emerging: autonomous vehicles, humanoid robots moving through our environments, picking up objects, executing tasks. For a machine to plan actions, navigate physical space, and respond meaningfully to human needs, that requires new kinds of world modelling and understanding. Language models, on their own, are not sufficient for this: if integrated as-is into a robot today, their usefulness would remain very limited.”

Gilles Babinet reinforces the point:

“When you see a humanoid robot playing tennis or dancing, what you’re really watching is essentially an automaton. Its actual autonomy is very limited. You can’t run an LLM inside it. Maybe a VLM at a stretch. But getting a system to genuinely grasp the complexity of the real world with an LLM just doesn’t work.”

To align with human behavior and understand our world, machines will need to model human emotions in some form. But Laurence Devillers is careful to draw a sharp line:

“That doesn’t mean the machine will actually feel emotions. It will never be conscious, never experience feelings. But it will be able to make decisions based on how humans behave and what they’re going through.”

For a robot to navigate situations involving pain, distress, or vulnerability, and respond appropriately, it needs a working model of what it means to be human. As Devillers puts it succinctly:

“A true world model is not just a simulation of the physical world; it is also a model of humans within that world, their intentions, expectations, and interactions.”

Humanoid Robots at Logimat 2026 (C. Rustici)

Concrete Applications

World models are targeting sectors where understanding the physical world and human behavior is non-negotiable. For now these remain largely industrial niches, but as Devillers suggests, robotics and autonomous vehicles are the natural proving grounds.

Robotics is perhaps the most immediate frontier. World models could enable humanoid robots to operate in unstructured environments, handling objects they’ve never encountered before, adapting on the fly. Think of 1X Technologies’ Neo robot as an early example.

The implications for logistics are significant: dynamic warehouse management, flexible production lines, small-batch manufacturing. NVIDIA was vocal about exactly this use case at a conference we attended last year. The company uses simulated environments to train robots to handle sudden, unpredictable conditions before deploying them in the real world: physical AI, as it calls it.

Autonomous vehicles would also benefit from the same leap. Rather than following pre-programmed rules, a world-model-powered car could genuinely adapt to unforeseen situations, plan decisions in real time, and reason about the road the way a human driver would.

Healthcare opens another dimension entirely, with robots that accompany patients and elderly people, capable of understanding their needs and responding to them with appropriate judgment. As Devillers describes it:

“You could have robots that are genuinely useful in hospitals, handling repetitive tasks that still require a degree of perceptual awareness and contextual sensitivity. Think of them as our AI agents, but embodied.”

But she’s careful to frame the ambition realistically:

“We’ll go further once we master the multimodal and spatial dimensions — building action plans that are genuinely useful in industry. But you have to think vertically: specialized robots that excel at specific tasks, specific action planning. We are not building something that is nearly equivalent to human intelligence.”

The broader industrial transformation could be sweeping, argues Babinet:

“If I were running a company where productivity gains are tightly linked to robotics, I’d be paying very close attention right now. Especially in small-batch, high-value manufacturing, that’s where you’ll see strong robot autonomy emerge fast. They won’t be faster than humans, but you’ll be able to tell a robot: here’s an order from a new country, here’s the applicable standard, configure yourself to meet it, and add these specific features on top. That’s genuinely powerful.”

And on the investment case, Babinet is direct:

“World models are very bankable, as far as I’m concerned. Everything industrial, everything humanoid, everything vision-based, everything autonomous, it all points in the same direction. It’s a solid bet.”

Do World Models Hallucinate Less?

Beyond their ambition to surpass LLMs in understanding the world, world models offer a technically compelling advantage, at least in theory. Both experts agree that models grounded in direct observation of reality should hallucinate significantly less. Babinet explains why:

“It comes down to those conceptual action-loops. The phone rings, there are only so many possible responses. I pick up, I don’t pick up, I check the number, I send a text back. What you don’t do is start speaking into the phone before you’ve answered it. But in an LLM, that kind of nonsensical output is entirely possible. In a world model, it isn’t. The space of possible outcomes is constrained by reality, which drastically reduces uncertainty.”

Babinet is candid that hallucinations aren’t just an inconvenience: they represent a structural ceiling for LLMs. He invokes the “bitter lesson” (a term more commonly associated with Rich Sutton’s 2019 essay on general methods that scale with compute) to describe it: beyond a certain level of abstraction, hallucinations don’t diminish, they compound. You can’t engineer your way out of it.

“For me, this problem is too fundamental to keep building on LLMs alone. We haven’t managed to bring hallucination rates down, and if 15% is the floor, that’s nowhere near good enough for unsupervised systems. On top of that, we have a compute cost problem that is simply prohibitive. Scaling laws mean that a 10% performance gain requires a tenfold increase in compute. And we have a multi-agent coordination problem that remains largely unsolved: we don’t yet know whether LLMs can reliably supervise one another.”
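One way to read the scaling figure in the quote above is as a power law, in which error shrinks as compute raised to a small negative exponent. That functional form, and the reading of "performance gain" as a reduction in error, are assumptions made here for illustration, not something Babinet states. The names `ALPHA` and `compute_multiplier` are likewise hypothetical. Under those assumptions, the sketch below computes how much more compute a given improvement would require.

```python
import math

# Assumption: error ~ compute ** (-ALPHA). Calibrate ALPHA from the quoted
# figure: a tenfold increase in compute yields a 10% improvement,
# i.e. error drops to 0.9x when compute is multiplied by 10.
ALPHA = -math.log10(0.9)

def compute_multiplier(improvement):
    """Compute factor needed to cut error by `improvement` (0.10 means 10%)."""
    if not 0 < improvement < 1:
        raise ValueError("improvement must be between 0 and 1")
    return (1 - improvement) ** (-1 / ALPHA)

for gain in (0.10, 0.20, 0.30):
    print(f"{gain:.0%} improvement -> about {compute_multiplier(gain):,.0f}x compute")
```

Under this reading, each additional step of improvement costs disproportionately more: doubling the gain from 10% to 20% takes over a hundred times the original compute rather than twenty.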

Technological Challenges

A new path is clearly needed. But the road won’t be straight, and the bet isn’t won yet. Both experts are clear-eyed about the obstacles ahead.

The first challenge is training data. As Babinet points out:

“With LLMs, you have training data, vast amounts of it, readily available in text form. With world models, you don’t. Accumulating petabytes or even zettabytes of stereoscopic vision data is an enormous undertaking. It won’t happen overnight.”

Laurence Devillers raises the thorny question of how that data gets collected in the first place:

“The problem is regulation. In Europe, GDPR in particular prevents us from collecting data the way LeCun originally envisioned, which was through Meta smart glasses. Think about it: what’s the ideal device for continuously capturing spatial, temporal, and audio data of our interactions with the world and with other people? A pair of glasses sitting on your nose. But you can’t record that way for privacy reasons. So we need to figure out an alternative before we can build truly capable systems.”

Babinet adds another fundamental limitation: a significant portion of reality simply isn’t observable. Everything conceptual, everything internal, falls outside the reach of sensory data.

“When I say I don’t feel well, that’s not something you can see.”

There’s also the question of cost. Training a world model is no cheaper than training an LLM. It demands enormous compute resources. In fact, there’s a reasonable chance that a large part of AMI Labs’ $1 billion raise will go straight into compute, Babinet believes:

“There are plenty of technical problems still to solve along the way. World models are very strong on everything visual, but there’s real work to do on the rest, chain-of-thought reasoning in particular. That needs to be cracked.”

And as with LLMs, Babinet doesn’t rule out world models hitting scaling-law walls of their own.

Devillers goes further, pointing to the sheer complexity of replicating human intelligence. In her view, AGI, a general representation of intelligence, remains a utopia, and world models alone won’t get us there:

“It will require enormous amounts of data and energy. We’re not there yet. We’ll need additional models on top, because humans are extraordinarily complex. World models are not sufficient on their own to approximate how human cognition works. Other contributions will be needed.”

And we should not forget the ethical dimension. The closer we get to machines that genuinely understand and model human behavior, the more capable, and potentially dangerous, the physical objects running those models become.

As Laurence Devillers puts it starkly:

“If we’re building systems that get closer and closer to human-level understanding, we’re also giving physical objects increasingly powerful capabilities, and those objects are far more dangerous if you let them loose in our shared spaces. An LLM stays inside a computer. A robot with intelligence and physical strength is a different kind of risk. It relocates the danger.”

The question isn’t only technical readiness, but governance: who decides what these systems are allowed to do, in whose spaces, under what oversight? Those are the questions the field will have to answer. But Devillers sees the safety research community already orienting itself toward this challenge:

“A lot of us are already working on better ways to control and align LLMs. The day that work extends to vision-language models and JEPA-based architectures, we’ll develop the same kind of frameworks for those too.”

Is It the End of LLMs?

Does this spell the end of large language models? Are LLMs, which currently dominate the market, destined for obsolescence?

While it’s true that a growing number of players are moving toward world models, the leading LLM labs, OpenAI and Anthropic chief among them, remain firmly committed to the language model path, convinced it can still lead to superintelligence.

Laurence Devillers doesn’t see world models making LLMs obsolete, but she does see them puncturing a particular illusion:

“LLMs are useful for translating, summarizing, and explaining, but not for understanding the physical world around us. What they make obsolete is the idea that LLMs can achieve superintelligence. The discourse coming out of Silicon Valley right now is completely unhinged when it comes to the supposed power of these language models to produce emergent intelligence, or even consciousness. It goes way too far. We are going to build increasingly complex AI systems; perhaps one day a form of AGI will be possible, but it will not be human.”

It’s precisely because Silicon Valley’s leading labs are so deeply entrenched in the language model paradigm that LeCun chose to return to France, where foundational science, he believes, is more naturally mobilized in service of this kind of long-horizon research.

Rather than a winner-takes-all outcome, Gilles Babinet envisions a hybrid architecture where both approaches coexist, each handling what it does best:

“For me, the winner will be whoever builds the best orchestrator, an agentic system that knows which capability to reach for depending on the task at hand. LeCun’s vision, as I understand it (and we’ll see, because there are discrete complexity challenges that world models handle poorly), is to build that orchestrator around JEPA. Personally, I wouldn’t rule out a medium-term hybrid between vision-language models and world models.”

In some ways, that hybridization is already underway. Babinet acknowledges that while world models are primarily suited to visual processing, existing transformer-based systems are already being applied to images with real results:

“We’re building VLMs that work well for simpler cases. We take transformer architectures and apply them to images and some of it already works.”

And today’s flagship LLM products have already moved well beyond pure language:

“ChatGPT and Sora are already examples of systems that go beyond text. ChatGPT is a multimodal model running inside an agentic framework with orchestration layers. You can’t really call it a plain transformer model anymore.”

The hybrid path is also being pursued at the research frontier. Google DeepMind is already working on models that combine LLMs and world models for robotics applications. RoboCat, its self-improving robotics agent, is a notable example of that direction. The approach aims to integrate physical understanding with abstract reasoning, though it remains less specialized than what AMI Labs is building.

France on the AI Map

Could world models become a lever for European technological sovereignty, countering American and Chinese dominance in LLMs? Both experts are wary of drawing sweeping conclusions, but they agree: it’s an encouraging and energizing moment.

At a moment when Europe is routinely described as having missed the technological wave, France, through LeCun, his team, and AMI Labs, has a real card to play.

Laurence Devillers doesn’t hide her enthusiasm at seeing a French researcher, a French team, and a French laboratory planting a flag on the global AI map, alongside players like Mistral AI, and demonstrating to the world that genuine innovation is possible in Europe.