Microsoft has released a new generative model, dubbed Magma, that can autonomously control an entire robot while processing information from its sensors. It's a fascinating step toward AI like ChatGPT interacting with the physical world through a robotic arm, a humanoid android, or something else entirely.

In its announcement, the tech giant claims its latest AI can process multimodal data, including text, images, and video, while also being able to "plan and act in the visual-spatial world." That means it could be used to "complete agentic tasks ranging from UI navigation to robot manipulation."

"[Given a goal,] Magma is able to formulate plans and execute actions to achieve it," Microsoft wrote in its research paper documenting the new model. "By effectively transferring knowledge from freely available visual and language data, Magma bridges verbal and spatial intelligence to navigate complex tasks."

Magma is part of a much broader transition from simple large language models and chatbots to "AI agents," which can carry out tasks on behalf of their human overlords. But the tech still has nagging technical limitations. Case in point: OpenAI's recently released AI agent, dubbed Operator, was designed to navigate the internet to "perform tasks for you," yet it still requires plenty of adult supervision to get anything done.

And navigating the physical world, let alone manipulating objects, will likely be no easy task either.

Nonetheless, according to Microsoft's tests, its Magma AI "creates new state-of-the-art results on UI navigation and robotic manipulation tasks, outperforming previous models that are tailored specifically to these tasks."

Video samples released by the company show the AI placing a plastic mushroom in a metal bowl and pushing a dishcloth across a countertop.

Apart from manipulating a robotic arm, Microsoft also demonstrates how Magma could be used to assist a human agent through a live video feed, from helping out during a real-world game of chess to suggesting what to do to "relax for a few hours" in a living room.

But the AI isn't quite perfect, as Microsoft's researchers admit in their paper. For one, the video data it learned from was highly specific.

"We note that the distribution of identities and activities in the instructional videos are not representative of the global human population and the diversity in society," the paper reads.

The move toward agentic AI could also have plenty of unintended consequences, such as introducing cybersecurity vulnerabilities through bad actors exploiting jailbreaks or injecting malicious code.

How such a scenario would play out with an AI that's controlling a robot in the physical world remains to be seen — but we might prefer not to find out.


