A months-old but until now overlooked study recently featured in Wired claims to mathematically prove that large language models “are incapable of carrying out computational and agentic tasks beyond a certain complexity” — that level of complexity being, crucially, pretty low.
The paper, which has not been peer reviewed, was written by Vishal Sikka, a former CTO at the German software giant SAP, and his son Varin Sikka. Sikka senior knows a thing or two about AI: he studied under John McCarthy, the Turing Award-winning computer scientist who literally founded the entire field of artificial intelligence, and in fact helped coin the very term.
“There is no way they can be reliable,” Vishal Sikka told Wired.
Asked by the interviewer, Sikka also agreed that we can forget about AI agents running nuclear power plants, among the more strident promises thrown around by AI boosters.
Ignore the rhetoric that tech CEOs spew onstage and pay attention to what the researchers who work for them are publishing, and you’ll find that even the AI industry agrees the tech has some fundamental limitations baked into its architecture. In September, for example, OpenAI scientists admitted that AI hallucinations, in which LLMs confidently make up facts, were still a pervasive problem even in increasingly advanced systems, and that model accuracy would “never” reach 100 percent.
That would seemingly put a big dent in the feasibility of so-called AI agents, which are models designed to autonomously carry out tasks without human intervention, and which the industry universally decided last year would be its next big thing. Some companies that embraced AI agents to downsize their workforces quickly realized that the agents weren’t anywhere near good enough to replace the outgoing humans, perhaps because they hallucinated so much and could barely complete any of the tasks given to them.
AI leaders insist that stronger guardrails external to the AI models can filter out the hallucinations. The models may always be prone to hallucinating, but if those slip-ups are rare enough, the thinking goes, companies will eventually trust them with tasks they once entrusted to flesh-and-blood grunts. In the same paper in which OpenAI researchers conceded that the models would never reach perfect accuracy, they also dismissed the idea that hallucinations are “inevitable,” because LLMs “can abstain when uncertain.” (You’d be hard-pressed to find a single popular chatbot that actually does so, almost certainly because abstaining would make the chatbots seem less impressive and less engaging to use.)
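To give a sense of the kind of guardrail being described, here is a minimal, purely hypothetical sketch of confidence-based abstention. The names `ask_model` and `answer_or_abstain` and the 0.8 threshold are stand-ins invented for illustration, not any vendor’s actual API, and real systems estimate uncertainty in far more involved ways.

```python
# Hypothetical sketch: wrap an LLM call and refuse to answer when the model
# reports low confidence, rather than returning a possibly hallucinated reply.

def ask_model(prompt: str) -> tuple[str, float]:
    """Stub standing in for a chat-completion call.

    Returns an answer plus a self-reported confidence in [0, 1].
    """
    return "The Eiffel Tower is 330 meters tall.", 0.62


def answer_or_abstain(prompt: str, threshold: float = 0.8) -> str:
    """Return the model's answer only if its confidence clears the threshold."""
    answer, confidence = ask_model(prompt)
    if confidence < threshold:
        # Abstaining is the "guardrail": better no answer than a made-up one.
        return "I'm not sure enough to answer that."
    return answer


print(answer_or_abstain("How tall is the Eiffel Tower?"))
```

The trade-off is exactly the one the article points to: raise the threshold and the system abstains more often, which makes it safer but also visibly less capable.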
Even though he’s adamant LLMs have a hard ceiling, Sikka agrees with figures in the AI industry who insist that hallucinations can be reined in.
“Our paper is saying that a pure LLM has this inherent limitation — but at the same time it is true that you can build components around LLMs that overcome those limitations,” he told Wired.
More on AI: OnlyFans Rival Seemingly Succumbs to AI Psychosis, Which We Dare You to Try Explain to Your Parents