The tremendous hype surrounding AI coding shows no signs of dying down. Last month, Anthropic released a suite of industry-specific plug-ins for its Claude Cowork AI agent, spooking investors who feared that traditional enterprise software-as-a-service companies could soon be made obsolete. The announcement triggered a trillion-dollar sell-off, with many tech companies seeing sharp declines in their share prices.
It even seemed to jolt Sam Altman’s OpenAI, which moved to drop many of its distracting “side quests” in a concerted effort to double down on coding and enterprise-specific AI tools.
Yet plenty of glaring questions about the long-term viability of AI programming remain, with some warning that questionable, unverified code could spell disaster for the corporations that eagerly embrace it.
Indeed, contrary to the hype, researchers have consistently found that AI-generated code is a bug-filled mess, forcing some programmers to pick up the pieces.
“No one knows right now what the right reference architectures or use cases are for their institution,” Dorian Smiley, CTO and founder of AI software engineering company Codestrap, told The Register.
“From the large language model perspective, people aren’t really addressing the fallibility of the underlying text,” Codestrap CEO Connor Deeks added.
As software engineers face mounting pressure to use AI in their work, or else land on the chopping block, many errors could fall through the cracks.
“Even within the coding, it’s not working well,” Smiley told The Register. “Code can look right and pass the unit tests and still be wrong.”
Smiley explained that the benchmarks needed to verify code simply haven’t caught up yet. As a result, companies leveraging AI may be flying by the seat of their pants, using AI to verify AI-generated code in a potentially dangerous feedback loop.
Smiley argued that the industry should instead come up with a new set of metrics to properly gauge how AI-generated code is affecting an organization’s software and performance.
He also said that many attempts to shoehorn AI into software development are resulting in major bloat and inefficient code.
“Coding works if you measure lines of code and pull requests,” he told The Register, referring to proposed code changes submitted for review and merging into a project. “Coding does not work if you measure quality and team performance. There’s no evidence to suggest that that’s moving in a positive direction.”
Smiley pointed out that AI lacks “inductive reasoning capabilities,” ways to “reliably retrieve facts,” and the ability to “engage an internal monologue,” which is why the same prompt can produce different answers.
“It doesn’t know if the answer it gave you is right,” he told the publication. “Those are foundational problems no one has solved in LLM technology. And you want to tell me that’s not going to manifest in code quality problems? Of course it’s going to manifest.”
The cracks are starting to show. Earlier this month, as the Financial Times reported, Amazon leaders summoned a large group of engineers following major outages at the company’s online retail business, noting that “gen-AI assisted changes” may have been a “contributing factor” to the outages.
“Folks, as you likely know, the availability of the site and related infrastructure has not been good recently,” Amazon’s eCommerce Services senior VP Dave Treadwell told the assembled crowd.
In response, junior and mid-level engineers must now report any AI-assisted code changes and get them signed off by senior engineers, a move that seemingly undercuts the premise that AI simplifies workflows and cuts costs.
Major problems arising from hallucinating AI coding software could snowball into catastrophe at many other firms as well.
It’s a ticking time bomb even insurers aren’t willing to touch anymore, as Deeks noted.
“People are going to continue to start to feel the pressure of ‘I have to adopt this stuff, I have to make AI decisions,'” he told The Register. “They’re going to put this stuff into production, whether it’s in a business workflow or in an engineering group. And that accelerated collapse is then going to cost a lot of people their jobs.”