AI has taken the programming world by storm, with a flurry of speculation about the tech replacing human coders, and Google CEO Sundar Pichai recently claiming that more than a quarter of the company's new code is now AI-generated.
But in practice, AI may actually be hindering efficient software development. As flagged by Ars Technica, a new study from the nonprofit Model Evaluation and Threat Research (METR) found that experienced programmers were slower when using AI assistance tools than when making do without them.
In the study, 16 experienced programmers were given roughly 250 coding tasks and asked to either use no AI assistance or employ what METR characterized as "early-2025 AI tools" like Anthropic's Claude and Cursor Pro. The results were surprising, and perhaps profound: the programmers took 19 percent longer when using AI than when forgoing it.
By analyzing the programmers' screen recordings, the METR team found that subjects using AI tools did indeed spend less time actively coding, debugging, researching, and testing. But that was only because they instead spent their time "reviewing AI outputs, prompting AI systems, and waiting for AI generations."
Ultimately, the AI-assisted cohort accepted fewer than 44 percent of the tools' suggestions without any modification, and nine percent of their total task time was eaten up by fixing the AI's outputs. (That's not entirely surprising; companies that laid off workers to replace them with AI are now having to hire new contractors to fix the technology's mistakes.)
Despite the results, however, the programmers in the study initially believed that AI would cut the time they spent on tasks by nearly a quarter, and even afterward they still thought the tools had sped them up by 20 percent.
Perhaps contributing to the disconnect between expectation and reality are the many benchmarks claiming that these tools, and others like OpenAI's o3 reasoning model and Google's Gemini, spit out immaculate code at record speeds. As Ars notes, however, those benchmarks rely on "synthetic, algorithmically scorable tasks created specifically" for such tests, which may be a poor reflection of the messy world of actual coding.
This isn't the first time the narrative of AI's dominance in coding has been rattled by research findings. Earlier this year, for instance, OpenAI researchers released a paper declaring, based on a benchmark built from real-world coding tasks, that even the most advanced large language models "are still unable to solve the majority" of problems.
AI is also producing other unintended consequences in the world of software development. Untrained programmers who engage in so-called "vibe coding," or writing and fixing code by describing what they want to an AI, are not only botching the work itself but also sabotaging themselves by introducing severe cybersecurity risks into the finished product.
With so many tech workers being laid off in favor of automation, it stands to reason that the code produced after those firings will be less accurate and secure than it was when humans were writing it. So far, though, that hasn't seemed to matter much to the people making the job cuts.
More on AI coding: Researchers Trained an AI on Flawed Code and It Became a Psychopath