Study Finds That 52 Percent of ChatGPT Answers to Programming Questions Are Wrong

Ah yes. And yet...

Not So Smart

In recent years, computer programmers have flocked to chatbots like OpenAI's ChatGPT to help them code, dealing a blow to places like Stack Overflow, which had to lay off nearly 30 percent of its staff last year.

The only problem? A team of researchers from Purdue University presented research this month at the Computer-Human Interaction conference that shows that 52 percent of programming answers generated by ChatGPT are incorrect.

That's a staggeringly large proportion for a program that people are relying on to be accurate and precise, underlining what other end users like writers and teachers are experiencing: AI platforms like ChatGPT often hallucinate totally incorrectly answers out of thin air.

For the study, the researchers looked over 517 questions in Stack Overflow and analyzed ChatGPT's attempt to answer them.

"We found that 52 percent of ChatGPT answers contain misinformation, 77 percent of the answers are more verbose than human answers, and 78 percent of the answers suffer from different degrees of inconsistency to human answers," they wrote.

Robot vs Human

The team also performed a linguistic analysis of 2,000 randomly selected ChatGPT answers and found they were "more formal and analytical" while portraying "less negative sentiment" — the sort of bland and cheery tone AI tends to produce.

What's especially troubling is that many human programmers seem to prefer the ChatGPT answers. The Purdue researchers polled 12 programmers — admittedly a small sample size — and found they preferred ChatGPT at a rate of 35 percent and didn't catch AI-generated mistakes at 39 percent.

Why is this happening? It might just be that ChatGPT is more polite than people online.

"The follow-up semi-structured interviews revealed that the polite language, articulated and text-book style answers, and comprehensiveness are some of the main reasons that made ChatGPT answers look more convincing, so the participants lowered their guard and overlooked some misinformation in ChatGPT answers," the researchers wrote.

The study demonstrates that ChatGPT still has major flaws — but that's cold comfort to people laid off from Stack Overflow or programmers who have to fix AI-generated mistakes in code.

More on OpenAI: Machine Learning Researcher Links OpenAI to Drug-Fueled Sex Parties

Share This Article