Software Schools Use to Detect Cheating Is Flagging Real Essays as AI Generated

A whopping 2.1 million teachers in the US are using Turnitin's new AI detection tool in an attempt to catch students using tools like ChatGPT to cheat, the Washington Post reports.

But as it turns out, the tool isn't very good at what it was designed to do, which is likely resulting in students being wrongfully accused of having used AI tools for essays and assignments — a worrying side effect of putting these tools in the hands of practically anybody with an internet connection.

The reality is that while AI chatbots have exploded onto the scene and are continuing to improve, tools that are capable of distinguishing between AI-generated and human-written text are woefully behind.

In other words, it's a huge problem with no solution in sight, and it's putting both teachers and students in a precarious position.

WaPo's Geoffrey Fowler gave Turnitin's AI detection tool a whirl by testing it with the help of five high school students. Out of 16 samples, which included human-written, AI-generated, and mixed-source essays, the tool was wrong more than half the time.

In short, the tool probably shouldn't be used by teachers to accuse their students of using AI, despite Turnitin claiming that its detector is 98 percent accurate.

In all fairness, the company does point out on its website that the results of its tool shouldn't be used to accuse students of cheating, but it seems unlikely that this will stop every teacher out there from doing so.

A quick look through the ChatGPT subreddit turns up countless posts from students who have been accused of using AI on their papers.

Turnitin and other companies working on AI detectors clearly have a challenge ahead when it comes to communicating what these half-baked tools are actually capable of — and how they should be used.

"Our job is to create directionally correct information for the teacher to prompt a conversation," Turnitin chief product officer Annie Chechitelli told WaPo. "I’m confident enough to put it out in the market, as long as we’re continuing to educate educators on how to use the data."

Turnitin's lackluster AI detector isn't the only tool out there that has failed to live up to the challenge. Even OpenAI's own AI detection tool fell far short of a perfect mark when we fed it a mix of AI- and human-generated text.

The tool correctly identified all ten of the samples written by a human, but flagged only four of the ten samples generated by ChatGPT. It even rated one ChatGPT sample as "very unlikely" to have been AI-generated.

Unreliable AI detection tools are set to become a huge problem for students.

"There is no way to prove that you didn’t cheat unless your teacher knows your writing style, or trusts you as a student," one of the students told WaPo.

Unfortunately, given recent progress with AI models, the problem is bound to get even stickier. With GPT-4 and emerging competitors such as Google Bard vying for the top AI chatbot spot, these models' ability to evade detection will more than likely improve over time.

That means we'll have to come up with an entirely different solution for teachers to regain the trust of their students as the lines between chatbots and humans continue to blur.

"I don’t think a detector is long-term reliable," Jim Fan, an AI scientist at Nvidia who used to work at OpenAI and Google, told Fowler. "The AI will get better, and will write in ways more and more like humans."
