There's a Problem With That App That Detects GPT-Written Text: It's Not Very Accurate

Princeton University computer science student Edward Tian has earned a storm of media attention from CBS, NPR, NBC and many other outlets for an app he built that attempts to detect whether a given text was produced by OpenAI's ChatGPT text generator.

Tian says his app, GPTZero, is meant to "quickly and efficiently detect whether an essay is ChatGPT or human written," in response to a rise in AI plagiarism.

"Think are high school teachers going to want students using ChatGPT to write their history essays?" the 22-year-old student tweeted earlier this month. "Likely not."

Tian is right that tools like ChatGPT pose a profound challenge for educators, who fear that students will soon start — or are already — using the app to generate essays for class. The media was quick to bite on that narrative.

"Teachers worried about students turning in essays written by a popular artificial intelligence chatbot now have a new tool of their own," NPR gushed.

In spite of the storm of breathless coverage, though, our testing found that while GPTZero does identify ChatGPT-generated text more accurately than random guessing would, it's also often wrong. And when you're talking about allegations of educational misconduct (plagiarism is grounds for a failing grade or even expulsion at many academic institutions), that's not good enough.

We fed it a total of sixteen pieces of text, each at least 300 words in length, eight pulled from our own archives and eight generated by ChatGPT.

The numbers speak for themselves. GPTZero correctly identified the ChatGPT text in seven out of eight attempts and the human writing six out of eight times.

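Don't get us wrong: those results are impressive. But they also mean the tool got three of sixteen samples wrong, or nearly 20 percent. And because it failed to recognize two of the eight human-written pieces as human, a teacher or professor who tried using the tool to bust students doing coursework with ChatGPT would, at that rate, risk falsely accusing as many as a quarter of the students who did their own work.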
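For readers who want to check the arithmetic, here is a minimal sketch that recomputes the rates from the counts reported above. The function name `error_rates` is ours, purely for illustration, and with only eight samples per category the percentages are rough indications rather than reliable estimates.

```python
def error_rates(ai_total, ai_correct, human_total, human_correct):
    """Return (overall error rate, missed-AI rate, misjudged-human rate)."""
    ai_wrong = ai_total - ai_correct
    human_wrong = human_total - human_correct
    overall = (ai_wrong + human_wrong) / (ai_total + human_total)
    return overall, ai_wrong / ai_total, human_wrong / human_total


# Counts reported in the test above: 7 of 8 ChatGPT texts and
# 6 of 8 human texts were identified correctly.
overall, missed_ai, misjudged_human = error_rates(8, 7, 8, 6)

print(f"overall error rate:    {overall:.2%}")          # 18.75%
print(f"ChatGPT texts missed:  {missed_ai:.2%}")        # 12.50%
print(f"human texts misjudged: {misjudged_human:.2%}")  # 25.00%
```

The overall error rate and the share of human writers misjudged are different quantities. It is the second one that matters when the question is how many honest students could be wrongly flagged.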

Tian himself — who didn't respond to questions for this story — seems more aware of the shortcomings of his app than the media covering it, and says he's actively working on improving his app's accuracy.

"We're still studying implicit bias in [language model] generated text right now," Tian tweeted, "so hopefully will be adding a few more tests and factors to improve the model."

Tian's app gauges a given text's "perplexity," which he defines as the "randomness of a text to a model, or how well a language model likes a text," as well as its "burstiness," or how a text's perplexity changes over time, to reach its conclusion.
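To make those two terms concrete, here is a toy sketch in Python. It is not GPTZero's code, and Tian has not published the details this example would need, so everything in it is an assumption: a simple word-frequency (unigram) model with add-one smoothing stands in for a real language model, burstiness is taken to be the standard deviation of per-sentence perplexity, and the function names are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import pstdev


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_unigram(corpus):
    """Count word frequencies; this stands in for a real language model."""
    counts = Counter(tokenize(corpus))
    return counts, sum(counts.values())


def perplexity(sentence, counts, total):
    """Perplexity of one sentence under the toy model (lower = more expected)."""
    tokens = tokenize(sentence)
    if not tokens:
        raise ValueError("sentence has no word tokens")
    vocab = len(counts) + 1  # one extra slot shared by all unseen words
    log_prob = 0.0
    for tok in tokens:
        # Add-one smoothing so unseen words get a small nonzero probability.
        log_prob += math.log((counts[tok] + 1) / (total + vocab))
    return math.exp(-log_prob / len(tokens))


def burstiness(text, counts, total):
    """Assumed definition: spread of perplexity across the text's sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if tokenize(s)]
    if not sentences:
        raise ValueError("text has no sentences")
    return pstdev(perplexity(s, counts, total) for s in sentences)


counts, total = train_unigram("the cat sat on the mat")
print(perplexity("the cat", counts, total))       # familiar words, lower score
print(perplexity("zebra", counts, total))         # unseen word, higher score
print(burstiness("the. zebra.", counts, total))   # sentences differ, so > 0
```

In this sketch, a text whose sentences all look equally predictable to the model gets a burstiness near zero, while a text that mixes predictable and surprising sentences scores higher. A real detector would use a far more capable language model and would need a threshold calibrated on labeled examples.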

"Machine written text exhibits more uniform and constant perplexity over time, while human written text varies," he said.

Results notwithstanding, fears of ChatGPT's effects on the education ecosystem aren't unwarranted.

"I would have given this a good grade," Dan Gillmor, a journalism professor at Arizona State University, who asked ChatGPT to complete a common assignment he gives his students, told The Guardian last month. "Academia has some very serious issues to confront."

In the face of those fears of the rapidly growing powers of AI, it's tempting to seize on the narrative that some brilliant coder has discovered an easy hack to sort out AI-generated text from that written by a human.

And while that might eventually happen, it's probably more likely that we'll see a game of cat and mouse, with tools like Tian's analyzing outputs and estimating the probability that a given output was created by an AI. A perfect AI-catching solution that works 100 percent of the time could prove incredibly difficult to build, especially as the tech continues to mature.

Where that leaves the future of the technology, particularly when it comes to students using language models like ChatGPT to generate essays, remains to be seen.

Nonetheless, educators are watching warily as AI tools start to creep into classrooms, and they're trying to get ahead of the problem. OpenAI's tool was recently banned from all schools in New York City, a policy change that could have knock-on effects in other parts of the country.

At the same time, not everybody is convinced that ChatGPT will spell the end of the college essay, especially for assignments that require heavy analysis or detailed research.

"Every year or two, there's something that's ostensibly going to take down higher education as we know it," Pennsylvania State English professor Stuart Selber told Insider. "So far, that hasn't happened."
