What role should text-generating large language models (LLMs) have in the scientific research process? According to a team of Oxford scientists, the answer — at least for now — is: pretty much none.

In a new essay, researchers from the Oxford Internet Institute argue that scientists should abstain from using LLM-powered tools like chatbots to assist in scientific research on the grounds that AI's penchant for hallucinating and fabricating facts, combined with the human tendency to anthropomorphize the human-mimicking word engines, could lead to larger information breakdowns — a fate that could ultimately threaten the fabric of science itself.

"Our tendency to anthropomorphize machines and trust models as human-like truth-tellers, consuming and spreading the bad information that they produce in the process," the researchers write in the essay, which was published this week in the journal Nature Human Behaviour, "is uniquely worrying for the future of science."

The scientists' argument hinges on the reality that LLMs and the many bots that the technology powers aren't primarily designed to be truthful. As they write in the essay, sounding truthful is but "one element by which the usefulness of these systems is measured." Characteristics including "helpfulness, harmlessness, technical efficiency, profitability, [and] customer adoption" matter, too.

"LLMs are designed to produce helpful and convincing responses," they continue, "without any overriding guarantees regarding their accuracy or alignment with fact."

Put simply: a large language model is trained, above all else, to be convincing. If it produces an answer that's persuasive but not necessarily factual, the persuasiveness wins out over the inaccuracy. In an AI's proverbial brain, simply saying "I don't know" is less helpful than providing an incorrect response.

But as the Oxford researchers lay out, hallucination is only half the problem. The ELIZA effect, or the human tendency to read far too much into human-sounding AI outputs thanks to our deep-seated proclivity to anthropomorphize everything around us, is a well-documented phenomenon. Because of this effect, we're already primed to put a little too much trust in AI; couple that with the confident tone these chatbots so often take, and you have a perfect recipe for misinformation. After all, when a human-sounding chatbot hands us a neatly packaged, expert-sounding answer to a query, we're probably less inclined to apply the same critical thinking to fact-checking it as we might when doing our own research.

Importantly, the scientists do note "zero-shot translation" as a scenario in which AI outputs might be a bit more reliable. This, as Oxford professor and AI ethicist Brent Mittelstadt told EuroNews, refers to when a model is given "a set of inputs that contain some reliable information or data, plus some request to do something with that data."

"It's called zero-shot translation because the model has not been trained specifically to deal with that type of prompt," Mittelstadt added. In other words, the model is more or less rearranging and parsing a limited, trustworthy dataset rather than being used as a vast, internet-like knowledge center. But that approach would certainly limit its use cases, and it would demand a more specialized understanding of AI tech than just loading up ChatGPT and firing off some research questions.

And elsewhere, the researchers argue, there's an ideological battle at the core of this automation debate. After all, science is a deeply human pursuit. Outsourcing too much of the scientific process to automated AI labor, the Oxford researchers say, could undermine that deep-rooted humanity. And is that something we can really afford to lose?

"Do we actually want to reduce opportunities for writing, thinking critically, creating new ideas and hypotheses, grappling with the intricacies of theory and combining knowledge in creative and unprecedented ways?" the researchers write. "These are the inherently valuable hallmarks of curiosity-driven science."

"They are not something that should be cheaply delegated to incredibly impressive machines," they continue, "that remain incapable of distinguishing fact from fiction."

More on people using AI tools where they definitely shouldn't: Lawyer Fired for Using ChatGPT Says He Will Keep Using AI Tools
