Top AI Researchers Concerned They’re Losing the Ability to Understand What They’ve Created

Researchers from OpenAI, Google DeepMind, and Meta have joined forces to warn about what they're building.

In a new position paper, 40 researchers spread across those four companies called for more investigation of AI powered by so-called "chains-of-thought" (CoT), the "thinking out loud" process that advanced "reasoning" models — the current vanguard of consumer-facing AI — use when they're working through a query.

As those researchers acknowledge, CoTs add a certain transparency into the inner workings of AI, allowing users to see "intent to misbehave" or get stuff wrong as it happens. Still, there is "no guarantee that the current degree of visibility will persist," especially as models continue to advance.

Depending on how they're trained, advanced models may no longer, the paper suggests, "need to verbalize any of their thoughts, and would thus lose the safety advantages." There's also the non-zero chance that models could intentionally "obfuscate" their CoTs after realizing that they're being watched, the researchers noted — and as we've already seen, AI has indeed rapidly become very good at lying and deception.

To make sure this valuable visibility continues, the cross-company consortium is calling on developers to start figuring out what makes CoTs "monitorable," or what makes the models think out loud the way they do. In this request, those same researchers seem to be admitting something stark: that nobody is entirely sure why the models are "thinking" this way, or how long they will continue to do so.

Zooming out from the technical details, it's worth taking a moment to consider how strange this situation is. Top researchers in an emerging field are warning that they don't quite understand how their creation works, and lack confidence in their ability to control it going forward, even as they forge ahead making it stronger; there's no clear precedent in the history of innovation, even looking back to civilization-shifting inventions like atomic energy and the combustion engine.

In an interview with TechCrunch about the paper, OpenAI research scientist and paper coauthor Bowen Baker explained how he sees the situation.

"We're at this critical time where we have this new chain-of-thought thing," Baker told the website. "It seems pretty useful, but it could go away in a few years if people don’t really concentrate on it."

"Publishing a position paper like this, to me, is a mechanism to get more research and attention on this topic," he continued, "before that happens."

Once again, there appears to be tacit acknowledgement of AI's "black box" nature — and to be fair, even CEOs like OpenAI's Sam Altman and Anthropic's Dario Amodei have admitted that at a deep level, they don't really understand how the technology they're building works.

Along with its 40-researcher author list that includes DeepMind cofounder Shane Legg and xAI safety advisor Dan Hendrycks, the paper has also drawn endorsements from industry luminaries including former OpenAI chief scientist Ilya Sutskever and AI godfather and Nobel laureate Geoffrey Hinton.

Though Musk's name doesn't appear on the paper, with Hendrycks on board, all of the "Big Five" firms — OpenAI, Google, Anthropic, Meta, and xAI — have been brought together to warn about what might happen when and if AI stops showing its work.

In doing so, that powerful cabal has said the quiet part out loud: that they don't feel entirely in control of AI's future. For companies with untold billions between them, that's a pretty strange message to market — which makes the paper all the more remarkable.

More on AI warnings: Bernie Sanders Issues Warning About How AI Is Really Being Used

Share This Article