In a new paper, a trio of Google DeepMind researchers discovered something about AI models that may hamstring their employer's plans for more advanced AIs.
Written by DeepMind researchers Steve Yadlowsky, Lyric Doshi and Nilesh Tripuraneni, the not-yet-peer-reviewed paper breaks down what a lot of people have observed in recent months: that today's AI models are not very good at coming up with outputs outside of their training data.
The paper, whose experiments center on a model built on OpenAI's GPT-2 architecture — which, yes, is two generations behind the current one — focuses on what are known as transformer models, which, as their name suggests, are AI models that transform one kind of input sequence into another.
The "T" in OpenAI's GPT architecture stands for "transformer." This type of model, first proposed by a group of Google researchers in the 2017 paper "Attention Is All You Need," is often considered a possible path to artificial general intelligence (AGI), or human-level AI, because, as the reasoning goes, it's the kind of system that could allow machines to undergo intuitive "thinking" like our own.
While the promise of transformers is substantial — an AI model that can make leaps beyond its training data would, in fact, be amazing — when it comes to GPT-2, at least, there's still much to be desired.
"When presented with tasks or functions which are out-of-domain of their pre-training data, we demonstrate various failure modes of transformers and degradation of their generalization for even simple extrapolation tasks," Yadlowsky, Doshi and Tripuraneni explain.
Translation: if a transformer model isn't trained on data related to what you're asking it to do, even if the task at hand is simple, it's probably not going to be able to do it.
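The general idea can be seen in a toy sketch, which is not the paper's actual setup (the researchers studied in-context learning in transformers, not curve fitting): a simple model fit only on inputs from one range can match the data closely there, yet fail badly when asked to extrapolate outside it.

```python
# Toy illustration of out-of-domain failure, NOT the paper's experiments:
# fit a cubic polynomial to sin(x) on [0, pi], then query it outside
# that "training" range and watch the prediction fall apart.
import numpy as np

x_train = np.linspace(0, np.pi, 200)  # in-domain inputs
model = np.polynomial.Polynomial.fit(x_train, np.sin(x_train), deg=3)

# In-domain: the fit tracks sin(x) closely.
in_domain_err = np.max(np.abs(model(x_train) - np.sin(x_train)))

# Out-of-domain: one query point outside the training range.
x_ood = 1.5 * np.pi
ood_err = abs(model(x_ood) - np.sin(x_ood))

print(f"max in-domain error:  {in_domain_err:.3f}")  # small
print(f"out-of-domain error:  {ood_err:.3f}")        # much larger
```

The polynomial has no notion of sine's periodicity, so beyond the data it has seen, it simply continues its cubic shape; the analogy to a transformer that has never seen a task's domain is loose, but the failure pattern is the same in spirit.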
You would be forgiven, however, for thinking otherwise given the seemingly ginormous training datasets used to build out OpenAI's GPT large language models (LLMs), which indeed are very impressive. Like a child sent to the most expensive and highest-rated preschools, those models have had so much knowledge crammed into them that there isn't a whole lot they haven't been trained on.
Of course, there are caveats. GPT-2 is ancient history at this point, and maybe there's some sort of emergent property in which, given enough training data, a model starts to make connections beyond that information. Or maybe clever researchers will come up with a new approach that transcends the limitations of the current paradigm.
Still, the bones of the finding are sobering for the most sizzling AI hype. At its core, the paper seems to be arguing, today's best approach is still only nimble on topics that it's been thoroughly trained on — meaning that, for now at least, AI is only impressive when it's leaning on the expertise of the humans whose work was used to train it.
Since the release of ChatGPT last year, which was built on the GPT framework, pragmatists have urged people to temper their AI expectations and pause their AGI presumptions — but caution is way less sexy than CEOs seeing dollar signs and soothsayers claiming AI sentience. Along the way, even the most erudite researchers seem to have developed differing ideas about how smart the best current LLMs really are, with some buying into the belief that AI is becoming capable of the kinds of leaps in thought that, for now, separate humans from machines.
Those warnings, which are now backed by research, appear not to have quite reached the ears of OpenAI CEO Sam Altman and Microsoft CEO Satya Nadella, who told investors this week that they plan to "build AGI together."
Google DeepMind certainly isn't exempt from this kind of prophesying, either.
In a podcast interview last month, DeepMind cofounder Shane Legg said he thinks there's a 50 percent chance AGI will be achieved by the year 2028 — a belief he's held for more than a decade now.
"There is no one thing that would do it, because I think that's the nature of it," Legg told tech podcaster Dwarkesh Patel. "It's about general intelligence. So I'd have to make sure [an AI system] could do lots and lots of different things and it didn't have a gap."
But considering that three DeepMind employees have now found that transformer models don't appear to be able to do much of anything that they're not trained to know about, it seems like that coinflip just might not fall in favor of their boss.