If you believe we're about to reach a point where AI chatbots are just as capable of learning how to complete intellectual tasks as humans, you might want to think again.
In a new, yet-to-be-peer-reviewed paper, a team of Stanford scientists argues that the glimmers of artificial general intelligence (AGI) we're seeing are just an illusion.
Across the board, AI companies have been making big claims about the "emergent" behavior of their respective large language model-powered AIs, framing it as an early sign of AGI.
Earlier this year, a team of Microsoft researchers claimed that an early version of GPT-4 showed "sparks" of AGI. Then, a Google exec claimed that the company's Bard chatbot had magically learned to translate Bengali without receiving the necessary training.
But are we really approaching a point where machines are able to compete with us on an intellectual level? In their new paper, the Stanford researchers argue that any seemingly emergent abilities of LLMs may just be "mirages" borne out of inherently flawed metrics.
As they posit, the folks claiming to be seeing emergent behaviors are consistently comparing large models, which generally have more capabilities simply due to their sheer size, against smaller, inherently less capable models.
They're also using wildly specific metrics to measure emergence, the researchers argue.
But when more data and less specific metrics are brought into the picture, these seemingly unpredictable properties become quite predictable, effectively negating those outlandish claims.
The researchers argue that "existing claims of emergent abilities are creations of the researcher's analyses, not fundamental changes in model behavior on specific tasks with scale."
In other words, when you measure with sharp, all-or-nothing metrics, you get sharp, seemingly unpredictable results.
To illustrate how questionable some of the metrics that have been used to declare the emergence of AGI are, the Stanford researchers used a helpful baseball analogy.
"Imagine evaluating baseball players based on their ability to hit a baseball a certain distance," the paper reads. "If we use a metric like 'average distance' for each player, the distribution of players' scores will likely appear smooth and continuous. However, if we opt for a discontinuous metric like 'whether a player's average distance exceeds 325 feet,' then many players will score 0, while only the best players will score one."
"Both metrics are valid," they add, "but it's important not to be surprised when the latter metric yields a discontinuous outcome."
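The paper's core argument, that an all-or-nothing metric can make smooth improvement look like a sudden leap, can be sketched in a few lines. The numbers below are illustrative assumptions, not data from the paper: per-step accuracy improves gradually with model scale, but an exact-match metric that requires all five steps of a task to be correct stays near zero for small models and then appears to "emerge."

```python
# Illustrative sketch (hypothetical numbers): a model's per-step accuracy
# improves smoothly as it scales up.
scales = [1, 2, 4, 8, 16, 32]                       # relative model sizes
per_step_acc = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]      # smooth, gradual gains

TASK_LENGTH = 5  # a task counts as solved only if all 5 steps are right

# Continuous metric: average per-step accuracy (smooth curve).
continuous_scores = per_step_acc

# Discontinuous metric: exact match on the whole task (all-or-nothing).
# Small models score near zero; ability then seems to appear abruptly.
discontinuous_scores = [p ** TASK_LENGTH for p in per_step_acc]

for s, c, d in zip(scales, continuous_scores, discontinuous_scores):
    print(f"scale {s:>2}: per-step={c:.2f}  exact-match={d:.3f}")
```

Under the continuous metric the gains look incremental, while the exact-match column climbs from roughly 0.03 to 0.77, which is the kind of jump that gets labeled "emergent" even though nothing discontinuous happened underneath.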
It's probably fair to say that this phenomenon can be attributed, at least in part, to some AI scientists seeing what they want to see in their machines. After all, the idea that your tech might have developed an emergent property is an alluring one.
These emergent properties, whether they exist or not, are more than just exciting; they also come with some worrying ramifications. Once a machine shows even one emergent property, does that mean we've officially lost control?
And all of this, the excitement and the fear alike, undoubtedly feeds the money-driven hype around the technology, meaning that claiming sparks of AGI by way of claiming emergence isn't exactly bad for marketing purposes.
The term AGI has been thrown around a lot in recent months, including by those who have a lot to gain financially as a result of doing just that.
In short, claiming we're seeing "glimmers of AGI" is potentially playing into OpenAI's AGI narratives, bolstering their efforts to maximize profits.
If there's one thing to take away, it's that we should take claims of AGI with a big grain of salt. Check for any discontinuous metrics if you feel so inclined, and while you're at it, you might want to take a quick peek at who, exactly, the people making these claims work for.
It certainly wouldn't hurt.