Hot on the heels of GPT-4's public release, a team of Microsoft AI scientists published a research paper claiming that the OpenAI language model, which powers Microsoft's now somewhat lobotomized Bing AI, shows "sparks" of human-level intelligence, or artificial general intelligence (AGI).

Emphasis on the "sparks." The researchers are careful in the paper to characterize GPT-4's prowess as "only a first step towards a series of increasingly generally intelligent systems" rather than fully hatched, human-level AI. They also repeatedly highlight the fact that the paper is based on an "early version" of GPT-4, which they studied while it was "still in active development by OpenAI," and not necessarily the version that's been wrangled into product-ready form.

Disclaimers aside, though, these are some serious claims to make. A lot of folks out there, even some within the AI industry, think of AGI as a pipe dream, while others think that developing AGI will usher in humanity's next era. GPT-4 is the most powerful iteration of the OpenAI-built large language model (LLM) to date, and on the theoretical list of potential AGI contenders, it sits somewhere near the top, if not at number one.

"We contend," the researchers write in the paper, published yesterday, "that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models."

As far as the researchers' reasoning goes, they basically argue that GPT-4 is stronger than the OpenAI models that came before it in new and generalized ways. It's one thing to design a model to do well on a specific exam or task; it's another to build a system that can do a lot of tasks, and do them really well, without any specific training. And the latter, they say, is where GPT-4 really shines.

"We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting," reads the paper. "Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT."

"Given the breadth and depth of GPT-4's capabilities," they continue, "we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system."

On that front, the researchers do have a point. GPT-4 certainly still has its flaws; like other LLMs, it still has problems with hallucinations and can struggle with math. But regardless of its missteps, the model does have some stand-out skills, many of them vastly improved over the last model. For instance, GPT-4 is a particularly excellent test-taker, acing notoriously difficult exams like the Uniform Bar Exam and the LSAT, where it scored in roughly the 90th and 88th percentiles, respectively, and posting an 86 percent score on the Certified Sommelier theory test, all without any specific training on those exams.

For contrast's sake: GPT-3.5, which was released late last year, scored in the bottom 10 percent of all Bar exam takers. That's a huge stride for a next-gen model to make when its predecessor was released just a few months ago.

Elsewhere, the researchers claim that the bot managed to "overcome some fundamental obstacles such as acquiring many non-linguistic capabilities," while also making "great progress on common-sense," the latter being one of the OG ChatGPT's biggest weaknesses.

Still, there are a few more caveats to the AGI argument, with the researchers admitting in the paper that while GPT-4 is "at or beyond human-level for many tasks," its overall "patterns of intelligence are decidedly not human-like." So, basically, even when it does excel, it still doesn't think exactly like a human does. (It could also be argued that test-taking in general is way more robotic than it is human, but we digress.)

It's also worth noting that Microsoft researchers may have a vested interest in hyping up OpenAI's work, unconsciously or otherwise, since Microsoft entered into a multibillion-dollar partnership with OpenAI earlier this year.

And as the scientists also address, AGI still doesn't have a firm, agreed-upon definition — and neither, for that matter, does the more general concept of "intelligence."

"Our claim that GPT-4 represents progress towards AGI does not mean that it is perfect at what it does, or that it comes close to being able to do anything that a human can do (which is one of the usual definitions of AGI), or that it has inner motivation and goals (another key aspect in some definitions of AGI)," reads the paper.

But a step, they say, is a step.

"We believe that GPT-4's intelligence," the researchers write, "signals a true paradigm shift in the field of computer science and beyond."

READ MORE: Sparks of Artificial General Intelligence: Early experiments with GPT-4

More on GPT-4's Elle Woods journey: OpenAI's GPT-4 Just Smoked Basically Every Test and Exam Anyone's Ever Taken

