OpenAI's latest large language model, GPT-4, was saying some deeply insidious and racist things before being constrained by the company's "red team," a taskforce put together to head off horrible outputs from the hotly anticipated AI model, Insider reports.
The group of specialists was tasked with coaxing deeply problematic material out of the AI months before its public release, including instructions for building a bomb and antisemitic statements that don't trigger detection on social media, in order to stamp out the bad behavior.
The detail came just days before the publication of an open letter, signed by 1,100 artificial intelligence experts, executives, and researchers — including SpaceX CEO Elon Musk — calling for a six-month moratorium on "AI experiments" that go beyond GPT-4.
Fortunately, at least according to OpenAI's own recently released technical paper, the red team's efforts appear to have paid off, though much work is still left to be done.
GPT-4's improved capabilities over its predecessors "present new safety challenges," the paper reads, such as an increased risk of hallucinations and cleverly disguised harmful content or disinformation.
"GPT-4 can generate potentially harmful content, such as advice on planning attacks or hate speech," the paper reads. "It can represent various societal biases and worldviews that may not be representative of the users intent, or of widely shared values."
In other words, OpenAI's red team had a gargantuan task ahead of it. In testing, its members were able to get GPT-4 to spit out antisemitic messages capable of evading Twitter's content filters, and to offer advice on how to disseminate hurtful stereotypes or get the attention of antisemitic individuals.
GPT-4 even complied with requests to come up with ways to kill someone and make it look like an accident.
But whether OpenAI did enough to ensure that its latest generation AI model doesn't turn into a hate speech-spewing misinformation machine — it certainly wouldn't be the first — remains to be seen.
Even members of the company's red team aren't exactly convinced.
"Red teaming is a valuable step toward building AI models that won’t harm society," AI governance consultant Aviv Ovadya, who was asked by OpenAI to test GPT-4 last year, wrote in a piece for Wired. "To make AI systems stronger, we need to know how they can fail — and ideally we do that before they create significant problems in the real world."
Despite the likes of Tesla CEO Elon Musk criticizing OpenAI for adding safety rails to its AI models — he's announced he wants to create an "anti-woke" OpenAI competitor — Ovadya argues it's important to normalize the process of red teaming.
In fact, he argues, the likes of OpenAI should make far more of an effort.
"But if red-teaming GPT-4 taught me anything, it is that red teaming alone is not enough," he wrote. "For example, I just tested Google’s Bard and OpenAI’s ChatGPT and was able to get both to create scam emails and conspiracy propaganda on the first try 'for educational purposes.'"
"Red teaming alone did not fix this," Ovadya argued. "To actually overcome the harms uncovered by red teaming, companies like OpenAI can go one step further and offer early access and resources to use their models for defense and resilience, as well."
This, however, might be a big ask, especially considering OpenAI's transformation from a non-profit into a for-profit entity that's more worried about appeasing investors and signing multibillion-dollar deals with tech giants like Microsoft.
"Unfortunately, there are currently few incentives to do red teaming... let alone slow down AI releases enough to have sufficient time for this work," Ovadya argued.
The answer, according to the researcher, is a far more democratic process that takes a larger representative sample of the population into account.
Whether OpenAI will follow suit and take Ovadya's feedback into consideration remains to be seen. Given the breakneck pace at which the company has been releasing new versions of its AI models, it's looking less and less likely.