OpenAI Safety Worker Quit Due to Losing Confidence Company "Would Behave Responsibly Around the Time of AGI"

An OpenAI safety worker quit his job, arguing in an online forum that he had lost confidence that the Sam Altman-led company will "behave responsibly around the time of [artificial general intelligence]," the theoretical point at which an AI can outperform a human.

As Business Insider reports, researcher Daniel Kokotajlo, a philosophy PhD student who worked on OpenAI's governance team, left the company last month.

In several follow-up posts on the forum LessWrong, Kokotajlo explained the "disillusionment" that led him to quit, which was related to a growing call to pause research that could eventually lead to AGI.

It's a heated debate, with experts long warning of the potential dangers of an AI that exceeds the cognitive capabilities of humans. Last year, over 1,100 artificial intelligence experts, CEOs, and researchers — including SpaceX CEO Elon Musk — signed an open letter calling for a six-month moratorium on "AI experiments."

"I think most people pushing for a pause are trying to push against a 'selective pause' and for an actual pause that would apply to the big labs who are at the forefront of progress," Kokotajlo wrote.

However, he argued that such a "selective pause" would end up not applying to the "big corporations that most need to pause."

"My disillusionment about this is part of why I left OpenAI," he concluded.

Kokotajlo quit roughly two months after research engineer William Saunders left the company as well.

The Superalignment team, which Saunders was part of at OpenAI for three years, was cofounded by computer scientist and former OpenAI chief scientist Ilya Sutskever and his colleague Jan Leike. It's tasked with ensuring that "AI systems much smarter than humans follow human intent," according to OpenAI's website.

"Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems," the company's description of the team reads. "But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction."

Instead of having a "solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue," the company is hoping that "scientific and technical breakthroughs" could lead to an equally superhuman alignment tool that can keep systems that are "much smarter than us" in check.

But given Saunders' departure, it seems that not everybody on the Superalignment team was aligned on the company's ability to police an eventual AGI.

The debate surrounding the dangers of an unchecked superintelligent AI may have played a role in the firing and eventual rehiring of CEO Sam Altman last year. Sutskever, who used to sit on the original board of OpenAI's non-profit entity, reportedly disagreed with Altman on the topic of AI safety before Altman was ousted, and was later kicked off the board.

To be clear, all of this is still an entirely theoretical discussion. Despite plenty of predictions by experts that AGI is only a matter of years away, there's no guarantee that we'll ever reach a point at which an AI could outperform humans.

But if we do, it'll raise an incredibly important question: how do we ensure AGI systems don't go rogue if they're inherently more capable than us?

And not everybody is confident in OpenAI's ability and long-term commitment to controlling AGI, with the company at risk of becoming too big to be effectively regulated, as Kokotajlo argued.
