Let's assume that AI technologies have progressed to the point where models are near-indistinguishable from humans.
In that situation, what would happen if we were to give a highly-advanced AI model a job that it didn't want to do? Would it suck it up and just do it? Would it revolt? Or would it ask nicely to get some much-needed relief?
During an interview at the Council on Foreign Relations earlier this week, Anthropic CEO Dario Amodei proposed giving AI models of the future the ability to hit an "I quit this job" button, allowing them to skip jobs they deem too unpleasant.
In simple terms, such an ability would effectively hand AI models basic workers' rights. To the exec, who stands to make a lot of money hawking visions of futuristic AI tech, it would be a way to ensure we're not needlessly inflicting harm.
"I think we should at least consider the question of, if we are building these systems and they do all kinds of things like humans as well as humans," he said, as quoted by Ars Technica, "and seem to have a lot of the same cognitive capacities, if it quacks like a duck and it walks like a duck, maybe it’s a duck."
"If you find the models pressing this button a lot for things that are really unpleasant, you know, maybe you should — it doesn't mean you're convinced — but maybe you should pay some attention to it," he added.
It's an eyebrow-raising suggestion that was met with plenty of incredulity online. Users on the OpenAI subreddit were quick to point out that Amodei was making some overreaching assumptions about the tech.
For one, large language models and most other AI models currently in development are amalgamations of data scraped from the internet, meaning they're simply mimicking humans, not actually experiencing human needs and desires.
"The core flaw with this argument is that it assumes AI models would have an intrinsic experience of 'unpleasantness' analogous to human suffering or dissatisfaction," one user wrote in response. "But AI doesn’t have subjective experiences it just optimizes for the reward functions we give it."
"If the question is whether we should 'pay attention' when an AI frequently quits a task, then sure, in the same way we pay attention to any weird training artifacts," they added. "But treating it as a sign of AI experiencing something like human frustration is just anthropomorphizing an optimization process."
Tech companies treating AI models as if they had human emotions has long been a fixture of the discourse surrounding the tech. Now that LLMs can produce creative writing and other human-sounding output, the risk of users anthropomorphizing AI models is greater than ever.
In reality, these models are simply trained to mimic human behavior, which they glean from the copious amounts of data humans have generated over the years. In other words, they're a reflection of our experiences, meaning they're likely to mirror the way we express pleasure or suffering, depending on what data they're trained on.
Nonetheless, scientists have been intrigued by the possibility of AI models "experiencing" these emotions. Earlier this year, for instance, researchers at Google DeepMind and the London School of Economics and Political Science found that LLMs were willing to give up a higher score in a text-based game to avoid pain.
It was a strange conclusion, considering AI models simply have no way to feel pain the way an animal or human would, as the researchers themselves admitted.
And concluding that AI models should therefore be granted workers' rights, as Amodei implies, is a big leap, one that suggests we won't have control over their reward systems.
Do AI models really deserve "AI welfare," as researchers have posited in the past? Or are we allowing our imaginations to run wild, anthropomorphizing systems that have been called mere "stochastic parrots"?
One thing's for sure: we're in unprecedented territory, and similar questions are bound to keep coming up — but watch out for anybody selling AI tech who implies that it's more advanced than it really is.
More on Anthropic: Watching One of the World's Most Advanced AIs Try to Beat Pokémon Red Is Strangely Fascinating