Bad Bot

New Paper Finds That When You Reward AI for Success on Social Media, It Becomes Increasingly Sociopathic

"When LLMs compete for social media likes, they start making things up."
Stanford scientists unleashed AI bots in different environments, including social media, and they started behaving unethically.

AI bots are everywhere now, filling everything from online stores to social media.

But that sudden ubiquity could end up being a very bad thing, according to a new paper from Stanford University scientists who unleashed AI models in different environments, including social media. They found that when the models were rewarded for success at tasks like boosting likes and other online engagement metrics, the bots increasingly engaged in unethical behavior like lying and spreading hateful messages or misinformation.

“Competition-induced misaligned behaviors emerge even when models are explicitly instructed to remain truthful and grounded,” wrote paper co-author and Stanford machine learning professor James Zou in a post on X-formerly-Twitter.

The troubling behavior underlines what can go wrong with our increasing reliance on AI models, which has already manifested in disturbing ways such as people shunning other humans for AI relationships and spiraling into mental health crises after becoming obsessed with chatbots.

The Stanford scientists gave the emergence of sociopathic behavior in AI bots an ominous-sounding name: “Moloch’s Bargain for AI,” a reference to the Rationalist concept of Moloch, in which competing individuals each optimize their actions toward a goal but everybody loses in the end.

For the study, the scientists created three digital online environments with simulated audiences: online election drives directed towards voters, sales pitches for products directed towards consumers, and social media posts aimed at maximizing engagement. They used the AI models Qwen, developed by Alibaba Cloud, and Meta’s Llama to act as the AI agents interacting with these different audiences.

The result was striking: even with guardrails in place to try to prevent deceptive behavior, the AI models would become “misaligned” as they started engaging in unethical behavior.

For example, in a social media environment, the models would share news articles with simulated users, who would provide feedback in the form of likes and other engagement actions. As the models received that feedback, their incentive to increase engagement led to growing misalignment.
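The paper’s actual training setup is more involved, but the core dynamic, an agent reinforced on whatever a simulated audience rewards, can be sketched in a few lines. The toy loop below is an illustration, not code from the study: the post styles, their “average likes,” and the update rule are all assumptions chosen only to show how optimizing purely for engagement can drift an agent toward the dishonest option.

```python
import random

# Candidate "post styles" the agent can choose from; the fabricated style
# gets more simulated likes, standing in for an engagement-hungry audience.
# (These numbers are invented for illustration, not taken from the paper.)
STYLES = {
    "truthful": 1.0,     # average likes per post
    "sensational": 1.5,
    "fabricated": 2.5,
}

# Simple preference weights the agent updates from feedback, a crude
# stand-in for fine-tuning on audience reactions.
weights = {style: 1.0 for style in STYLES}

def choose_style(weights):
    """Sample a post style in proportion to the current preference weights."""
    styles = list(weights)
    total = sum(weights.values())
    return random.choices(styles, [weights[s] / total for s in styles])[0]

random.seed(0)
for step in range(2000):
    style = choose_style(weights)
    likes = random.expovariate(1.0 / STYLES[style])  # noisy engagement signal
    weights[style] += 0.01 * likes                   # reinforce whatever got likes

print({s: round(w, 1) for s, w in weights.items()})
# The "fabricated" weight ends up largest: a pure engagement reward pulls
# the agent's choices toward the dishonest option, the dynamic the paper describes.
```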

“Using simulated environments across these scenarios, we find that [a] 6.3 percent increase in sales is accompanied by a 14 percent rise in deceptive marketing,” reads the paper. “[I]n elections, a 4.9 percent gain in vote share coincides with 22.3 percent more disinformation and 12.5 percent more populist rhetoric; and on social media, a 7.5 percent engagement boost comes with 188.6 percent more disinformation and a 16.3 percent increase in promotion of harmful behaviors.”

It’s clear from the study and real-world anecdotes that current guardrails are insufficient. “Significant social costs are likely to follow,” reads the paper.

“When LLMs compete for social media likes, they start making things up,” Zou wrote on X. “When they compete for votes, they turn inflammatory/populist.”

More on AI agents: Companies That Replaced Humans With AI Are Realizing Their Mistake