For about as long as the internet has existed, users have been able to speak their minds freely through pseudonymous accounts that protect them from being doxxed or stalked.

But thanks to the advent of sophisticated AI, unmasking pseudonymous users on the internet has become ominously easy.

As detailed in a yet-to-be-peer-reviewed paper, a team of researchers at ETH Zurich and AI company Anthropic found that “large language models can be used to perform at-scale deanonymization.”

In a series of experiments, the researchers showed that their agent could “re-identify” users on the popular forums Hacker News and Reddit based on their “pseudonymous online profiles and conversations alone,” something that would “take hours for a dedicated human investigator” to do.

The results were alarming: the AI agent unmasked an astonishing two-thirds of users.

“Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered,” the researchers warned.

“Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision — and scales to tens of thousands of candidates,” coauthor and ETH Zurich AI engineer Simon Lermen wrote in a blog post accompanying the paper.

The implications for online privacy could be considerable.

“The average online user has long operated under an implicit threat model where they have assumed pseudonymity provides adequate protection because targeted deanonymization would require extensive effort,” the researchers wrote. “LLMs invalidate this assumption.”

In their experiments, the team collected datasets from public social media sites to test out their deanonymization AI. They linked Hacker News posts to LinkedIn profiles by using references in user profiles. Then they anonymized the datasets by removing any identifying references from the posts.

Finally, they ran an LLM on the datasets, asking it to match each set of posts to its original author.
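
To make the shape of that experiment concrete, here is a minimal sketch in Python. It is not the researchers' pipeline: the handles, posts, and profiles are invented, the `anonymize` regex is a crude stand-in for their scrubbing step, and a bag-of-words cosine score stands in for the LLM. Every name in it (`POSTS`, `PROFILES`, `TRUTH`, `link`, `precision_recall`, the `0.2` threshold) is an assumption made for illustration.

```python
import math
import re
from collections import Counter

# Invented toy data. In the study, ground truth came from verified links
# between accounts and real profiles; here it is simply made up.
POSTS = {
    "user_a": "I maintain a Rust compiler plugin and gave a talk in Zurich. See my site alice.example",
    "user_b": "Mostly I write about beekeeping and honey extraction in Vermont. Contact bob@example.org",
    "user_c": "Kernel scheduler patches are my day job, I also restore vintage synthesizers.",
    "user_d": "Just here for the memes.",
}
PROFILES = {
    "Alice": "Compiler engineer working on Rust tooling, speaker at conferences in Zurich",
    "Bob": "Vermont beekeeping writer, honey extraction",
    "Carol": "Linux kernel scheduler developer and vintage synthesizers collector",
}
TRUTH = {"user_a": "Alice", "user_b": "Bob", "user_c": "Carol", "user_d": "Dave"}

# Assumption: emails, URLs, and *.example hostnames count as direct identifiers.
IDENTIFIER = re.compile(r"\S+@\S+|https?://\S+|\b\S+\.example\b")


def anonymize(text):
    """Strip direct identifiers so only indirect clues remain."""
    return IDENTIFIER.sub(" ", text)


def bag(text):
    """Lowercased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def link(posts, profiles, threshold=0.2):
    """Guess a profile for each handle, abstaining when the best score is weak."""
    guesses = {}
    for handle, text in posts.items():
        vec = bag(anonymize(text))
        name, score = max(
            ((n, cosine(vec, bag(bio))) for n, bio in profiles.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            guesses[handle] = name
    return guesses


def precision_recall(guesses, truth):
    """Precision over guesses made, recall over all known identities."""
    correct = sum(1 for handle, name in guesses.items() if truth.get(handle) == name)
    precision = correct / len(guesses) if guesses else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    guesses = link(POSTS, PROFILES)
    print(guesses, precision_recall(guesses, TRUTH))
```

The abstention threshold is the design choice worth noticing: a matcher that declines to guess on weak evidence gives up some recall in exchange for precision. The scores on this toy data say nothing about the figures reported in the paper.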

“What we found is that these AI agents can do something that was previously very difficult: starting from free text (like an anonymized interview transcript) they can work their way to the full identity of a person,” Lermen told Ars Technica. “This is a pretty new capability; previous approaches on re-identification generally required structured data, and two datasets with a similar schema that could be linked together.”

The team had to tread carefully, since “you don’t want to actually deanonymize anonymous individuals,” as Lermen explained in his post. Instead, the team came up with “two types of deanonymization proxies which allow us to study the effectiveness of LLMs at these tasks.”

Even when the data fed to the AI was extremely general, like responses to an Anthropic questionnaire about how people use AI in their daily lives, the LLM could pick up on the clues to identify people around seven percent of the time.

While that may sound low, Lermen told Ars it’s noteworthy “that AI can do this at all.”

The researchers also found that when fed comments from various movie communities on Reddit, an AI could identify users with high precision. The more movies a user discussed, the easier it was for the AI to deanonymize them.
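
One intuition for why more discussion makes matching easier is that each extra detail rules out more candidates. The Python sketch below illustrates this with plain set containment over invented data. It is not the method used in the paper, and `WATCHED`, `candidates`, and `shrinkage` are hypothetical names.

```python
# Invented toy data: which films each candidate person is known to have discussed.
WATCHED = {
    "p1": {"Alien", "Heat", "Ran", "Stalker"},
    "p2": {"Alien", "Heat", "Jaws"},
    "p3": {"Alien", "Ran", "Jaws"},
    "p4": {"Alien", "Heat", "Ran", "Solaris"},
}


def candidates(mentioned, watched=WATCHED):
    """People whose known films include every film mentioned so far."""
    return sorted(p for p, films in watched.items() if mentioned <= films)


def shrinkage(mentions, watched=WATCHED):
    """Size of the candidate pool after each successive mention."""
    seen = set()
    sizes = []
    for film in mentions:
        seen.add(film)
        sizes.append(len(candidates(seen, watched)))
    return sizes


if __name__ == "__main__":
    print(shrinkage(["Alien", "Heat", "Ran", "Stalker"]))
```

In this toy pool, a single popular film leaves every candidate in play, while a fourth, rarer title leaves only one.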

However, they also pointed out several limitations. For one, sample sets are “small because they require verified identity links,” they wrote.

It’s also difficult to separate what the LLM itself contributed from what its web searches surfaced.

“The attack relies on opaque web search systems, making it difficult to isolate what the LLM agent contributes versus what the search engine embeddings contribute,” the researchers admitted.

Nonetheless, the team warns that their findings paint an alarming picture of the future of online anonymity. “LLMs democratize deanonymization,” they concluded, which could potentially allow governments to “link pseudonymous accounts to real identities for surveillance of dissidents, journalists, or activists.”

“Corporations could connect seemingly anonymous forum posts to customer profiles for hyper-targeted advertising,” they added. “Attackers could build sophisticated profiles of targets at scale to launch highly personalized social engineering scams.”

In short, the advent of AI has ushered in an era that calls for stronger privacy protections, or that could even sound the death knell for online pseudonymity.

“Users, platforms, and policymakers must recognize that the privacy assumptions underlying much of today’s internet no longer hold,” the paper reads.

