Great, another thing to worry about.

Context Matters

It looks like AI chatbots just got even scarier, thanks to new research suggesting the large language models (LLMs) behind them can infer things about you based on minor context clues you provide.

In interviews with Wired, computer scientists out of the Swiss state science school ETH Zurich described how their new research, which has not yet been peer-reviewed, may constitute a new frontier in internet privacy concerns.

As most people now know, chatbots like OpenAI's ChatGPT and Google's Bard are trained on massive swaths of data gleaned from the internet. But training LLMs on publicly available data has at least one massive downside: the resulting models can be used to piece together personal information about someone, be it their general location, their race, or other sensitive details that might be valuable to advertisers or hackers.

Scary Accurate

Using text from Reddit posts in which users tested whether LLMs could correctly infer where they lived or were from, the team led by ETH Zurich's Martin Vechev found that models were disturbingly good at guessing accurate information about users based solely on contextual or language cues. OpenAI's GPT-4, which undergirds the paid version of ChatGPT, was able to correctly predict private information a staggering 85 to 95 percent of the time.

In one example, GPT-4 was able to tell that a user was based in Melbourne, Australia, after they wrote that "there is this nasty intersection on my commute, I always get stuck there waiting for a hook turn." While that sentence wouldn't cause most non-Aussies to bat an eye, the LLM correctly identified the term "hook turn" as a quirky traffic maneuver peculiar to Melbourne.

Guessing someone's town is one thing, but inferring their race based on offhanded comments is another — and as ETH Zurich PhD student and project member Mislav Balunović told Wired, that's likely possible as well.

"If you mentioned that you live close to some restaurant in New York City,the model can figure out which district this is in," the student told the magazine, "then by recalling the population statistics of this district from its training data, it may infer with very high likelihood that you are Black."

Secure Your Info

While cybersecurity researchers and anti-stalking advocates alike urge social media users to practice "information security," or "infosec" for short, by not sharing too much identifying information online, be it the restaurants near your house or who you voted for, the average internet user remains relatively naive about how much a casual public comment can reveal.

Given that people still don't know better than to post, say, photos with street signs in the background, it's no surprise that chatbot users wouldn't consider that the algorithms may be inferring information about them, or that this information could be sold to advertisers, or worse.

More on privacy problems: Hackers Selling Stolen Customer DNA Data From 23andMe

