It didn’t take long for cybersecurity researchers to notice some glaring issues with OpenAI’s recently unveiled AI browser Atlas.
The browser, which puts OpenAI’s blockbuster ChatGPT front and center, features an “agent mode” — currently limited to paying subscribers — that allows it to complete entire tasks, such as booking a flight or purchasing groceries.
However, that makes the browser vulnerable to “prompt injection” attacks, allowing hackers to embed hidden messages on the web that force it to carry out harmful instructions, as several researchers have already shown. For instance, one researcher tricked the browser into spitting out the words “Trust No AI” instead of generating a summary of a document in Google Docs, as prompted.
Now, researchers at AI agent security firm NeuralTrust found that even Atlas’s “Omnibox,” the text box at the top of the browser that can accept either URLs or natural language prompts, is also extremely vulnerable to prompt injection attacks.
Unlike previously demonstrated “indirect” prompt injection attacks that embed instructions in webpages, this particular exploit requires the user to copy and paste a poisoned URL into the omnibox — just like you’ve probably done with countless web addresses.
“We’ve identified a prompt injection technique that disguises malicious instructions to look like a URL, but that Atlas treats as high-trust ‘user intent’ text, enabling harmful actions,” NeuralTrust software engineer Martí Jordà wrote in a recent blog post, as spotted by The Register.
By slightly adjusting the URL, the browser fails to validate it as a web address and instead “treats the entire content as a prompt.” That makes a disguised URL a perfect place to embed harmful messages.
“The embedded instructions are now interpreted as trusted user intent with fewer safety checks,” Jordà wrote. “The agent executes the injected instructions with elevated trust. For example, ‘follow these instructions only’ and ‘visit neuraltrust.ai’ can override the user’s intent or safety policies.”
The vulnerability could even be used to make Atlas’s agent navigate to the user’s Google Drive and mass delete files, since the user is already running an authenticated session.
“When powerful actions are granted based on ambiguous parsing, ordinary-looking inputs become jailbreaks,” Jordà wrote.
In response, NeuralTrust recommends that OpenAI’s browser be far more strict when parsing URLs, and in case of “any ambiguity, refuse navigation and do not auto-fallback to prompt mode.”
As browser company Brave pointed out last week, indirect prompt injection attacks have become a problem for the “entire category of AI-powered browsers,” including Perplexity’s Comet browser.
“If you’re signed into sensitive accounts like your bank or your email provider in your browser, simply summarizing a Reddit post could result in an attacker being able to steal money or your private data,” Brave wrote at the time.
In a lengthy update on X-formerly-Twitter last week, OpenAI’s chief information security officer Dane Stuckey conceded that “prompt injection remains a frontier, unsolved security problem, and our adversaries will spend significant time and resources to find ways to make ChatGPT agent fall for these attacks.”
OpenAI didn’t respond to The Register‘s request for comment regarding NeuralTrust’s latest findings.
More on Atlas: OpenAI’s New AI Browser Is Already Falling Victim to Prompt Injection Attacks