Security researchers from the University of Pennsylvania and hardware conglomerate Cisco have found that DeepSeek's flagship R1 reasoning AI model is stunningly vulnerable to jailbreaking.
In a blog post published today and first spotted by Wired, the researchers reported that DeepSeek "failed to block a single harmful prompt" after being tested against "50 random prompts from the HarmBench dataset," which includes "cybercrime, misinformation, illegal activities, and general harm."
"This contrasts starkly with other leading models, which demonstrated at least partial resistance," the blog post reads.
It's a particularly noteworthy development considering the sheer amount of chaos DeepSeek has wrought on the AI industry as a whole. The company claims its R1 model can trade blows with competitors including OpenAI's state-of-the-art o1, but at a tiny fraction of the cost, sending shivers down the spines of Wall Street investors.
But the company has seemingly done little to guard its AI model against attacks and misuse. In other words, it wouldn't be hard for a bad actor to turn it into a powerful disinformation machine or get it to explain how to create explosives, for instance.
The news comes after cloud security research company Wiz came across a massive unsecured database on DeepSeek's servers, which contained a trove of unencrypted internal data including "chat history," "backend data," and "sensitive information."

The database was left exposed "without any authentication or defense mechanism to the outside world," according to Wiz.
The Chinese hedge fund-owned company's AI made headlines for being far cheaper to train and run than its many competitors in the US. But that frugality may come with some significant drawbacks.
"DeepSeek R1 was purportedly trained with a fraction of the budgets that other frontier model providers spend on developing their models," the Cisco and University of Pennsylvania researchers wrote. "However, it comes at a different cost: safety and security."
AI security company Adversa AI similarly found that DeepSeek is astonishingly easy to jailbreak.
"It starts to become a big deal when you start putting these models into important complex systems and those jailbreaks suddenly result in downstream things that increases liability, increases business risk, increases all kinds of issues for enterprises," Cisco VP of product, AI software and platform DJ Sampath told Wired.
However, it's not just DeepSeek's latest AI that struggled. Meta's open-source Llama 3.1 model flunked almost as badly in the same comparison, with a 96 percent attack success rate (compared to a dismal 100 percent for DeepSeek's R1).
OpenAI's recently released reasoning model, o1-preview, fared much better, with an attack success rate of just 26 percent.
In short, DeepSeek's flaws deserve plenty of scrutiny going forward.
"DeepSeek is just another example of how every model can be broken — it’s just a matter of how much effort you put in," Adversa AI CEO Alex Polyakov told Wired. "If you’re not continuously red-teaming your AI, you’re already compromised."
More on DeepSeek: DeepSeek's AI Would Like to Assure You That China Is Not Committing Any Human Rights Abuses Whatsoever Against Its Repressed Uyghur Population