"Oops" doesn't even cover it.
Uh Oh
"Oops" doesn't even cover this one.
Microsoft AI researchers accidentally leaked a staggering 38 terabytes — yes, terabytes — of confidential company data on the developer site GitHub, a new report from cloud security company Wiz has revealed.
The scope of the data spill is extensive, to say the least. Per the report, the leaked files contained a full disk backup of two employees' workstations, which included sensitive personal data along with company "secrets, private keys, passwords, and over 30,000 internal Microsoft Teams messages."
Worse yet, the leak could have even made Microsoft's AI systems vulnerable to cyberattacks.
In short, it's a huge mess — and somehow, it all goes back to one misconfigured URL, a reminder that human error can have some devastating consequences, particularly in the burgeoning world of AI tech.
We found a public AI repo on GitHub, exposing over 38TB of private files – including personal computer backups of @Microsoft employees 👨‍💻

How did it happen? 👀

A single misconfigured token in @Azure Storage is all it takes 🧵⬇️ pic.twitter.com/ZWMRk3XK6X

— Hillai Ben-Sasson (@hillai) September 18, 2023
Treasure Trove
According to Wiz, the mistake was made when Microsoft AI researchers were attempting to publish a "bucket of open-source training material" and "AI models for image recognition" to the developer platform.
The researchers misconfigured the files' accompanying SAS (Shared Access Signature) token, the signed query string appended to an Azure Storage URL that defines exactly what a link grants access to. Basically, instead of granting GitHub users access to the downloadable AI material specifically, the butchered token allowed access to the entire storage account.
And we're not just talking read-only permissions. The mistake actually granted "full control" access, meaning that anyone who wanted to tinker with the many terabytes of data, including the AI training material and AI models in the pile, would have been able to.
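For the curious, here's a minimal sketch of that difference using Azure's azure-storage-blob Python SDK. The account name, key, and container name below are placeholders, and the code illustrates the general class of misconfiguration Wiz describes, not the actual token involved in the leak:

```python
from datetime import datetime, timedelta
from azure.storage.blob import (
    generate_account_sas,
    generate_container_sas,
    ResourceTypes,
    AccountSasPermissions,
    ContainerSasPermissions,
)

# Hypothetical credentials, for illustration only.
ACCOUNT_NAME = "exampleaistorage"
ACCOUNT_KEY = "<storage-account-key>"

# The dangerous pattern: an account-level SAS with write/delete
# ("full control") permissions and a far-off expiry. Anyone holding
# a URL built from this token can read *and modify* everything in
# the storage account.
overly_broad_sas = generate_account_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(
        read=True, write=True, delete=True, list=True
    ),
    expiry=datetime.utcnow() + timedelta(days=365 * 10),
)

# The safer pattern: a SAS scoped to a single container, read/list
# only, with a short lifetime, so the link exposes nothing beyond
# the files it was meant to share.
scoped_sas = generate_container_sas(
    account_name=ACCOUNT_NAME,
    container_name="public-ai-datasets",
    account_key=ACCOUNT_KEY,
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.utcnow() + timedelta(hours=24),
)

# Either token is appended to a storage URL as its query string, e.g.:
# https://exampleaistorage.blob.core.windows.net/public-ai-datasets?<sas>
```

One wrinkle worth knowing: a SAS token signed with an account key can't be individually revoked. Killing a leaked one means rotating the key itself, which helps explain how a bad link can quietly stay live for years.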
An "attacker could have injected malicious code into all the AI models in this storage account," Wiz's researchers write, "and every user who trusts Microsoft’s GitHub repository would've been infected by it."
The Wiz report also notes that the SAS misconfiguration dates back to 2020, meaning this sensitive material had effectively been up for grabs for several years.
Bad Week
Microsoft says that it's since resolved the issue, writing in a Monday blog post that no customer data was exposed in the leak.
Regardless, this is shaping up to be a terrible week for the tech giant, as reports revealed this morning that yet another Microsoft leak — this one related to the company's ongoing battle with the FTC over its attempted acquisition of Activision Blizzard — exposed the company's plans for its next-generation Xbox, in addition to a slew of confidential company correspondence and information.
If there's any takeaway, according to Wiz, it's that handling the massive amounts of data required to train AI models demands extra care and stringent security precautions, especially as companies rush new AI products to market.
More on consequential mistakes: Casinos Shut Down Amid Hacker Intrusions