Lawsuit Claims Google Is Vacuuming Up People’s Whole Lives to Train AI

Everything and Anything

Lawsuits against AI companies for their training and data practices continue to pile up. And this time, Google’s the one in the hot seat.

As Reuters reports, a class action lawsuit is accusing Google of “secretly stealing everything ever created and shared on the internet by hundreds of millions of Americans” in order to train its AI models.

Filed on Tuesday in San Francisco by the Clarkson Law Firm — which, notably, filed a very similar class action case against ChatGPT maker OpenAI just two weeks ago — the suit claims Google could owe at least five billion dollars in restitution for vacuuming up our online lives, from social media albums to blog posts to published novels, and using that data to train AI systems.

“Google has taken all our personal and professional information, our creative and copywritten works, our photographs, and even our emails — virtually the entirety of our digital footprint,” reads the lawsuit.

“For years, Google harvested this data in secret,” it continues, “without notice or consent from anyone.”

Terms of Service

In a statement to Reuters, the plaintiffs’ attorney, Ryan Clarkson, doubled down on the suit’s accusations.

“Google does not own the internet,” said Clarkson, “it does not own our creative works, it does not own our expressions of our personhood, pictures of our families and children, or anything else simply because we share it online.”

Contrary to Clarkson’s point, though, the search giant kind of does own the internet. With roughly 90 percent of the search market, Google is one of the most prominent mediators of our online lives, if not the most prominent. While none of us may ever have checked a big red box that explicitly said “YES, everything I have ever shared to the internet can and should be used to train AI systems, which will eventually use my outputs to generate content,” most of us have signed massive chunks of our lives and privacy away to a number of platforms in order to use the web, even if those agreements were buried in the fine print.

Google, for its part, seems confident that its training practices have steered clear of any wrongdoing, with Google general counsel Halimah DeLaine Prado telling Reuters that the search company has been “clear for years that we use data from public sources — like information published to the open web and public datasets — to train the AI models behind services like Google Translate, responsibly and in line with our AI Principles.”

“American law supports using public information to create new beneficial uses,” she added, “and we look forward to refuting these baseless claims.”

More on AI lawsuits: OpenAI Sued for Using Everybody’s Writing to Train AI