A popular and powerful text-to-video AI generator developed by Runway was trained on copious amounts of pirated content and ripped-off YouTube videos, according to a gigantic internal spreadsheet obtained by 404 Media.

Last month, the company's Gen-3 Alpha video generation tool drew huge amounts of attention, with publications — including Futurism — lauding the almost photorealistic clips it could generate. At the time, Runway claimed that Gen-3 Alpha was "trained jointly on videos and images," but stopped far short of elaborating on the source of the data.

Now, according to the document obtained by 404 Media, there may be a good reason for that coyness. The spreadsheet is chock-full of popular content drawn from major YouTube channels, including those belonging to Disney, Netflix, and Sony, in addition to links to websites that are known to host pirated content.

While 404 Media couldn't confirm that Gen-3 Alpha was trained on all of the listed assets, the circumstantial evidence makes it seem very likely. If so, it's a striking new piece of evidence that AI companies are shamelessly stealing content to feed their AI models with complete disregard for copyright, a consistently recurring pain point in the world of generative AI.

While questions remain as to which videos actually made it into the training data, 404 Media was effortlessly able to generate believable videos of well-known YouTube personalities.

Runway even reportedly went as far as to cover its tracks by using proxies to avoid being blocked by YouTube.

"The channels in that spreadsheet were a company-wide effort to find good quality videos to build the model with," an unnamed former employee told 404 Media. "This was then used as input to a massive web crawler which downloaded all the videos from all those channels, using proxies to avoid getting blocked by Google."

Runway raised a whopping $141 million in funding last year from investors including YouTube owner Google, Salesforce, and chipmaker NVIDIA, at a heady valuation of $1.5 billion.

And it's not just Runway that has come under fire for using copyrighted material without obtaining the necessary licenses to train its AI models. Earlier this year, OpenAI CTO Mira Murati claimed in an interview with the Wall Street Journal that she didn't know if training data for the company's upcoming Sora video generator included videos from YouTube, Instagram, or Facebook — a bizarre admission that drew plenty of skepticism.

A couple of weeks later, the New York Times revealed that OpenAI had ignored corporate policies to skirt copyright laws, relying on tools that transcribe YouTube videos to train its AI chatbots.

Meanwhile, YouTube CEO Neal Mohan warned AI companies that training AI models on YouTube videos would be a "clear violation" of the video platform's terms of use.

In other words, this latest report is yet more evidence that AI companies including Runway and OpenAI are playing fast and loose with copyrighted material.

The topic of intellectual property will likely remain a major sticking point in the development of generative AI, perhaps especially when it comes to AI models that can generate entire videos.

The tech is even forcing legislators to revisit "fair use," a doctrine that permits the limited use of copyrighted material under US law. While AI companies have previously argued in court that much of the scraped data is fair game, many copyright holders have cried foul, leading to a fierce and still-growing legal battle.

And by linking its work to ripped-off and pirated videos, Runway has vaulted itself into the hot seat.

