"The Reddit corpus of data is really valuable. But we don’t need to give all of that value to some of the largest companies in the world for free."
Reddit has announced that it will start charging AI companies a premium to scrape its data for training their models, the New York Times reports.
In a blog post, the company revealed that it's "introducing a new premium access point for third parties who require additional capabilities, higher usage limits, and broader usage rights."
The move highlights the fact that tech companies, which have accumulated huge troves of data on the internet over many years, are becoming increasingly wary of firms using those caches to train AI algorithms without asking permission.
"The Reddit corpus of data is really valuable," Steve Huffman, cofounder and CEO of Reddit, told the NYT. "But we don’t need to give all of that value to some of the largest companies in the world for free."
As the newspaper points out, it's the first significant instance of a social network announcing that it will charge for having its content scraped by the likes of OpenAI.
The company has yet to announce how much money it will charge others for API access, but it did say app developers will still get free access.
Twitter has similarly announced that it will begin charging for access to its API, though CEO Elon Musk hasn't cited LLMs as the reason.
Large language models like OpenAI's GPT-4 and Google's Bard have made extensive use of Reddit's data for training.
It's a significant step that could trigger a fundamental shift for search engines like Google. Reddit has long been a treasure trove of relevant data on the internet.
"More than any other place on the internet, Reddit is a home for authentic conversation," Huffman told the NYT. "There’s a lot of stuff on the site that you’d only ever say in therapy, or [Alcoholics Anonymous], or never at all."
"It’s a good time for us to tighten things up," he added. "We think that's fair."