"When you use something for free, you are the product."

Reddit and Weep

Underlying the storm of hype and funding in the AI sector right now is a scarce resource: data, created by old-fashioned humans, that's needed to train the huge models like ChatGPT and DALL-E that generate text and imagery.

That demand is causing all sorts of drama. Authors and news organizations have filed lawsuits saying AI companies used their work without permission, and there's the looming question of what happens when the internet fills up with AI-generated content and AI creators are forced to train future models on it.

And, of course, it's also fueling new business deals as AI developers rush to lock down repositories of human-generated work that they can use to train their AI systems. Look no further than this wild scoop from Bloomberg: that an undisclosed AI outfit has struck a deal to pay Reddit $60 million per year for access to its huge database of users' posts — perhaps the surest sign yet that user data is the key commodity in the AI gold rush.

Mod Squad

It's not the first time we've seen an AI company cough up for access to a cache of text material. Remember when Axel Springer, the owner of publications ranging from Politico to Business Insider, inked a deal with OpenAI to use its outlets' work in ChatGPT?

But in some respects, it does differ from that bargain. For one, journalists are paid for their work, even if they don't stand to benefit from its inclusion in AI systems and may actually be harmed by it. Redditors, though, have contributed their vast supply of words as a labor of love, which has to rankle when it's all vacuumed up for profit.

"Where the fuck is my cut?" quipped one Redditor in response to the news.

"When you use something for free, you are the product," another retorted.

Even stranger is that in spite of the appreciable sum changing hands — remember, this is $60 million every single year — we don't actually know who's paying for all this data.

And don't forget that Reddit's leadership has already been in users' crosshairs for what they see as profiteering and enshittification of the site in preparation for a lucrative public offering.

"As an AI language model, I cannot condone the selling of public forums' user data as training data without compensation for the users of said forum," another Redditor wrote of the AI deal, riffing on the way ChatGPT and other systems frequently demur from answering controversial questions.


