"One of our key sources of human data is no longer fully 'human!'"

Chasing the Tail

Amazon's Mechanical Turk platform, which launched in 2005, has allowed humans to make some money on the side by completing small tasks such as data validation or simple transcriptions that "require human intelligence."

The basic idea is to "break down a manual, time-consuming project into smaller, more manageable tasks to be completed by distributed workers over the Internet," in Amazon's own words. Unsurprisingly, the platform, named for a supposedly mechanical 18th-century chess-playing machine that actually had a person inside controlling it, has already been used to train AI systems as well.

Yet with the advent of powerful AI chatbots, the dynamics have started to shift profoundly. As TechCrunch reports, researchers at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland have found that a significant number of Mechanical Turk workers are already using large language models (LLMs) to automate their labor.

"One of our key sources of human data is no longer fully 'human!'" tweeted EPFL PhD candidate Manoel Ribeiro, co-author of a yet-to-be-peer-reviewed paper on the research. "We estimate that 33-46 percent of crowd workers on MTurk used LLMs in a text production task — which may increase as ChatGPT and the like become more popular and powerful."

In other words, humans are using AI systems to automate menial tasks that were originally deemed to be too complex for machines. In some instances, that could mean humans are using AI models to train AIs — the latest example of a bizarre hall of mirrors in which AI data is used to train AIs, likely leading to misinformation and chaos.

High-Speed Turkforce

On the one hand, as TechCrunch notes, this isn't terribly surprising. AI systems have advanced rapidly over the past few years, with powerful tools like OpenAI's ChatGPT and Google's Bard increasingly blurring the lines between humans and machines.

And Amazon's Mechanical Turk workers are paid — often meagerly — per task. If an LLM-powered tool can expedite those tasks, it's hard to blame them for going the automation route, especially considering there isn't much oversight from Amazon, according to TechCrunch.

As the researchers note, there are some reasons to be concerned as it becomes more difficult to distinguish between human and AI-generated data.

"LLMs are becoming more popular by the day, and multimodal models, supporting not only text, but also image and video input and output, are on the rise," the researchers conclude in their paper. "With this, our results should be considered the 'canary in the coal mine' that should remind platforms, researchers, and crowd workers to find new ways to ensure that human data remain human."


