Judging Them Blind, Humans Appear to Prefer AI-Generated Poems

Suck it, Shakespeare.

Dead Poets

Scientists have found that readers have a lot of trouble telling AI-generated poetry apart from human-written poetry, even when the human works are by the likes of William Shakespeare and Emily Dickinson.

Even more surprisingly, the researchers found that humans generally prefer the former over the latter, which could bode poorly for the role of human creativity in the age of generative AI.

As detailed in a new paper published in the journal Scientific Reports, University of Pittsburgh researchers Brian Porter and Edouard Machery conducted two experiments involving "non-expert poetry readers."

They found that "participants performed below chance levels in identifying AI-generated poems. Notably, participants were more likely to judge AI-generated poems as human-authored than actual human-authored poems."

AI-generated poems got higher scores from participants in qualities including rhythm and beauty, something that appeared to lead them astray in picking out which poem was the product of a language model and which was the creative output of a human artist.

The team believes the participants' difficulties may be due to the "simplicity of AI-generated poems" that "may be easier for non-experts to understand."

In simple terms, AI-generated poetry is appealingly straightforward, and less convoluted, for the palate of the average Joe.

Doing Lines

In their first experiment, participants were shown ten poems in a random order. Five were from renowned wordsmiths, including William Shakespeare, Emily Dickinson, and T.S. Eliot. The other five were generated by OpenAI's already out-of-date GPT-3.5 large language model, which was tasked with imitating the style of the aforementioned poets.

In a second experiment, participants were told to rate the poems based on 14 different characteristics including quality, emotion, rhythm, and, ironically perhaps, originality. The participants were split into three groups: one was told that the poems were AI-generated, one was told that they were human-written, and one was given no information about their origin.

Interestingly, the group told that the poems were AI-generated tended to give the poems a lower score than those who were told that the poems were human-written.

And the third group, who received no information about the poems' origins, actually favored the AI-generated poems over the human-written ones.

"Contrary to what earlier studies reported, people now appear unable to reliably distinguish human-out-of-the-loop AI-generated poetry from human-authored poetry written by well-known poets," the two researchers concluded in their paper.

"In fact, the 'more human than human' phenomenon discovered in other domains of generative AI is also present in the domain of poetry: non-expert participants are more likely to judge an AI-generated poem to be human-authored than a poem that actually is human-authored," they wrote.
