One of the biggest challenges for language-processing artificial intelligence is figuring out the underlying meaning of slang, colloquialisms, and intentional misspellings.
In order to help those hapless machines out, a team of mathematicians from the University of Vermont started to analyze how young people deliberately stretch words when they type. For instance, they’ve quantified the semantic difference between stretched words like “hahaha” and “haaahaha” in hopes that future AI algorithms can learn to understand us in the informal ways we actually communicate online.
In their research, published Wednesday in the journal PLOS One, the team analyzed so-called "stretchable words" that appeared in 100 billion tweets posted over the past eight years. They then devised two measurements: stretch, which captures how much a word is lengthened overall, and balance, which captures how evenly that lengthening is spread across the word's letters. For example, "lololol" has a high balance value because both "l" and "o" are repeated, whereas "nooooo" doesn't because only one letter is repeated.
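To make the two dimensions a little more concrete, here is a toy sketch in Python. It does not reproduce the study's actual formulas: `toy_stretch` and `toy_balance` are hypothetical stand-ins that assume stretch can be approximated by word length per distinct letter, and balance by how evenly the letters are used (normalized Shannon entropy of the letter counts).

```python
import math
from collections import Counter


def toy_stretch(word: str) -> float:
    """Hypothetical stretch score (not the paper's definition):
    total length divided by the number of distinct letters."""
    letters = word.lower()
    if not letters:
        raise ValueError("word must be non-empty")
    return len(letters) / len(set(letters))


def toy_balance(word: str) -> float:
    """Hypothetical balance score (not the paper's definition):
    Shannon entropy of the letter counts, normalized to [0, 1].
    Near 1 means the length is spread evenly across letters;
    near 0 means a single letter dominates."""
    counts = Counter(word.lower())
    if len(counts) < 2:
        # With one distinct letter, all of the stretching sits on that letter.
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Divide by the maximum possible entropy for this many distinct letters.
    return entropy / math.log2(len(counts))


for w in ["lololol", "nooooo", "hahaha", "haaahaha"]:
    print(f"{w:10s} stretch={toy_stretch(w):.2f} balance={toy_balance(w):.2f}")
```

Under these assumed definitions, "lololol" scores higher on balance than "nooooo", matching the article's example, and "hahaha" comes out more balanced than "haaahaha".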
Measurements like these could help algorithms, and future historians, understand that "dude" refers to a person while "duuuude" is synonymous with "yikes."
Ultimately, the researchers argue that our dictionaries don’t reflect the way people actually communicate, and understanding the stretched words common on social media could fill an important knowledge gap.
“We were able to comprehensively collect and count stretched words like ‘gooooooaaaalll’ and ‘hahahaha’,” the researchers said in a press release, “and map them across the two dimensions of overall stretchiness and balance of stretch, while developing new tools that will also aid in their continued linguistic study, and in other areas, such as language processing, augmenting dictionaries, improving search engines, analyzing the construction of sequences, and more.”
READ MORE: Exploring the use of ‘stretchable’ words in social media [PLOS]