Researchers Are Trying to Use AI to Put an End to Hate Speech

Safeguarding Against Hate

In an ideal world, the best check on hate speech would be an individual's sense of decency and propriety: in other words, a deep respect for the human person, regardless of differences in opinion, race, or gender. However, we don't live in an ideal world. Hate speech abounds, and the relatively free space of social media has given it a platform where it can be just as destructive as it is offline, or perhaps even more so.

Social networking sites have attempted to control the problem, with little success. While users can report hate speech, it is practically impossible for human moderators to monitor every offender and every stream of derogatory remarks posted in private conversations or public forums. A machine might manage what people cannot, which is why researchers are exploring artificial intelligence (AI) as a way to finally crack down on hate speech.

Haji Mohammad Saleem and his colleagues at McGill University in Montreal, Canada, developed an AI that learns how members of hateful communities speak. This is a different tactic from the one attempted by Jigsaw, a unit of Google parent company Alphabet, which focuses on certain keywords or phrases to produce a toxicity score. According to New Scientist, Jigsaw's approach did not work well: the comment "you're pretty smart for a girl" was marked 18% similar to what people considered toxic, while "i love Fuhrer" was marked 2% similar.
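
To see why scoring by keywords alone can miss subtext, consider a minimal sketch in Python. It is a toy, not Jigsaw's actual system: the blocklist, the function name keyword_score, and the scoring rule (the share of words that appear on the list) are all assumptions made for illustration.

```python
import re

# Hypothetical blocklist, invented for this sketch.
FLAGGED_TERMS = {"idiot", "stupid", "moron"}


def keyword_score(comment: str) -> float:
    """Return the fraction of words in the comment that are on the blocklist."""
    words = re.findall(r"[a-z']+", comment.lower())
    if not words:
        return 0.0
    hits = sum(word in FLAGGED_TERMS for word in words)
    return hits / len(words)


# A condescending comment contains no listed term, so it scores zero.
print(keyword_score("you're pretty smart for a girl"))
# A comment with one listed word out of four gets a nonzero score.
print(keyword_score("don't be an idiot"))
```

A scorer like this only sees the words on its list, so a remark whose offensiveness lies in its implication passes untouched.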

An AI Guard Dog

In a paper published online, Saleem and his team described how their AI works. Their machine learning algorithm was trained on data dumps of posts made between 2006 and 2016 in the most active support and abuse groups on Reddit, along with posts from other forums and websites. They focused on three groups that are frequent targets of abuse, online and otherwise: African Americans, people who are overweight, and women.

"We then propose an approach to detecting hateful speech that uses content produced by self-identifying hateful communities as training data," the researchers wrote. "Our approach bypasses the expensive annotation process often required to train keyword systems and performs well across several established platforms, making substantial improvements over current state-of-the-art approaches."
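
The idea of letting community membership stand in for hand-made labels can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' implementation: the article does not say which classifier they used, so a simple naive Bayes model with equal priors is swapped in here, and the posts, labels, and function names are all invented placeholders.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(posts_by_community):
    """Count words per community; the community a post came from is its label."""
    counts = {label: Counter() for label in posts_by_community}
    for label, posts in posts_by_community.items():
        for post in posts:
            counts[label].update(tokenize(post))
    vocab = set().union(*counts.values())
    return counts, vocab


def log_likelihood(words, counter, vocab_size):
    """Sum of log word probabilities with add-one (Laplace) smoothing."""
    total = sum(counter.values())
    return sum(math.log((counter[w] + 1) / (total + vocab_size)) for w in words)


def classify(text, counts, vocab):
    """Return the community label whose word statistics best fit the text."""
    words = tokenize(text)
    return max(counts, key=lambda label: log_likelihood(words, counts[label], len(vocab)))


# Invented placeholder posts; no annotator labels individual posts.
toy_posts = {
    "hateful": [
        "those people are worthless and disgusting",
        "they are disgusting and should leave",
    ],
    "support": [
        "you are welcome here and we support you",
        "stay strong we are here for you",
    ],
}

counts, vocab = train(toy_posts)
print(classify("they are worthless", counts, vocab))
print(classify("we support you", counts, vocab))
```

Because the label comes from where a post was written rather than from a reviewer, collecting more training data is a matter of gathering more posts, which is the annotation saving the researchers describe.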

Their algorithm caught subtext that can easily be lost when relying on keywords alone, and it produced fewer false positives than the keyword method. "Comparing hateful and non-hateful communities to find the language that distinguishes them is a clever solution," Thomas Davidson of Cornell University told New Scientist. However, there are still limitations. The team's AI was trained on Reddit posts, and it may not be as effective on other social media websites. It also missed some obviously offensive speech that a keyword-based AI would catch. That's understandable, though: stopping hate speech is as tricky as catching online terrorist propaganda.

Indeed, while AI may become better at catching online hate, it might not be able to do it alone. "Ultimately, hate speech is a subjective phenomenon that requires human judgment to identify," Davidson added. Human decency may be something no AI can replace.
