If you’ve ever tried to have a conversation with a chatbot, you know that even today’s state-of-the-art systems aren’t exactly eloquent, regularly doling out nonsensical or painfully generic responses.
Now, though, Google has created Meena, a chatbot it says is better than any other it’s tested — a claim the company supports using a new metric it developed specifically to measure an AI’s conversational abilities.
After creating Meena, a process detailed in a paper published on the preprint server arXiv, Google needed a way to evaluate the chatbot. To that end, it developed something it calls the Sensibleness and Specificity Average (SSA).
To compute this metric, Google asked human workers to conduct about 100 free-form conversations with Meena and several other open-domain chatbots. Each time the chatbot responded, the worker had to answer two questions about the response.
First, did the response make logical and contextual sense within the conversation? If so, the worker then answered a second question: "Was it specific to the conversation?" This step weeded out generic responses. For example, if the human wrote that they liked tennis and the chatbot replied, "That's nice," the response would be tagged as "not specific."
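To make the scoring concrete, here is a minimal sketch of how per-response labels could be rolled up into an SSA score. It assumes, as the metric's name suggests, that SSA is the simple mean of two rates: the share of responses judged sensible and the share judged specific, with any response that fails the sensibleness check counted as not specific. The `Label` class and `ssa` function are hypothetical names for illustration, not Google's code.

```python
from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical crowdworker judgment for a single chatbot response."""
    sensible: bool
    specific: bool  # only asked when the response was judged sensible


def ssa(labels):
    """Return (sensibleness, specificity, SSA) for a list of labels.

    Assumption: SSA is the plain average of the two rates, and a
    response that is not sensible cannot count as specific.
    """
    if not labels:
        raise ValueError("need at least one labeled response")
    n = len(labels)
    sensible = sum(label.sensible for label in labels)
    # Specificity only counts responses that also passed the sensibleness check.
    specific = sum(label.sensible and label.specific for label in labels)
    sensibleness = sensible / n
    specificity = specific / n
    return sensibleness, specificity, (sensibleness + specificity) / 2


# Four responses; the second is like "That's nice" (sensible but generic).
labels = [
    Label(sensible=True, specific=True),
    Label(sensible=True, specific=False),
    Label(sensible=True, specific=True),
    Label(sensible=False, specific=False),
]
print(ssa(labels))  # (0.75, 0.5, 0.625)
```

Under this reading, a chatbot that always answers with safe, generic replies can still score well on sensibleness but is pulled down on specificity, which is exactly the failure the second question is meant to catch.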
Google determined that an average human would achieve an SSA score of 86 percent.
The other chatbots in the team’s study scored between 31 percent and 56 percent. Meena, however, scored 79 percent, putting it closer to human-level conversation than any other chatbot the team tested.
READ MORE: Meena is Google’s attempt at making true conversational AI [VentureBeat]