New Machine-Learning System for Automatically Producing Captions

Updated 8.13.15, 5:40 PM EDT by Alex Klokus

The new system merges computer vision and language models into a single, jointly trained system that produces a human-readable sequence of words describing an image.
It starts from the encoder-decoder design used in machine translation, then replaces the encoder recurrent neural network (RNN) and its input words with a deep convolutional neural network (CNN) trained to classify objects in images.
This could eventually help visually impaired people understand pictures and make it easier for everyone to search for images on Google.
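
To make the architecture described above concrete, here is a minimal toy sketch in plain Python. It is not the actual system: the vocabulary, the dimensions, the `ToyCaptioner` class, and the random untrained weights are all assumptions made for illustration, and a hand-written feature vector stands in for the CNN's output. Using the image features to set the initial hidden state is just one simple way to condition the decoder on the image. The sketch only shows the data flow, in which image features take the place of the encoder RNN and a recurrent decoder then emits words one at a time.

```python
import math
import random

# Toy vocabulary and sizes: illustrative assumptions, not from the real system.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
FEATURE_DIM = 4  # length of the stand-in CNN feature vector
HIDDEN_DIM = 5   # length of the decoder's hidden state


def rand_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ToyCaptioner:
    """Hypothetical decoder: image features in, list of words out."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Untrained random weights; a real model would learn these jointly.
        self.w_img = rand_matrix(HIDDEN_DIM, FEATURE_DIM, rng)
        self.w_hid = rand_matrix(HIDDEN_DIM, HIDDEN_DIM, rng)
        self.w_emb = rand_matrix(HIDDEN_DIM, len(VOCAB), rng)
        self.w_out = rand_matrix(len(VOCAB), HIDDEN_DIM, rng)

    def caption(self, image_features, max_len=8):
        # The image features initialise the decoder state, taking the place
        # of what an encoder RNN would produce from input words.
        hidden = [math.tanh(x) for x in matvec(self.w_img, image_features)]
        word = VOCAB.index("<start>")
        words = []
        for _ in range(max_len):
            one_hot = [1.0 if i == word else 0.0 for i in range(len(VOCAB))]
            # Simple recurrent update from the previous state and previous word.
            pre = [a + b for a, b in zip(matvec(self.w_hid, hidden),
                                         matvec(self.w_emb, one_hot))]
            hidden = [math.tanh(x) for x in pre]
            # Greedy decoding: pick the highest-scoring next word.
            scores = matvec(self.w_out, hidden)
            word = max(range(len(VOCAB)), key=scores.__getitem__)
            if VOCAB[word] == "<end>":
                break
            words.append(VOCAB[word])
        return words


if __name__ == "__main__":
    # Made-up feature vector standing in for a CNN's output on one image.
    print(ToyCaptioner(seed=0).caption([0.2, -0.1, 0.7, 0.3]))
```

Because the weights are random, the words this prints are meaningless. Training the image and language parts together on captioned images is what would make the output a sensible description.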
