This year, Facebook AI Research (FAIR) unveiled its neural net at the GPU Technology Conference, showing off the tech’s ability to create virtual images from written descriptions.
At the presentation, the team demonstrated the neural net by typing in the word “beach,” which the AI used to generate an image of a painted beach beneath a cloudy sky. FAIR then typed the words “beach – clouds,” and the AI quickly produced a similar image, this time without the clouds. To top off the presentation, they input the words “sunset beach – clouds,” and the AI generated an image of a sun setting over a beach under clear skies.
The video below shows what this kind of technology could mean for the public, especially for people with visual impairments.
As shown, the Facebook AI Research team has trained this neural net to associate specific words with corresponding images. They did so by using a supercomputer to show the neural net thousands upon thousands of images.
What makes Facebook’s neural net special is its ability to take combinations of words and associate them with an appropriate image. The AI can both add elements to a scene and remove them to match the text description. This is what’s called a natural language interface.
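One way to picture the “beach – clouds” trick is as arithmetic on learned concept vectors. The sketch below is a toy illustration only, with hand-made 3-dimensional vectors and an invented vocabulary; FAIR’s actual system learns high-dimensional representations from enormous image collections.

```python
# Toy sketch of word arithmetic in an embedding space.
# All vectors and vocabulary entries are invented for illustration;
# they are not FAIR's model or data.

# Hand-crafted 3-d "embeddings": [sand, water, cloud-cover]
vocab = {
    "beach":       (0.9, 0.8, 0.5),
    "clouds":      (0.0, 0.1, 1.0),
    "clear beach": (0.9, 0.7, 0.0),  # hypothetical cloud-free beach concept
    "cloudy sky":  (0.1, 0.2, 1.0),
}

def subtract(a, b):
    """Component-wise vector subtraction."""
    return tuple(x - y for x, y in zip(a, b))

def nearest(query, vocab):
    """Return the vocabulary entry closest (Euclidean) to the query vector."""
    def dist(word):
        return sum((q - v) ** 2 for q, v in zip(query, vocab[word])) ** 0.5
    return min(vocab, key=dist)

# "beach - clouds" lands nearest the cloud-free beach concept
query = subtract(vocab["beach"], vocab["clouds"])
print(nearest(query, vocab))  # → clear beach
```

In a real system the same idea would operate on representations learned from data, and the result would condition an image generator rather than a dictionary lookup.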
This trick works well with 2D images, but the company has greater ambitions that involve generating 3D images through the same natural language interface.
Graduating from 2D to 3D is no simple task, though. It involves enabling the neural net to recognize 3D spaces and then training it on a wealth of 3D assets, which aren’t as plentiful as 2D images. Fortunately, the recent proliferation of VR and AR opens the door to a world of new assets, making it a safe bet that in the near future, you’ll be able to create entire virtual worlds simply by saying so.