Microsoft’s AI Image Generator Constructs Life-Like Pictures “Pixel By Pixel”
A step beyond Google's doodling AI, this new technology has applications for sketch assistants and photo editing tools.
Artificial intelligence (AI) has become increasingly advanced in areas like mimicking human vocal inflections and even building other AI systems, but the art of drawing has long stymied computers. While Google may have taught its AI to doodle using its SketchRNN program, asking a computer to draw something specific, and more complicated, is a far bigger challenge. But now, researchers at Microsoft have programmed an AI image generator that can create pictures based on text descriptions.
This is an impressive accomplishment, as asking a computer to produce an image is far more challenging than asking it to find an image that matches a certain description on the internet. Microsoft’s AI image generator, called Attentional Generative Adversarial Network (AttnGAN), was trained with image and caption pairs with the goal of teaching it which words are associated with which images.
In their paper, the researchers wrote that their AI is able to automatically select the condition at the word level for generating different parts of the image, which they describe as an unprecedented feat. For example, they entered “this bird has a green crown black primaries and a white belly,” and the computer constructed the associated image with surprising accuracy.
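To make the idea of word-level conditioning more concrete, here is a minimal sketch of a generic attention step, in which one region of an image weighs each word of the caption by how relevant it is. This is not the researchers' code: the function names, the two-number word vectors, and the "head region" feature are all made up for illustration, and a real system would learn such values during training.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(region, word_vectors):
    """Return (weights over words, blended word context) for one image region."""
    weights = softmax([dot(region, w) for w in word_vectors])
    dim = len(region)
    context = [
        sum(wt * vec[i] for wt, vec in zip(weights, word_vectors))
        for i in range(dim)
    ]
    return weights, context

# Hypothetical 2-D word embeddings, invented for this example only.
words = ["green", "crown", "white", "belly"]
word_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# Hypothetical feature for the part of the image where the bird's head goes.
head_region = [2.0, 0.0]

weights, context = attend(head_region, word_vectors)
top_word = words[weights.index(max(weights))]
print(top_word, [round(w, 3) for w in weights])
```

In this toy setup the head region puts most of its weight on “green” and “crown” and very little on “white” and “belly,” which is the intuition behind letting different words steer different parts of the picture.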
Principal researcher Xiaodong He said in a Microsoft press release, “If you go to Bing and you search for a bird, you get a bird picture. But here, the pictures are created by the computer, pixel by pixel, from scratch.”
This technology is a three-fold improvement over previous image generators, the researchers say, and could improve photographers’ image editing process or serve as a sketch assistant for artists. In the press release, He said he imagines that animated movies generated from a written script could be another application.
It’s clear that tech companies like Google and Microsoft will continue to jockey for AI supremacy by pushing the limits of machine learning. But in this case, Microsoft’s multicolored bird image seems more impressive than Google’s cat doodles.