How computers may see you soon
(Image by Tejas Kulkarni/MIT)

Software that analyzes images is notoriously more difficult to program than data-analysis systems, which have the luxury of working with strictly numeric information. A picture of a face can be stored in binary form easily enough, but getting a computer to look at that face and recognize it under different lighting conditions, or from a slightly different angle than the original, is tremendously difficult, even though our brains perform the same feat almost instantly and without conscious effort.

As anyone who has used TinEye or Google’s reverse image search knows, the field has advanced recently, but those tools still have significant limitations. A new class of languages, grouped under the banner of probabilistic programming, aims to change that by teaching machines to make the kinds of inferences our brains produce without prompting. Researchers at MIT have found that one of these languages, called Picture, can shrink image-analysis tasks that conventionally require thousands of lines of code into compact programs of fewer than 50 lines.

At the heart of this improvement are inference algorithms that dynamically calculate probabilities on the basis of observed data. A variety of these algorithms are brought to bear on an image, say, a snapshot of a human face. From it they can infer not only the three-dimensional shape of the object depicted, but also how that object might appear under different lighting conditions or from a different angle. There are many possibilities to weigh, so the algorithms use machine learning, modifying themselves on the fly as new data comes in, to optimize the results for any given scenario.
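
To make that concrete, here is a minimal sketch of one crude form of sampling-based inference over a toy generative model, written in plain Python. The two latent variables (pose and lighting), the made-up renderer, the Gaussian pixel-noise value, and the brute-force likelihood weighting are all illustrative assumptions rather than the MIT team's actual algorithm; the point is only that hypotheses drawn from a prior get re-weighted by how well their renderings match the observed data.

```python
# A minimal, hypothetical sketch of probability-based inference over a toy
# generative model (not the actual Picture system). Hypotheses about latent
# pose and lighting are drawn from a prior and weighted by how well their
# renderings match the observed pixels (likelihood weighting).
import math
import random

random.seed(0)

def render(pose, lighting):
    """Toy deterministic renderer: maps latent pose and lighting to a few pixels."""
    return [lighting * max(0.0, math.cos(pose + 0.3 * i)) for i in range(5)]

def log_likelihood(observed, rendered, noise=0.1):
    """Gaussian pixel-noise model comparing observed and rendered pixels."""
    return sum(-((o - r) ** 2) / (2 * noise ** 2) for o, r in zip(observed, rendered))

def infer(observed, num_samples=20000):
    """Estimate the latent variables as a likelihood-weighted average of samples."""
    samples, log_weights = [], []
    for _ in range(num_samples):
        pose = random.uniform(-math.pi / 4, math.pi / 4)   # prior over pose
        lighting = random.uniform(0.5, 1.5)                # prior over lighting
        samples.append((pose, lighting))
        log_weights.append(log_likelihood(observed, render(pose, lighting)))
    peak = max(log_weights)
    weights = [math.exp(lw - peak) for lw in log_weights]  # stabilize in log space
    total = sum(weights)
    pose_est = sum(w * s[0] for w, s in zip(weights, samples)) / total
    light_est = sum(w * s[1] for w, s in zip(weights, samples)) / total
    return pose_est, light_est

observed = render(0.3, 1.2)          # pretend these pixels came from a real image
pose_est, light_est = infer(observed)
print(f"true pose=0.30, lighting=1.20; estimated pose={pose_est:.2f}, lighting={light_est:.2f}")
```

A real system would use far richer renderers and much smarter inference than brute-force sampling, but the re-weighting of hypotheses against observed data is the same in spirit.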

The method is based on inverse graphics. Conventional rendering works in the forward direction: a computer performs a series of deterministic calculations from the known three-dimensional shape of the object being rendered, the angle from which it is to be depicted, and the angle of the light source virtually illuminating it.
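
That forward direction is the easy part. As a rough illustration, and assuming nothing fancier than simple Lambertian (diffuse) shading, the brightness of each surface point can be computed from its surface normal and the direction of the light source; every value below is made up for illustration.

```python
# A minimal sketch of the deterministic "forward" rendering step, using
# simple Lambertian (diffuse) shading. The surface normals and light
# direction are made-up values for illustration.
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambertian_brightness(normal, light_dir):
    """Brightness is the clamped dot product of the surface normal
    and the direction toward the light source."""
    n, l = normalize(normal), normalize(light_dir)
    return max(0.0, sum(nc * lc for nc, lc in zip(n, l)))

# A few surface normals from a known 3-D object; the viewing angle would
# decide which surface points land on which pixels, a step skipped here.
normals = [(0.0, 0.0, 1.0), (0.3, 0.0, 0.95), (0.6, 0.0, 0.8)]
light = (0.5, 0.0, 1.0)  # chosen direction of the virtual light source

pixels = [lambertian_brightness(n, light) for n in normals]
print([round(p, 3) for p in pixels])  # prints three brightness values in [0, 1]
```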

Inverse graphics reverses this process, starting from the observed image and working backward to the possible object shapes, lighting conditions, and viewing angles that could have produced it. Ambiguities abound, since a dinner plate and a Frisbee can reflect light similarly, but the business of the inference algorithms is to determine which candidate model best fits the image being observed.
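
A toy version of that model comparison might look like the following sketch. Two hypothetical shape models, a flat disc standing in for the plate and a shallow dome standing in for the Frisbee, are each rendered under a coarse grid of candidate lighting directions, and whichever rendering best explains the observed pixels wins. The shapes, the Gaussian noise model, and the grid search are stand-in assumptions, not the actual system's machinery.

```python
# A hypothetical sketch of inverse graphics as model comparison: render two
# candidate shapes under many lighting directions and keep whichever best
# explains the observed pixels. Shapes, noise model, and grid search are
# illustrative stand-ins, not the MIT team's method.
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def shade(normals, light_dir):
    """Deterministic forward renderer: Lambertian brightness per surface point."""
    l = normalize(light_dir)
    return [max(0.0, sum(nc * lc for nc, lc in zip(normalize(n), l))) for n in normals]

def log_likelihood(observed, rendered, noise=0.05):
    """Gaussian pixel-noise model."""
    return sum(-((o - r) ** 2) / (2 * noise ** 2) for o, r in zip(observed, rendered))

# Candidate shape models, each described by a few surface normals.
candidates = {
    "flat disc (plate-like)": [(0.0, 0.0, 1.0)] * 4,
    "shallow dome (Frisbee-like)": [(x, 0.0, 1.0) for x in (-0.4, -0.1, 0.1, 0.4)],
}

def best_explanation(observed):
    """Coarse grid search over shapes and lighting directions for the best fit."""
    grid = [i / 5.0 for i in range(-5, 6)]              # -1.0 .. 1.0 in steps of 0.2
    best_name, best_score = None, float("-inf")
    for name, normals in candidates.items():
        for lx in grid:
            for ly in grid:
                for lz in (0.2, 0.4, 0.6, 0.8, 1.0):    # light roughly above the object
                    score = log_likelihood(observed, shade(normals, (lx, ly, lz)))
                    if score > best_score:
                        best_name, best_score = name, score
    return best_name

# Simulate an observation of the dome under one lighting setup, then invert it.
observed = shade(candidates["shallow dome (Frisbee-like)"], (0.3, 0.2, 1.0))
print(best_explanation(observed))  # should identify the dome as the better fit
```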

"Picture's" results have been promising; The error rate exhibited by the analysis program looking at human poses comes in somewhere between 50 and 80 percent lower than earlier efforts. If Picture continues to provide solid results, the applications may go well beyond image classification work for search engines. Drones and other semi-autonomous robots could all benefit from faster, more accurate, and more compact vision subroutines.

The team from MIT plans to present their results at the Computer Vision and Pattern Recognition Conference in Boston early next month.

