Microsoft Is Teaching Computers to See Like People
The company's researchers are developing systems that process visual information like humans and can answer questions about a photo's content.

Microsoft's quest to build computing systems that understand the world around them doesn't end with the company's Project Oxford machine-learning technology. Researchers at the Redmond, Wash., software maker are also developing systems that mimic how humans pull information from the things they see.

"When a person is asked about something in a photo, they're taking in a lot of details—a lot of words—to answer questions about it," blogged Microsoft spokesperson Athima Chansanchai. "Now, a team of Microsoft researchers, together with colleagues from Carnegie Mellon University, has created a system that uses computer vision, deep learning and language understanding to analyze images and answer questions the same way humans would."

Together, the researchers created a model that "applies multi-step reasoning to answer questions about pictures," said Chansanchai. The technology is being advanced by Li Deng, Xiaodong He and Jianfeng Gao of Microsoft Research's Deep Learning Technology Center, along with Carnegie Mellon University researchers Zichao Yang and Alex Smola.

"The system takes in information a human set of eyes and brain would, looking at a scene's action (if there is any) and the relationships among multiple visual objects," said Chansanchai. "Though it may sound simple for humans, it's a lot for a computer to learn language and to find answers in an image. But using deep neural networks, it can."
Deng and his group are imbuing the system with the ability to pay attention, focus on visual cues and infer answers progressively to solve problems. It's an advancement in human behavior modeling that was not possible a few years ago, he said.
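The article doesn't detail the model itself, but the idea of paying attention to visual cues and inferring answers progressively can be sketched in miniature: score each image region against an encoded question, softmax the scores into attention weights, pool the regions, and fold the result back into the query for the next reasoning hop. The function names, toy feature vectors, and the two-hop loop below are all illustrative assumptions, not the researchers' actual system.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(regions, query):
    """One attention hop: score every image region against the current
    query, convert scores to a probability distribution, and return the
    weights plus the attention-weighted sum of region features."""
    scores = regions @ query        # one relevance score per region
    weights = softmax(scores)       # where the model "looks"
    context = weights @ regions     # pooled visual evidence
    return weights, context

# Toy setup (hypothetical numbers): 4 image regions with 3-dim features.
regions = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
])
query = np.array([0.2, 0.9, 0.1])   # stand-in for an encoded question

# Two reasoning hops: each hop refines the query with what was attended,
# so the next hop's focus is sharper.
for hop in range(2):
    weights, context = attention_step(regions, query)
    query = query + context

print(np.round(weights, 3))
```

The progressive refinement is the key design point: after the first hop, the query already carries the attended visual context, so the second hop concentrates its weight on the region most relevant to the question.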