Microsoft Is Teaching Computers to See Like People | eWeek

Nov 28, 2015

Microsoft’s quest to build computing systems that understand the world around them doesn’t end with the company’s Project Oxford machine-learning technology. Researchers at the Redmond, Wash., software maker are also developing systems that mimic how humans pull information from the things they see.

“When a person is asked about something in a photo, they’re taking in a lot of details—a lot of words—to answer questions about it,” blogged Microsoft spokesperson Athima Chansanchai. “Now, a team of Microsoft researchers, together with colleagues from Carnegie Mellon University, has created a system that uses computer vision, deep learning and language understanding to analyze images and answer questions the same way humans would.”

Together, the researchers created a model that “applies multi-step reasoning to answer questions about pictures,” said Chansanchai. The technology is being advanced by Li Deng, Xiaodong He and Jianfeng Gao from Microsoft Research’s Deep Learning Technology Center, along with Carnegie Mellon University researchers Zichao Yang and Alex Smola.

“The system takes in information a human set of eyes and brain would, looking at a scene’s action (if there is any) and the relationships among multiple visual objects,” said Chansanchai. “Though it may sound simple for humans, it’s a lot for a computer to learn language and to find answers in an image. But using deep neural networks, it can.”

Deng and his group are imbuing the system with the ability to pay attention, focus on visual cues and infer answers progressively to solve problems. It’s an advancement in human behavior modeling that was not possible a few years ago, he said.
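The article does not describe the model's internals. Purely as an illustration of what attending to an image over several steps can mean, the Python sketch below scores a few image regions against a question, blends what it "saw" back into the question, and looks again. The function names, the toy feature vectors and the simple additive update are all assumptions made for this example; this is not Microsoft's implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product, used here as a simple similarity measure."""
    return sum(x * y for x, y in zip(a, b))

def attend(query, regions):
    """One attention step: weight each region by its similarity to the query."""
    weights = softmax([dot(query, r) for r in regions])
    summary = [sum(w * r[i] for w, r in zip(weights, regions))
               for i in range(len(query))]
    return weights, summary

def multi_step_attention(question, regions, steps=2):
    """Re-attend to the image several times, refining the query each time."""
    query = list(question)
    history = []
    for _ in range(steps):
        weights, summary = attend(query, regions)
        history.append(weights)
        # Fold what was just "seen" into the query before looking again
        # (an assumed update rule, chosen for simplicity).
        query = [q + s for q, s in zip(query, summary)]
    return query, history

# Hypothetical toy features: three image regions, two numbers each.
regions = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
question = [1.0, 0.0]  # made-up question vector resembling regions 0 and 2

final_query, history = multi_step_attention(question, regions, steps=2)
for step, weights in enumerate(history, start=1):
    print(f"step {step}:", [round(w, 3) for w in weights])
```

In this toy setup, the second pass puts less weight on the region that does not match the question than the first pass did, which is the sense in which attention is narrowed step by step.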

Microsoft envisions that the work will lead to systems that can anticipate human needs and provide real-time recommendations. Systems that can answer questions based on visual information are also key to developing artificial intelligence tools, according to the company.

For example, the technology could lead to improved bike safety.

The system “could power all kinds of applications, such as a warning system for bicyclists,” said Chansanchai, with a mounted camera continuously taking in the environment around the cyclist.

The image analysis system builds on Microsoft’s prior work on technologies that can automatically caption photos. “The researchers say that was an important step in getting to this point because descriptions of scenes, annotated by people, provide meaning to a picture. That helps train the computer to understand the image the way a person would.”

Microsoft is increasingly banking on machine-learning systems as a way to help developers build a new generation of intelligent apps. Last month, the company announced the public beta of the Project Oxford Language Understanding Intelligent Service (LUIS), enabling coders to create applications that understand spoken instructions and search queries, similar to Microsoft’s own virtual assistant, Cortana. Project Oxford is a collection of machine-learning application programming interfaces (APIs) that also includes face and emotion detection, speech recognition and computer vision.
