Microsoft Releases MARCO Data Set to Develop More Insightful AI Apps

Microsoft releases a data set, dubbed MS MARCO, that the company hopes will enable artificial intelligence systems to answer questions more like a human would.

Capping an AI-themed week at Microsoft, the company released MS MARCO, a data set of 100,000 questions and answers that researchers can use to train their artificial intelligence (AI) systems.

The software maker's own researchers based MS MARCO, short for Microsoft Machine Reading Comprehension, on anonymized data gleaned from the real-world queries posed to the company's Bing search engine.

The aim is to help get beyond today's AIs, which can answer the simple fact-based questions that digital assistants like Cortana and Siri handle, to intelligent systems that are attuned to the nuances of how people pose questions in a conversational context and the answers they expect in return.

For example, virtual assistants today are adept at solving math problems and reciting facts and figures like the number of days leading up to major holidays or the height of major landmarks, like the Empire State Building in New York City (1,250 feet or 1,454 feet at the top of its spire).

When they encounter more complex or somewhat ambiguous questions, they tend to default to search results that users must comb through to arrive at a desired answer.

Microsoft hopes the data set can be used as a stepping stone to create virtual assistants that provide definitive answers from the start. Instead of generating a list of search results when faced with tough questions, they may someday analyze the web pages contained in that list to zero in on exactly the answer the user was seeking.

The questions contained in MS MARCO are based on Bing search queries that Microsoft researchers and other staffers found interesting. Answers, based on existing web pages, were provided by humans and verified to be accurate. The data set is available to researchers for non-commercial purposes at no cost.

All told, it's been a busy week for Microsoft on the AI front.

On Monday, Microsoft Ventures kicked off a new fund specifically for AI startups. The first recipient is Element AI, a Montreal-based research lab and incubator that works with organizations and researchers to help develop commercial-grade AI systems.

A day later, the company teased the 2017 arrival of a competitor to Amazon Echo and Google Home.

Harman Kardon is gearing up for the release of a Cortana-powered speaker sometime next year, Microsoft revealed on Dec. 13. Using the just-announced Cortana Devices SDK, which allows electronics manufacturers to integrate Microsoft's AI-enabled digital assistant into their products, the as-yet-unnamed device will play music, create reminders and complete other tasks it picks up through voice commands.

Beyond smart speakers, Harman (Harman Kardon's parent company) is exploring AI that can work with health care applications.

During the IBM World of Watson event in Las Vegas in October, Harman announced a prototype system based on the company's internet of things (IoT) technologies, voice-activated JBL (another Harman brand) speakers and Watson, IBM's cognitive-computing platform.

The system, which was tested at the Thomas Jefferson University Hospital in Philadelphia, can be used by patients to control lights, adjust a room's temperature and lower blinds. Patients can also ask the system questions and set reminders that can help speed the recovery process.

Pedro Hernandez

Pedro Hernandez is a contributor to eWEEK and the IT Business Edge Network, the network for technology professionals. Previously, he served as a managing editor for the Internet.com network of...