Microsoft AI Model Surpasses Humans in Reading Comprehension

A new AI model from Microsoft Research Asia beats humans in the SQuAD benchmark for machine reading comprehension.

Microsoft's researchers in China have crossed a significant artificial intelligence threshold, and their innovations may pave the way for smarter search engines and virtual assistants.

A new AI model submitted by Microsoft Research Asia now sits atop the SQuAD leaderboard with a score of 82.650 on the Exact Match portion of the benchmark. SQuAD, short for Stanford Question Answering Dataset, is a machine reading comprehension dataset used to determine how well AI systems interpret information and answer questions based on that information. The dataset comprises more than 100,000 question-and-answer pairs pertaining to a set of over 500 Wikipedia articles.

For comparison's sake, human performance on the same questions corresponds to an Exact Match score of 82.304. Chinese e-commerce giant Alibaba came in second on that metric, scoring 82.440. After factoring in both the Exact Match and looser F1 metrics used in the benchmark, Alibaba and Microsoft Research Asia are tied for first place on the SQuAD leaderboard.
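For readers curious how the two metrics differ, the following is a minimal Python sketch, loosely modeled on the normalization steps used for SQuAD-style scoring (lowercasing, stripping punctuation and the articles "a," "an" and "the"). The function names `normalize`, `exact_match` and `f1_score` are illustrative choices made here, not Microsoft's or Stanford's code, and the sample answers are invented. Exact Match gives credit only when the normalized answers are identical, while F1 gives partial credit for overlapping words.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Harmonic mean of word-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Multiset intersection counts each shared word at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented example: the prediction contains the right answer plus extra words.
    print(exact_match("in Paris, France", "Paris"))
    print(f1_score("in Paris, France", "Paris"))
```

In this sketch, an answer that includes the correct word along with extra words scores zero on Exact Match but still earns partial F1 credit, which is why F1 is described as the looser metric. Leaderboard figures are averages of such per-question scores, expressed as percentages.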

Microsoft envisions AI systems that can quickly parse information contained in documents and books, providing users with relevant information when they need it, in a manner that is easy to understand. "That would let drivers more easily find the answer they need in a dense car manual, saving time and effort in tense or difficult situations," wrote Microsoft representative Allison Linn in a Jan. 15 announcement.

The software giant also sees a role for machine reading comprehension in the workplace, particularly in professions where the stakes are high.

"These tools also could let doctors, lawyers and other experts more quickly get through the drudgery of things like reading through large documents for specific medical findings or rarified legal precedent," added Linn. "The technology would augment their work and leave them with more time to apply the knowledge to focus on treating patients or formulating legal opinions."

Microsoft isn't the only technology company that envisions a place for AI in law and medicine.

Ross Intelligence, a "digital attorney" startup from San Francisco, uses IBM Watson for its time-saving legal research tool. In 2016, IBM and Quest Diagnostics launched IBM Watson Genomics, a service that combines Watson's cognitive computing capabilities with Quest's genomic sequencing technologies, enabling doctors to quickly home in on treatment options without poring over countless medical journals and clinical trial reports.

There's a good chance Microsoft's newest machine reading AI model, or at least some version of it, will make its way into commercial solutions. Linn noted that its predecessors are already being integrated into the company's Bing search engine, allowing visitors to obtain answers faster and with less typing.

Microsoft is also exploring ways to use the technology for services that will allow users to conduct complex searches that build upon the original question, similar to the Q&A feature in Power BI. For example, users will be able to ask a basic question about a public or historical figure, like the year they were born, and post follow-up questions based on that query.

Pedro Hernandez

Pedro Hernandez is a contributor to eWEEK and the IT Business Edge Network, the network for technology professionals. Previously, he served as a managing editor for the network of...