Microsoft’s new medical AI, the MAI-DxO system, has outperformed physicians on some of the toughest diagnostic challenges, achieving 85.5% accuracy, more than four times that of the human doctors in the study.
In a study from the company’s AI unit, Microsoft claims that “AI doesn’t face the trade-off” between generalist breadth and specialist depth, noting that the system combines both to outperform individual physicians across multiple aspects of diagnostic reasoning. The MAI-DxO model also reduced diagnostic costs compared to doctors, expanding the definition of medical expertise beyond the human mind.
Microsoft’s AI correctly diagnosed what doctors missed
Led by Mustafa Suleyman, Microsoft’s AI unit set out to test whether its new system could handle diagnoses that challenge even experienced doctors. The team ran MAI-DxO through 304 real-world cases published between 2017 and 2025 in the New England Journal of Medicine (NEJM).
The system was also tested against AI models from OpenAI, Google, Anthropic, Meta, and xAI. OpenAI’s o3 performed best among them, reaching 78.6% accuracy. But MAI-DxO topped the list at 85.5%, the highest across all models.
To compare it with human clinicians, Microsoft brought in 21 practicing physicians from the US and the UK. Each worked through a set of cases using the same step-by-step setup, without tools, references, or peer input. On average, they completed 36 cases each, with an accuracy of just 19.9%.
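As a quick sanity check of the “over four times” framing, the ratio of the two reported accuracy figures can be computed directly. This minimal sketch uses only the numbers quoted above; the variable names are illustrative, not from the study:

```python
# Accuracy figures as quoted in this article.
mai_dxo_accuracy = 0.855    # MAI-DxO on the 304 NEJM benchmark cases
physician_accuracy = 0.199  # average across the 21 practicing physicians

# Ratio of model accuracy to physician accuracy.
ratio = mai_dxo_accuracy / physician_accuracy
print(f"MAI-DxO scored {ratio:.1f}x the physicians' accuracy")  # ~4.3x
```

At roughly 4.3x, the reported gap does support the “more than four times” characterization.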
Saving more than just lives
In addition to delivering better results, the AI tool also cuts spending. In one configuration, MAI-DxO reduced the average diagnostic spend to $2,396 per case, roughly a 20% decrease from the $2,963 average cost of physicians.
The system reached similar or better outcomes while ordering fewer tests, showing that higher accuracy doesn’t have to come with a higher bill.
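The reported savings can be verified with simple arithmetic. This is a minimal sketch using the article’s own per-case figures; no other cost data is assumed:

```python
# Per-case cost figures as quoted in this article (US hospital pricing).
physician_cost = 2963  # average per-case diagnostic spend, physicians ($)
mai_dxo_cost = 2396    # average per-case diagnostic spend, MAI-DxO ($)

savings = physician_cost - mai_dxo_cost  # dollars saved per case
reduction = savings / physician_cost     # fraction of physician cost saved

print(f"Savings per case: ${savings}")
print(f"Reduction: {reduction:.1%}")  # ~19.1%, i.e. roughly 20%
```

The exact figure works out to about 19.1%, which the study rounds to roughly 20%.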
Clinical barriers remain
Despite MAI-DxO’s strong performance in testing, Microsoft notes the system hasn’t been evaluated in real-world clinical settings. The benchmark cases were rare and complex, and the physicians operated without access to references or digital tools, conditions that don’t reflect typical medical practice. As with many of the best AI healthcare solutions, clinical deployment remains a challenge.
Cost estimates were also limited. They relied on US hospital pricing and didn’t account for regional differences, follow-up care, or patient factors like time, access, and comfort. Microsoft maintains the system isn’t ready for clinical use and will require further validation in practice.
Even so, the AI company sees a broader shift. Physicians must often choose between generalist breadth and specialist depth. As Microsoft puts it, “AI doesn’t face the trade-off”: the system blends both types of expertise, doing what no single physician is trained to do.