OpenAI’s o1 Shows Promise in Diagnosing Real Emergency Room Cases

Written by
Liz Ticong
May 4, 2026

Emergency rooms gave OpenAI’s o1 a much harder diagnostic test than a medical exam question.

In a blinded study using patient records from actual hospital visits, the AI system diagnosed emergency-room cases more accurately than two attending physicians. The findings, published in Science by researchers at Harvard Medical School and Beth Israel Deaconess Medical Center, suggest medical AI testing is starting to move closer to the messy conditions of real care.

For hospitals, researchers, and regulators, the study raises a bigger question: how much should diagnostic accuracy count when safety, testing decisions, and clinician judgment still shape what happens to a patient?

The result researchers didn’t expect

Adam Rodman, one of the study’s senior authors and a Harvard Medical School assistant professor of medicine at Beth Israel Deaconess, went into the emergency-room experiment with low expectations.

“I thought it was going to be a fun experiment but that it wouldn’t work that well. That was not at all what happened,” Rodman said.

The ER experiment included 76 cases from Beth Israel Deaconess Medical Center and compared OpenAI’s o1 with GPT-4o and a physician baseline. The biggest gap appeared at the earliest point in care. At initial triage, when patient information was most limited, o1 identified the exact or a very close diagnosis in 67.1% of cases. The two attending physicians scored 55.3% and 50%.
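To put those percentages in terms of patients, here is a rough back-of-the-envelope sketch that converts the reported triage accuracy into implied case counts out of 76. The counts are inferred from the rounded percentages rather than taken from the paper, and the labels `physician_a` and `physician_b` are placeholders, not names used in the study.

```python
# Assumption: all three triage percentages are fractions of the same 76 ER cases.
TOTAL_CASES = 76

# Triage accuracy as reported in the article (percent of cases).
# "physician_a" and "physician_b" are hypothetical labels for the two physicians.
reported_accuracy = {"o1": 67.1, "physician_a": 55.3, "physician_b": 50.0}


def implied_correct(percent: float, total: int = TOTAL_CASES) -> int:
    """Return the whole number of cases closest to the given percentage of total."""
    return round(percent / 100 * total)


# Inferred counts; the paper's own tables remain the authoritative source.
implied = {name: implied_correct(pct) for name, pct in reported_accuracy.items()}

for name, count in implied.items():
    # Recompute the percentage from the inferred count as a consistency check.
    check = round(count / TOTAL_CASES * 100, 1)
    print(f"{name}: about {count} of {TOTAL_CASES} cases ({check}%)")
```

Under that assumption, the gap at triage works out to roughly 9 to 13 cases out of 76, a reminder that the sample is small even though the percentage difference looks large.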

The system also stayed ahead later in the ER process, including after the physician encounter and at admission to the medical floor or ICU.

Real records, real uncertainty

The ER portion of the study left polished medical case studies behind and tested the model closer to the way diagnosis unfolds in a hospital.

“To better understand real-world performance, we needed to test performance early in the patient course, when clinical data is sparse,” said co-first author Thomas Buckley.

The emergency department cases were drawn directly from electronic health records, and Harvard said the team did not preprocess the data before feeding it into the model. The AI had to work through incomplete, unstructured patient information at the same early moments when clinicians are still trying to figure out what is happening.

Not a green light for AI doctors

The researchers are not arguing that AI should diagnose patients on its own. The Science paper says the findings make the case for prospective clinical trials, along with more study of how clinicians and AI systems should work together in real care settings.

Diagnostic accuracy is only one piece of emergency care. Doctors still have to weigh risks, order tests, decide whether a patient should be admitted, and act on details that may never appear cleanly in a record.

“A model might get the top diagnosis right but also suggest unnecessary testing that could expose a patient to harm,” said Peter Brodeur, co-first author of the study and a Harvard Medical School clinical fellow in medicine at Beth Israel Deaconess. “Humans should be the ultimate baseline when it comes to evaluating performance and safety.”

The next test will be whether systems like o1 can improve care when doctors are using them, not just whether they can beat doctors on a diagnostic scorecard.
