OpenAI Unleashes FrontierScience for AI-Fueled Scientific Reasoning | eWEEK



Written by eWEEK Staff
Dec 17, 2025
3 minute read

Einstein a go-go. OpenAI is in the news again with something to help the scientific world.

The perennially busy firm has unveiled FrontierScience—an evaluation system that pushes artificial intelligence into uncharted territory by tackling the same complex scientific problems that typically challenge PhD researchers for weeks.

While previous benchmarks focused on recall, FrontierScience is billed as a test of genuine scientific reasoning across physics, chemistry, and biology. The benchmark launched on Dec. 16 with more than 700 questions written by some of the world’s brightest scientific minds.

OpenAI’s GPT-5.2 model achieved a 77% success rate on olympiad-level problems—tasks that challenge the most gifted young scientists globally.

The performance gap

GPT-5.2 dominated structured olympiad-style questions with its 77% score, but then crashed dramatically when faced with open-ended research tasks, managing only 25% success.

This 52-point performance gap reveals what scientists are calling the “ambiguity barrier.” The benchmark’s creators assembled an unprecedented team—42 international olympiad medalists representing 109 medals, alongside 45 PhD scientists—to craft questions that would truly test AI reasoning. Some research questions are so complex that human experts estimate they would require several days of computer simulations or weeks of mathematical work to solve properly.

Consider this: when asked about “meso-nitrogen atoms in nickel(II) phthalocyanine,” researchers noted that running the computer simulations alone “could take several days.” Another question requesting derivation of “electrostatic wave modes” in plasma prompted one expert to admit: “I did a similar analysis earlier this year for a different kind of wave… I think it took about three weeks to do the maths correctly.”

New age of scientific discovery

The implications stretch far beyond impressive test scores: the benchmark signals that we’re approaching a tipping point where AI shifts from sophisticated search engine to genuine research collaborator. When models eventually reach near-perfect scores on the research track, they’ll function as “very good collaborators” that can multiply the progress PhD students or scientists can make.

The benchmark’s evaluation system represents a fundamental shift in AI assessment. Answers are graded by GPT-5 against 10-point rubrics that assess reasoning quality, not just final answers, moving beyond a “pass the test” mentality to a “can it do the job” evaluation. Progress has been rapid: when a similar PhD-level science benchmark launched in November 2023, GPT-4 scored just 39%, but GPT-5.2 now hits 92% on those same questions.
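To make the 10-point rubric idea concrete, here is a minimal Python sketch of how rubric-based scoring could work. It is an illustration only, not OpenAI’s grader: the `RubricItem` fields, the example rubric lines, and the pass threshold of 7 are all assumptions introduced here, and the step where a model decides whether each item is satisfied is represented by a plain boolean.

```python
from dataclasses import dataclass
from typing import List

MAX_POINTS = 10      # the article describes 10-point rubrics
PASS_THRESHOLD = 7   # assumption for illustration; not stated in the article


@dataclass
class RubricItem:
    description: str  # what the grader looks for in the answer
    points: int       # how many of the 10 points this item is worth
    satisfied: bool   # hypothetical grader verdict for this item


def rubric_score(items: List[RubricItem]) -> int:
    """Add up the points for every rubric item the answer satisfied."""
    available = sum(item.points for item in items)
    if available != MAX_POINTS:
        raise ValueError(
            f"rubric items must total {MAX_POINTS} points, got {available}"
        )
    return sum(item.points for item in items if item.satisfied)


def passes(items: List[RubricItem], threshold: int = PASS_THRESHOLD) -> bool:
    """Treat an answer as successful when its score meets the threshold."""
    return rubric_score(items) >= threshold


# Hypothetical rubric for a derivation-style question.
example = [
    RubricItem("States the governing equations", 3, True),
    RubricItem("Justifies the key approximation", 4, True),
    RubricItem("Reaches the correct final expression", 3, False),
]

print(rubric_score(example), passes(example))
```

Because points attach to intermediate reasoning steps, an answer in this sketch can earn most of the credit even when its final expression is wrong, which is the sense in which a rubric measures more than the final answer.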

This rapid advancement suggests we’re witnessing the emergence of AI systems that can genuinely contribute to scientific breakthroughs.


The race to scientific AI supremacy

This benchmark launch coincides with an unprecedented surge in AI research investment that’s reshaping the entire scientific landscape. The competition extends far beyond OpenAI—Google DeepMind’s AlphaFold has already predicted over 200 million protein structures, work that would have taken hundreds of millions of years to complete experimentally.

When AI models eventually close the 52-point gap between Olympiad and Research scores, they’ll handle ambiguous problems as easily as constrained ones. The speed of improvement suggests this limitation won’t persist long.

