      FrontierMath Benchmark Exposes AI Struggles in Advanced Math

      Written by Esther Shein
      Published November 30, 2024

        Artificial intelligence is proving its value for generating text, recognizing images, and automating processes, but AI systems are hitting walls when they try to solve advanced mathematical reasoning problems. FrontierMath, a trailblazing new benchmark from research firm Epoch AI, found that today's most advanced AI systems, including GPT-4o and Gemini 1.5 Pro, solved less than 2 percent of the problems they faced, even after long hours of work.

        Benchmarks are needed to understand and measure AI's progress. According to Epoch AI's product marketing, FrontierMath "can assess how well AI systems engage in complex scientific reasoning" because "mathematical problems can be rigorously and automatically verified," unlike fields where evaluation depends on subjective judgment or expensive tests.
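What "automatically verified" means in practice can be illustrated with a small Python sketch. It assumes, purely for illustration, that each problem has one exact rational answer and that submissions arrive as strings. The names `verify_answer` and `score` are hypothetical; this is not Epoch AI's grading code. Exact `Fraction` arithmetic is used instead of floats so that a close but wrong approximation, such as -2.333 for -7/3, is marked incorrect with no judgment call involved.

```python
from fractions import Fraction


def verify_answer(submitted: str, reference: Fraction) -> bool:
    """Return True only if `submitted` parses to exactly `reference`.

    Hypothetical checker: assumes every problem has a single exact
    rational answer and that submissions arrive as strings like "42"
    or "-7/3".
    """
    try:
        value = Fraction(submitted.strip())
    except (ValueError, ZeroDivisionError):
        # Unparseable or undefined answers (e.g. "abc", "1/0") count as wrong.
        return False
    # Exact rational comparison: no floating-point tolerance is involved.
    return value == reference


def score(submissions: dict[str, str], answer_key: dict[str, Fraction]) -> float:
    """Share of problems in `answer_key` answered exactly right.

    Missing submissions are treated as empty strings and therefore wrong.
    """
    correct = sum(
        verify_answer(submissions.get(pid, ""), answer)
        for pid, answer in answer_key.items()
    )
    return correct / len(answer_key)


if __name__ == "__main__":
    key = {"p1": Fraction(42), "p2": Fraction(-7, 3)}
    print(score({"p1": "42", "p2": "-2.333"}, key))  # 0.5
```

Because grading reduces to an equality test, it can be rerun on every new model at negligible cost, which is the property the benchmark's designers point to.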

        How the Models Performed

        Epoch AI provides sample problems that expert mathematicians spend hours solving, such as one involving Artin's primitive root conjecture and another that requires finding a degree-19 polynomial. Before tackling the problems, the AI models were given "extensive support to maximize their performance," including access to Python environments for testing and verification. That support wasn't enough.

        “FrontierMath has proven exceptionally challenging for today’s AI systems,” Epoch AI reported.

        The AI systems scored above 90 percent on easier math benchmarks such as GSM8K and MATH but only around 2 percent on the advanced problems. All FrontierMath problems are new and previously unpublished, which avoids the data contamination concerns that affect existing benchmarks.

        In a blog post on the new benchmark, mathematician Evan Chen said FrontierMath differs from traditional math competitions such as the International Mathematical Olympiad (IMO) or the Putnam in a few ways. IMO problems avoid specialized knowledge and complex calculations, while FrontierMath welcomes them. Both test for creative insight, he said, but FrontierMath "outright invert(s)" two other properties expected of competition problems: that they should not take a lot of implementation, and that they should be elementary.

        “Because an AI system has vastly greater computational power,” Chen wrote, “it’s actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does—basically, ‘write a proof’ is replaced by ‘implement an algorithm in code.’”
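As a toy illustration of Chen's point, and not an actual FrontierMath problem, consider a Project Euler-style question such as "How many primes are there below one million?" Answering it means implementing an algorithm, but checking an answer only means comparing one integer against a stored key. The sketch below uses a sieve of Eratosthenes; the function name `count_primes_below` is hypothetical.

```python
import math


def count_primes_below(n: int) -> int:
    """Count primes p < n using a sieve of Eratosthenes."""
    if n < 2:
        return 0
    # is_prime[i] == 1 means i is still considered prime.
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross off multiples of p, starting at p*p (smaller ones are done).
            is_prime[p * p :: p] = bytearray(len(range(p * p, n, p)))
    return sum(is_prime)


if __name__ == "__main__":
    print(count_primes_below(10**6))  # 78498
```

The computation takes real work to get right, yet grading a submitted answer is a single comparison with 78498, which is the "easily verifiable" property Chen describes.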

        Evaluating AI Systems: What’s Next

        Epoch AI said it will take the following steps to keep the benchmark useful for gauging whether AI systems have research-level mathematical reasoning capabilities as those systems advance:

        • Regular evaluations of leading AI models
        • Expanding the benchmark
        • Releasing additional problems to the public
        • Strengthening quality control

        Epoch AI said the FrontierMath benchmark was developed in collaboration with more than 60 mathematicians from leading institutions and spans the full spectrum of modern mathematics, from computational number theory to abstract algebraic geometry.

        Esther Shein
        Esther Shein is a longtime content writer specializing in tech and business. Her work has appeared in several local and national publications. She writes news, features, case studies, custom content and marketing materials.

        Property of TechnologyAdvice.
        © 2024 TechnologyAdvice. All Rights Reserved
