
      New AI Benchmark ARC-AGI-2 ‘Significantly Raises the Bar for AI’

      Written by J.R. Johnivan
      Published March 27, 2025

        In March 2025, the ARC Prize Foundation introduced the newest iteration of its popular benchmark, ARC-AGI-2 (ARC stands for Abstraction and Reasoning Corpus). The new benchmark is even more challenging than its predecessor, ARC-AGI-1, which launched in 2019.

        According to ARC Prize Foundation President Greg Kamradt, “ARC-AGI-2 significantly raises the bar for AI.”

        Examining the early scores

        The ARC-AGI-2 benchmark consists of a series of puzzles for AI systems to solve. After testing more than 400 people, the ARC Prize Foundation established a human baseline for the benchmark.

        • Human panel: 60% average with a cost per task of $17.

        Current generative AI tools, however, didn’t fare so well.

        • OpenAI o1-pro: 1% with a cost per task of $200.
        • OpenAI o3-mini-high: 0.0% with a cost per task of $0.41.
        • OpenAI GPT-4.5: 0.0% with a cost per task of $0.29.
        • DeepSeek-R1 and R1-Zero: 0.3% with a cost per task of $0.08.
        • Anthropic Claude 3.7: 0.0% with a cost per task of $0.12.
        • Google Gemini 2.0 Flash: 1.3% with a cost per task of $0.004.

        The human panel vastly outperformed the large language models (LLMs) and AI systems that were evaluated using ARC-AGI-2. But what datasets did the tests use?

        Analyzing the datasets

        The ARC-AGI-2 benchmark comprises a total of four datasets.

        • Training: 1,000 uncalibrated public tasks.
        • Public Eval: 120 calibrated public tasks.
        • Semi-Private Eval: 120 calibrated private tasks.
        • Private Eval: 120 calibrated private tasks.

        The three evaluation sets are calibrated, meaning their tasks are independent and identically distributed. As TechCrunch detailed, this calibration ensures that scores across these datasets are directly comparable.
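The four splits can be summarized in a few lines of Python. This is a minimal sketch built only from the counts reported above; the `Split` class, its field names, and the helper functions are illustrative choices of ours, not part of any official ARC Prize tooling. Because each calibrated evaluation set has 120 tasks, a score is simply the share of those 120 tasks solved, which is part of what lets percentages on different sets line up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    # Hypothetical record type; only the task counts come from the article.
    name: str
    tasks: int
    public: bool
    calibrated: bool


SPLITS = [
    Split("Training", 1000, public=True, calibrated=False),
    Split("Public Eval", 120, public=True, calibrated=True),
    Split("Semi-Private Eval", 120, public=False, calibrated=True),
    Split("Private Eval", 120, public=False, calibrated=True),
]


def total_tasks(splits, *, calibrated=None, public=None):
    """Count tasks, optionally filtering by calibration and visibility."""
    return sum(
        s.tasks
        for s in splits
        if (calibrated is None or s.calibrated == calibrated)
        and (public is None or s.public == public)
    )


def score(solved: int, split: Split) -> float:
    """Percentage of a split's tasks that were solved."""
    return 100.0 * solved / split.tasks


print(total_tasks(SPLITS))                   # 1360
print(total_tasks(SPLITS, calibrated=True))  # 360
print(total_tasks(SPLITS, public=False))     # 240
```

For example, `score(6, SPLITS[1])` returns `5.0`: solving 6 of the 120 Public Eval tasks corresponds to a 5% score.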

        ARC Prize 2025: The grand prize is $700K

        March 2025 also saw the announcement of ARC Prize 2025, which is based on the ARC-AGI-2 benchmark and its datasets. With a grand prize of $700,000, the competition challenges AI developers to reach 85% accuracy on ARC-AGI-2’s private evaluation dataset.

        To be eligible, competing AI systems can spend no more than $0.42 per task, and they must complete the evaluation without internet access.
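The eligibility conditions amount to a simple check, sketched below in Python. The constant and function names are hypothetical, reading “85% accuracy” as “at least 85%” is our assumption, and comparing against the early scores listed earlier is illustrative only: the article does not say those figures were produced under competition conditions (offline, on the private evaluation set), so only score and budget are compared for them.

```python
from typing import NamedTuple

# Grand-prize conditions as described in the article.
# Treating "85% accuracy" as ">= 85%" is an assumption.
TARGET_ACCURACY_PCT = 85.0
MAX_COST_PER_TASK_USD = 0.42


class Result(NamedTuple):
    system: str
    accuracy_pct: float
    cost_per_task_usd: float


def meets_score_and_budget(accuracy_pct: float, cost_per_task_usd: float) -> bool:
    """Check the accuracy target and the per-task spending cap."""
    return (accuracy_pct >= TARGET_ACCURACY_PCT
            and cost_per_task_usd <= MAX_COST_PER_TASK_USD)


def eligible_for_grand_prize(accuracy_pct: float, cost_per_task_usd: float,
                             ran_offline: bool) -> bool:
    """All three stated conditions: score, budget, and no internet access."""
    return ran_offline and meets_score_and_budget(accuracy_pct, cost_per_task_usd)


# Early ARC-AGI-2 figures from the list above (illustrative comparison only).
REPORTED = [
    Result("OpenAI o1-pro", 1.0, 200.0),
    Result("OpenAI o3-mini-high", 0.0, 0.41),
    Result("OpenAI GPT-4.5", 0.0, 0.29),
    Result("DeepSeek-R1 / R1-Zero", 0.3, 0.08),
    Result("Anthropic Claude 3.7", 0.0, 0.12),
    Result("Google Gemini 2.0 Flash", 1.3, 0.004),
]

for r in REPORTED:
    within_budget = r.cost_per_task_usd <= MAX_COST_PER_TASK_USD
    ok = meets_score_and_budget(r.accuracy_pct, r.cost_per_task_usd)
    print(f"{r.system}: within budget={within_budget}, meets score and budget={ok}")
```

On these figures, five of the six systems already fit under the $0.42 cap, but none is anywhere near 85% accuracy.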

        Those competing in ARC Prize 2025 must have their submissions ready by November 3, 2025. Researchers can also submit papers for a share of $75,000 in paper awards. The deadline for paper submissions is November 9, 2025.

        Whichever AI model scores the highest on the private evaluation dataset will be declared the winner. Paper submissions are evaluated according to a standardized rubric.

        J.R. Johnivan
        J.R. Johnivan is a 17-year veteran whose writing is focused on innovation and technology, including IT, computer networking, security, cloud computing, staffing, human resources, real estate, sports, entertainment, and more.
        Property of TechnologyAdvice.
        © 2024 TechnologyAdvice. All Rights Reserved