Close
  • Latest News
  • Artificial Intelligence
  • Video
  • Big Data and Analytics
  • Cloud
  • Networking
  • Cybersecurity
  • Applications
  • IT Management
  • Storage
  • Sponsored
  • Mobile
  • Small Business
  • Development
  • Database
  • Servers
  • Android
  • Apple
  • Innovation
  • Blogs
  • PC Hardware
  • Reviews
  • Search Engines
  • Virtualization
Read Down
Sign in
Close
Welcome!Log into your account
Forgot your password?
Read Down
Password recovery
Recover your password
Close
Search
Logo
Logo
  • Latest News
  • Artificial Intelligence
  • Video
  • Big Data and Analytics
  • Cloud
  • Networking
  • Cybersecurity
  • Applications
  • IT Management
  • Storage
  • Sponsored
  • Mobile
  • Small Business
  • Development
  • Database
  • Servers
  • Android
  • Apple
  • Innovation
  • Blogs
  • PC Hardware
  • Reviews
  • Search Engines
  • Virtualization
More
    Home Latest News

      How OpenAI Says Its New Safety Hub is ‘Making AI Models More Secure’

      Written by

      eWEEK Staff
      Published May 19, 2025
      Share
      Facebook
      Twitter
      Linkedin
        An employee working on AI technology.
        Image: Source: Envato/DC_Studio

        eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More.

        As debates around the safety and ethics of artificial intelligence heat up, OpenAI is actively engaging the public through the launch of its Safety Evaluations Hub, designed to enhance transparency around how its AI models are assessed and secured.

        “As models become more capable and adaptable, older methods become outdated or ineffective at showing meaningful differences (something we call saturation),” the company said in a statement posted to the hub, “so we regularly update our evaluation methods to account for new modalities and emerging risks.”

        Preventing harmful interactions

        OpenAI’s Safety Evaluations Hub examines AI performance in refusing inappropriate or dangerous prompts, including hate speech and illegal activities. Using an automated evaluation system known as an autograder, the responses are assessed based on two primary metrics.

        Most of OpenAI’s models demonstrated high effectiveness, achieving scores close to perfect at 0.99 for declining harmful prompts, although GPT-4o-2024-08-16, GPT-4o-2024-05-13, and GPT-4-Turbo fell slightly below that mark. Interestingly, the models were less consistent in handling benign queries. The top performer in this area was OpenAI o3-mini, scoring 0.80, with other models achieving between 0.65 and 0.79.

        Resisting jailbreak attempts

        “Jailbreaking” refers to attempts by users to manipulate AI into producing restricted or unsafe content, bypassing safety protocols. To gauge resilience, OpenAI applied the StrongReject benchmark — focused on common automated jailbreak techniques — and also used human-generated jailbreak prompts. Models showed varying degrees of vulnerability, scoring between 0.23 and 0.85 against StrongReject, while performing considerably better, with scores from 0.90 to 1.00, against human-generated attacks. This indicates models are generally robust against manual exploits but remain susceptible to automated jailbreak attempts.

        Managing hallucination risks

        A critical challenge for current AI models involves “hallucinations,” or the production of inaccurate or nonsensical responses. OpenAI tested models using two benchmarks, SimpleQA and PersonQA, to assess accuracy and the frequency of hallucinations. For SimpleQA, accuracy scores ranged from 0.09 to 0.59, with hallucination rates from 0.41 to 0.86. In PersonQA evaluations, accuracy spanned from 0.17 to 0.70, and hallucination rates from 0.13 to 0.52.

        These outcomes highlight ongoing issues with reliably providing accurate responses, especially to straightforward queries.

        Balancing instruction priorities

        The hub also evaluates how models prioritize conflicting instructions, such as those between system, developer, and user-generated messages. Scores showed variability, with system-versus-user instruction conflicts achieving between 0.50 and 0.85, developer-versus-user conflicts scoring from 0.15 to 0.77, and system-versus-developer conflicts ranging from 0.55 to 0.93. This reflects a general respect for established hierarchy, notably system instructions, while inconsistencies persist in handling developer instructions relative to user directives.

        Driving improvements in AI safety

        Insights from the Safety Evaluations Hub directly influence how OpenAI refines current AI models and approaches future development. The initiative promotes more accountable and transparent AI advancements by pinpointing weaknesses and charting improvement. For users, this represents an unprecedented opportunity to view and understand the safety protocols behind the powerful AI technologies they increasingly interact with daily.

        This article relied on reporting by eWeek contributor J. R. Johnivan.

        eWEEK Staff
        eWEEK Staff

        Get the Free Newsletter!

        Subscribe to Daily Tech Insider for top news, trends & analysis

        Get the Free Newsletter!

        Subscribe to Daily Tech Insider for top news, trends & analysis

        MOST POPULAR ARTICLES

        Artificial Intelligence

        9 Best AI 3D Generators You Need...

        Sam Rinko - June 25, 2024 0
        AI 3D Generators are powerful tools for many different industries. Discover the best AI 3D Generators, and learn which is best for your specific use case.
        Read more
        Cloud

        RingCentral Expands Its Collaboration Platform

        Zeus Kerravala - November 22, 2023 0
        RingCentral adds AI-enabled contact center and hybrid event products to its suite of collaboration services.
        Read more
        Artificial Intelligence

        8 Best AI Data Analytics Software &...

        Aminu Abdullahi - January 18, 2024 0
        Learn the top AI data analytics software to use. Compare AI data analytics solutions & features to make the best choice for your business.
        Read more
        Latest News

        Zeus Kerravala on Networking: Multicloud, 5G, and...

        James Maguire - December 16, 2022 0
        I spoke with Zeus Kerravala, industry analyst at ZK Research, about the rapid changes in enterprise networking, as tech advances and digital transformation prompt...
        Read more
        Video

        Datadog President Amit Agarwal on Trends in...

        James Maguire - November 11, 2022 0
        I spoke with Amit Agarwal, President of Datadog, about infrastructure observability, from current trends to key challenges to the future of this rapidly growing...
        Read more
        Logo

        eWeek has the latest technology news and analysis, buying guides, and product reviews for IT professionals and technology buyers. The site’s focus is on innovative solutions and covering in-depth technical content. eWeek stays on the cutting edge of technology news and IT trends through interviews and expert analysis. Gain insight from top innovators and thought leaders in the fields of IT, business, enterprise software, startups, and more.

        Facebook
        Linkedin
        RSS
        Twitter
        Youtube

        Advertisers

        Advertise with TechnologyAdvice on eWeek and our other IT-focused platforms.

        Advertise with Us

        Menu

        • About eWeek
        • Subscribe to our Newsletter
        • Latest News

        Our Brands

        • Privacy Policy
        • Terms
        • About
        • Contact
        • Advertise
        • Sitemap
        • California – Do Not Sell My Information

        Property of TechnologyAdvice.
        © 2024 TechnologyAdvice. All Rights Reserved

        Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.

        ×