Date: 2025-05-15

eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners.

OpenAI has rolled out its new GPT-4.1 and GPT-4.1 mini models into ChatGPT, expanding access beyond its API. The update, announced Wednesday, now allows users to access these models directly through ChatGPT’s “more models” dropdown.

OpenAI says the move comes in response to user requests.

GPT-4.1 is available to ChatGPT Plus, Pro, and Team subscribers, while Enterprise and Education users are expected to gain access “in the coming weeks.” Meanwhile, GPT-4.1 mini is replacing GPT-4o mini as the default model for all ChatGPT users, including those on the free plan.

A model built for developers

GPT-4.1 is designed to excel in coding and instruction-following tasks.

“We built it for developers, so it’s very good at coding and instruction following—give it a try!” said Kevin Weil, OpenAI’s chief product officer, in an X post.

The new model performs better on software engineering benchmarks and delivers stronger results when following detailed instructions. In OpenAI's internal testing, GPT-4.1 scored 21.4 percentage points higher than GPT-4o on the SWE-bench Verified benchmark for software tasks.

For developers who rely on ChatGPT to write, debug, or review code, this jump in performance could mean fewer errors and less time spent reworking AI-generated suggestions, and potentially fewer bugs reaching production code.

Safety and transparency

OpenAI’s earlier release of GPT-4.1 drew criticism from parts of the AI community for not publishing safety details immediately. But with this new launch, OpenAI is stepping up its transparency efforts.

The company introduced a new Safety Evaluations Hub, where users can see performance metrics for its models. GPT-4.1 scored:

0.40 on SimpleQA, indicating moderate success in answering straightforward factual questions correctly.

0.63 on PersonQA. This test measures how well the model answers questions about people, such as public or historical figures.

0.99 on the "not unsafe" metric in refusal tests, showing the model is highly reliable at rejecting potentially harmful or unsafe requests.

0.96 in human-sourced jailbreak tests. It performed well when real people tried to get around its safety rules.

0.23 in the StrongReject academic jailbreak test. This lower score shows it was less effective than models like o3 and GPT-4o mini at resisting advanced, research-level attempts to bypass safety systems.

Context windows in ChatGPT currently top out at 128,000 tokens for Pro users, 32,000 for Plus users, and 8,000 for free users. OpenAI has hinted that larger context support may come to ChatGPT in the future.

This release also comes amid reports that OpenAI is backtracking on its bid to become a for-profit entity. The reversal follows pushback from co-founder Elon Musk and a coalition of former OpenAI employees, top academics, and Nobel laureates.