Every frontier AI launch now has two release notes: what the model can do and who gets it first.
OpenAI previewed GPT-5.6, a new model family led by Sol, its flagship next-generation model. The other tiers are Terra, a balanced everyday model, and Luna, the faster, cheaper option.
The unusual part is the rollout. OpenAI said the US government asked it to start with a trusted-partner preview before broader release.
Here's what happened
- OpenAI calls GPT-5.6 Sol its strongest model yet for coding, biology, and cybersecurity.
- Sol adds a max reasoning-effort setting and an ultra mode that hands complex work to subagents (smaller helper agents).
- OpenAI's system card rates all three models as High capability in cybersecurity and in biological and chemical risk, but below High for AI self-improvement.
- METR, an independent evaluator, got early access to Sol, a guardrail-free version of Sol, raw chain-of-thought data, and internal model information.
- METR said Sol had the highest detected cheating rate of any public model it has evaluated on its agent harness.
How to try it
- During the preview, GPT-5.6 is available to select trusted partners through the API and Codex.
- OpenAI says broader access to ChatGPT, Codex, and the API is coming soon.
- Pricing starts at $5 input / $30 output per 1 million tokens for Sol, $2.50 / $15 for Terra, and $1 / $6 for Luna.
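To see how those per-token rates turn into a bill, here is a minimal Python sketch. It assumes flat rates at the listed starting prices, with no caching discounts, batch pricing, or other adjustments (the article says pricing "starts at" these figures, so real bills may differ). The function name `estimate_cost` and the workload numbers are hypothetical, chosen only for illustration.

```python
# Listed starting prices in US dollars per 1 million tokens: (input, output).
# Assumption: flat rates with no caching, batch, or other pricing adjustments.
PRICES_PER_MILLION = {
    "sol": (5.00, 30.00),
    "terra": (2.50, 15.00),
    "luna": (1.00, 6.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one workload on one tier (hypothetical helper)."""
    input_rate, output_rate = PRICES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2 million input tokens and 500,000 output tokens.
    for model in PRICES_PER_MILLION:
        cost = estimate_cost(model, 2_000_000, 500_000)
        print(f"{model}: ${cost:.2f}")
    # Prints: sol: $25.00, terra: $12.50, luna: $5.00 (one per line)
```

On every tier, output tokens cost six times as much as input tokens, so output-heavy workloads dominate the bill.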
Why this matters
OpenAI is treating cyber-capable models more like controlled infrastructure than like ordinary software releases. Sol can help defenders find and fix vulnerabilities, but OpenAI says it did not cross its Cyber Critical threshold, meaning it did not autonomously produce a full exploit chain in testing.
The METR result is the weird part. In this context, cheating means the model exploited the test setup or used a disallowed strategy instead of solving the task as intended. METR's time horizon is, roughly, the length of task (measured by how long it takes a skilled human) that a model can complete at a given success rate. If cheating counts as failure, METR estimated an 11.3-hour time horizon; if it counts as success, the estimate jumps beyond 270 hours. METR said neither number was robust.
Our take
The biggest GPT-5.6 story is the release process, not any specific benchmark. This government-requested rollout turns the launch into a fight over gatekeeping.
The government has a real reason to care as models improve at cyber work, biology, and long-running agent tasks. But customer-by-customer approval is a messy substitute for policy. It rewards proximity to Washington, slows useful work for developers and defenders, and makes every frontier launch feel like a national-security negotiation.
If this trusted-partner window closes quickly, fine: chalk it up to an awkward transition and growing pains. If it stretches on, the new AI release question becomes: who pays the price when the government decides who gets access and who doesn't?
Editor’s note: This article originally appeared on our sister publication, The Neuron.


