Ex-OpenAI CTO Mira Murati’s Startup Fixed a Major ‘Unfixable’ AI Bug

Mira Murati, Thinking Machines Lab. Source: X

Written by Grant Harvey

Sep 17, 2025


eWeek content and product recommendations are editorially independent. We may make money when you click on links to our partners.

OpenAI’s former CTO Mira Murati’s new startup recently dropped its first research paper, and it’s about fixing something that’s been bugging AI engineers since ChatGPT launched.

The paper from Murati’s startup Thinking Machines Lab (or, just Thinky, for those in the know) is titled “Defeating Nondeterminism in LLM Inference” and tackles the problem of “reproducibility” in language models like ChatGPT.

A few definitions before we start:

Nondeterminism: Getting different answers when asking ChatGPT the same question twice with identical settings.
LLM: Large language model like ChatGPT, Claude, or Gemini.
Inference: The step where a language model takes your question and generates a response.

The problem

Even when you set AI models to their most predictable setting (temperature = 0), you still get different answers to the same question. Engineers have been pulling their hair out thinking it was just “one of those computer things” that couldn’t be fixed. Turns out, they were wrong.
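Temperature 0 usually means greedy decoding: the model always picks its single highest-scoring next token. That pick is only repeatable if the scores are bit-for-bit identical every time. This minimal Python sketch uses made-up scores for two nearly tied tokens (the token names and numbers are hypothetical) to show how a difference far down in the decimal places can flip the choice:

```python
def greedy_pick(scores):
    """Greedy decoding (temperature 0): return the token with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical scores for two nearly tied candidate tokens.
run_a = {"Paris": 12.000001, "Lyon": 12.000000}
# The same request, but a tiny numerical difference nudged one score.
run_b = {"Paris": 12.000001, "Lyon": 12.000002}

print(greedy_pick(run_a))  # Paris
print(greedy_pick(run_b))  # Lyon
```

Once one token differs, every later token is generated from a different starting point, so two answers to the same question can drift apart completely.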

The solution

The discovery came from Horace He, a PyTorch wizard (PyTorch is the popular open-source toolkit for building AI models) who recently jumped from Meta to join Murati’s team. He’s one of the key people behind torch.compile, the PyTorch feature that can make many AI models run significantly faster with a one-line code change.

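Horace and his team found that the usual suspect engineers blame, floating-point math weirdness combined with GPU cores finishing their work in an unpredictable order, isn’t the whole story. The real culprit is a lack of “batch invariance”: the result your request gets depends on how many other requests the server happens to process alongside it. Think of it like this: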

Imagine ordering the same coffee at Starbucks, but it tastes different depending on how many other customers are in line. That’s essentially what’s happening with AI models.
When an AI server is busy handling lots of requests, it processes them in batches.
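Your request gets bundled with others because that’s more efficient, and the size of that bundle can change the order in which the server adds up the numbers behind your answer. Since computers round slightly differently depending on that order, your specific answer can change… even though it shouldn’t.
Server load shifts from moment to moment, so the batch size shifts too, and your results vary along with it.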

This also happens in real life. Have you ever been to your local Starbucks during the morning rush? Unless you have a god-tier barista, your order might not taste the same!
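The coffee-line effect comes down to two facts. First, floating-point addition isn’t associative: grouping the same numbers differently can change the result. Second, a server’s math routines (“kernels”) may split up a sum differently depending on batch size. This minimal Python sketch shows both. The helpers `row_sum` and `hypothetical_kernel`, and the rule that small batches split each sum into two chunks, are made up for illustration and are not taken from any real inference library.

```python
import math

# 1) Floating-point addition is not associative.
grouped_left = (0.1 + 1e20) - 1e20   # 0.1 is rounded away when added to 1e20
grouped_right = 0.1 + (1e20 - 1e20)  # the big numbers cancel first
print(grouped_left, grouped_right)   # 0.0 0.1


def row_sum(row, num_chunks):
    """Split `row` into `num_chunks` contiguous chunks, sum each chunk
    left to right, then add the chunk totals together."""
    size = max(1, math.ceil(len(row) / num_chunks))
    partials = []
    for start in range(0, len(row), size):
        acc = 0.0
        for x in row[start:start + size]:
            acc += x
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


def hypothetical_kernel(batch):
    """Hypothetical kernel: with a small batch, split each row's sum into two
    chunks (to keep more cores busy); with a big batch, sum each row in one pass."""
    num_chunks = 2 if len(batch) < 4 else 1
    return [row_sum(row, num_chunks) for row in batch]


# 2) The same request gets a different answer depending on batch size.
my_row = [1e20, 0.1, -1e20, 0.1]
quiet = hypothetical_kernel([my_row])[0]                   # batch of 1
busy = hypothetical_kernel([my_row] + [[1.0] * 4] * 7)[0]  # batch of 8
print(quiet, busy)  # 0.0 0.1
```

Nothing here is random: each run is repeatable on its own. The answer changes only because the batch around your request changed.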

Why this matters

Nondeterminism causes real problems:

AI companies doing research can’t reproduce their own experiments reliably.
Businesses using AI for critical decisions get inconsistent results.
Training new AI models, especially with reinforcement learning, gets harder and more expensive when the outputs you generate don’t match what your training code expects.

Thinky released its solution as open-source code, which is true to Murati’s promise of “science is better when shared.” The team calls their approach “batch-invariant kernels,” which basically teaches AI servers to give you the same coffee regardless of the line.
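A batch-invariant kernel removes the dependence on batch size: each request’s numbers are always combined in the same order, however many other requests share the batch. This minimal Python sketch assumes one simple way to get there, a fixed chunk size for every sum. The names `CHUNK_SIZE`, `fixed_order_sum`, and `batch_invariant_kernel` are hypothetical and are not Thinking Machines’ actual code.

```python
CHUNK_SIZE = 2  # hypothetical: the same for every batch size


def fixed_order_sum(row):
    """Sum `row` in chunks of exactly CHUNK_SIZE, then add the chunk totals
    left to right. The order of additions depends only on the row itself."""
    partials = []
    for start in range(0, len(row), CHUNK_SIZE):
        acc = 0.0
        for x in row[start:start + CHUNK_SIZE]:
            acc += x
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


def batch_invariant_kernel(batch):
    """Apply the same summing strategy to every row, whatever len(batch) is."""
    return [fixed_order_sum(row) for row in batch]


my_row = [1e20, 0.1, -1e20, 0.1]
alone = batch_invariant_kernel([my_row])[0]                      # batch of 1
crowded = batch_invariant_kernel([my_row] + [[1.0] * 4] * 7)[0]  # batch of 8
print(alone == crowded)  # True
```

The trade-off is flexibility: a kernel that can’t switch strategies for small batches may leave some GPU cores idle, so batch-invariant kernels can run slower than the fastest batch-dependent ones.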

Why the AI giants should be nervous

This is just the appetizer from a team that recently raised $2 billion without even having a product. (It is working on one, internally called “RL for businesses,” which customizes models for a company’s specific business metrics. Sounds very cool.)

If fixing long-standing “unfixable” problems is their opening move, the AI giants should probably be nervous (though the code is open, so yum yum yum, as the hungry, hungry AI labs say).

Editor’s note: This content originally ran in our sister publication, The Neuron. To read more from The Neuron, sign up for its newsletter.

Grant Harvey

Grant Harvey is the Lead Writer of The Neuron, where he continues to lead the publication's daily coverage of AI news, tools, and trends.