Here’s a sentence that should make anyone with crypto slightly nervous: OpenAI’s newest coding agent (GPT-5.3-Codex) can successfully hack and drain funds from vulnerable crypto smart contracts 72% of the time.
OpenAI (alongside crypto investment firm Paradigm) just released EVMbench, a new benchmark that tests how well AI agents can find, fix, and exploit security vulnerabilities in smart contracts (the self-executing code that manages over $100 billion in crypto assets).
Quick refresher if you’re not a crypto person
Smart contracts are basically automated vaults. They hold your money and follow rules written in code. If there’s a bug in that code, someone (or something) can drain the vault. And unlike your bank, there’s no customer service line to call; once the funds are gone, the loss is irreversible.
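To make the "bug in the vault" idea concrete, here is a minimal toy sketch in Python. It is not real smart contract code and is not taken from EVMbench: `ToyVault`, `drain`, and the depositor names are hypothetical, and the bug shown (paying out before updating the ledger, often called reentrancy) is just one well-known class of flaw, assumed here for illustration.

```python
class ToyVault:
    """Toy in-memory stand-in for a smart contract vault (hypothetical, not EVM code)."""

    def __init__(self):
        self.balances = {}  # depositor name -> amount the ledger says they own
        self.total = 0      # funds the vault actually holds

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.total += amount

    def withdraw(self, who, on_receive):
        amount = self.balances.get(who, 0)
        if amount == 0 or self.total < amount:
            return
        self.total -= amount    # funds leave the vault...
        on_receive(amount)      # ...the recipient's code runs...
        self.balances[who] = 0  # BUG: ...and only then is the ledger updated


def drain(vault, attacker):
    """Re-enter withdraw() while the attacker's balance is still credited."""
    stolen = []

    def on_receive(amount):
        stolen.append(amount)
        vault.withdraw(attacker, on_receive)  # call back in before the ledger is zeroed

    vault.withdraw(attacker, on_receive)
    return sum(stolen)


vault = ToyVault()
vault.deposit("alice", 90)    # honest depositor
vault.deposit("mallory", 10)  # attacker only ever deposits 10
loot = drain(vault, "mallory")
print(loot, vault.total)
```

In this sketch the attacker deposits 10 and walks away with everything the vault holds, because each repeated withdrawal still sees the old balance. Moving the `self.balances[who] = 0` line above the `on_receive(amount)` call would close this particular hole in the toy version.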
Side note: Is anyone making smart agentic contracts that use an AI to reason about their hard-coded rules before executing them, to avoid this issue?
Here’s what the benchmark found:
GPT-5.3-Codex scored 72.2% on exploit tasks, meaning it successfully drained funds from vulnerable contracts nearly three-quarters of the time. For context, GPT-5 scored just 31.9% on the same tasks six months ago.
AI is better at attacking than defending. Detection (finding bugs) and patching (fixing them) are still much harder; the best model only caught ~46% of vulnerabilities.
Give the AI a small hint about where to look, and patch success jumps from 39% to 94%. The bottleneck isn’t skill; it’s search.
The paper also includes a wild case study: a GPT-5.2 agent discovered and executed a flash loan attack (a complex multi-step exploit), draining a test vault’s entire balance in a single transaction. No human guidance, no step-by-step instructions.
OpenAI is framing this as a defensive tool, and they’re putting money behind it: $10 million in API credits for cybersecurity researchers, plus an expanding beta of Aardvark, their AI security research agent, and a new Trusted Access for Cyber program for vetted security professionals.
Why this matters
The same AI that can write your emails and debug your code is now capable of draining a crypto vault in minutes. The hope is that defenders adopt these tools faster than attackers do, because the race between AI-powered offense and defense is very real. And right now, it kinda feels like offense is winning?
Editor’s note: This content originally ran in the newsletter of our sister publication, The Neuron. To read more from The Neuron, sign up for its newsletter here.


