Subquadratic Launches SubQ, a 12M-Token AI Model for Long-Context Tasks

The Neuron featured image about New AI with native 12M memory.

Image: The Neuron

Written By

May 6, 2026

2 minute read

eWeek content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

“AI memory” has been a euphemism for workarounds, because models can’t actually hold much in context. Transformer attention scales O(n²) (double the input, quadruple the cost), so the industry duct-taped fixes on top: chunk it, summarize it, pre-search it, pray.

Subquadratic wants to flush the duct tape. The lab came out of stealth on Tuesday with $25 million in seed funding and launched SubQ, the first commercial LLM built on a fully sub-quadratic architecture, with a native 12 million-token context window at roughly 1/5 the cost of frontier models. Backers include former SoftBank Vision Fund partner Javier Villamizar and Tinder co-founder Justin Mateen.

If the company’s claims hold up outside its own benchmarks, the pitch is simple: stop teaching models how to search their notes and let them read the room.

Here’s what happened:

The architecture (SSA, or Subquadratic Selective Attention) scales linearly with input length and runs 52× faster than FlashAttention at 1M tokens.
SubQ scored 97% on RULER 128,000 (long-context accuracy; Opus 4.6: 94%) at $8 to run versus ~$2,600 on frontier models.
On MRCR v2 (multi-needle retrieval), SubQ scored 83 vs Opus’s 78, GPT-5.4’s 39, and Gemini 3.1 Pro’s 23.
At 12M tokens (beyond any frontier model’s reach), SubQ hit 92% recall. Targeting 100M by Q4.
Two products live today: a 12 million-token API and SubQ Code, a CLI agent that loads your whole repo in one pass.

Why this matters

The memory problem has spawned an engineering discipline. RAG breaks documents into chunks and pre-searches them.

Agent frameworks split tasks among sub-agents by passing notes. MIT’s Recursive Language Models hand the prompt to the model as a file, and it writes code to search. Claude’s Managed Agents give it a folder where it saves notes between sessions. Each is engineered to dodge one fact: standard attention can’t afford to read everything at once.

If the architecture itself holds 12 million tokens cheaply, much of that scaffolding stops being load-bearing. You skip chunking, embedding, and orchestration; you just ask. (Cross-session memory like CLAUDE.md and /memories solves a different problem and stays useful.) The win is cost and context; Anthropic’s Opus 4.7 still leads SWE-Bench at 87.6% (SubQ: 81.8%).

Our take

We’ve heard “this replaces transformers” before (looking at you, Mamba). The receipts look different this time, with PhDs from Meta, Google, Oxford, and Cambridge behind it, and API access live today. The open question: does SubQ scale to frontier-level capability, or do long-context specialists and dense models split the market?

Editor’s note: This content originally ran in the newsletter of our sister publication, The Neuron. To read more from The Neuron, sign up for its newsletter here.

Grant Harvey

Grant Harvey is the Lead Writer of The Neuron, where he continues to lead the publication's daily coverage of AI news, tools, and trends.

Subquadratic Launches SubQ, a 12M-Token AI Model for Long-Context Tasks

Why this matters

Our take

Grant Harvey

Company

Categories