Subquadratic Launches SubQ, a 12M-Token AI Model for Long-Context Tasks | eWeek

Subquadratic Launches SubQ, a 12M-Token AI Model for Long-Context Tasks

The Neuron featured image about New AI with native 12M memory.

Image: The Neuron

Written By
Grant Harvey
Grant Harvey
May 6, 2026
2 minute read
eWeek content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

“AI memory” has been a euphemism for workarounds, because models can’t actually hold much in context. Transformer attention scales O(n²) (double the input, quadruple the cost), so the industry duct-taped fixes on top: chunk it, summarize it, pre-search it, pray.

Subquadratic wants to flush the duct tape. The lab came out of stealth on Tuesday with $25 million in seed funding and launched SubQ, the first commercial LLM built on a fully sub-quadratic architecture, with a native 12 million-token context window at roughly 1/5 the cost of frontier models. Backers include former SoftBank Vision Fund partner Javier Villamizar and Tinder co-founder Justin Mateen.

If the company’s claims hold up outside its own benchmarks, the pitch is simple: stop teaching models how to search their notes and let them read the room.

Here’s what happened:

  • The architecture (SSA, or Subquadratic Selective Attention) scales linearly with input length and runs 52× faster than FlashAttention at 1M tokens.
  • SubQ scored 97% on RULER 128,000 (long-context accuracy; Opus 4.6: 94%) at $8 to run versus ~$2,600 on frontier models.
  • On MRCR v2 (multi-needle retrieval), SubQ scored 83 vs Opus’s 78, GPT-5.4’s 39, and Gemini 3.1 Pro’s 23.
  • At 12M tokens (beyond any frontier model’s reach), SubQ hit 92% recall. Targeting 100M by Q4.
  • Two products live today: a 12 million-token API and SubQ Code, a CLI agent that loads your whole repo in one pass.

Why this matters

The memory problem has spawned an engineering discipline. RAG breaks documents into chunks and pre-searches them.

Agent frameworks split tasks among sub-agents by passing notes. MIT’s Recursive Language Models hand the prompt to the model as a file, and it writes code to search. Claude’s Managed Agents give it a folder where it saves notes between sessions. Each is engineered to dodge one fact: standard attention can’t afford to read everything at once.

If the architecture itself holds 12 million tokens cheaply, much of that scaffolding stops being load-bearing. You skip chunking, embedding, and orchestration; you just ask. (Cross-session memory like CLAUDE.md and /memories solves a different problem and stays useful.) The win is cost and context; Anthropic’s Opus 4.7 still leads SWE-Bench at 87.6% (SubQ: 81.8%).

Our take

We’ve heard “this replaces transformers” before (looking at you, Mamba). The receipts look different this time, with PhDs from Meta, Google, Oxford, and Cambridge behind it, and API access live today. The open question: does SubQ scale to frontier-level capability, or do long-context specialists and dense models split the market?

Editor’s note: This content originally ran in the newsletter of our sister publication, The Neuron. To read more from The Neuron, sign up for its newsletter here.

Grant Harvey

Grant Harvey is the Lead Writer of The Neuron, where he continues to lead the publication's daily coverage of AI news, tools, and trends.

eWeek Logo

eWeek has the latest technology news and analysis, buying guides, and product reviews for IT professionals and technology buyers. The site's focus is on innovative solutions and covering in-depth technical content. eWeek stays on the cutting edge of technology news and IT trends through interviews and expert analysis. Gain insight from top innovators and thought leaders in the fields of IT, business, enterprise software, startups, and more.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.