DeepSeek Debuts Sparse Attention for Faster, More Efficient AI


Written by Megan Crouse

Sep 29, 2025


After emerging as a key player in the push toward deep reasoning among AI companies, DeepSeek has introduced an experimental technique called DeepSeek Sparse Attention. The new mechanism is designed to explore and validate optimizations for training and inference efficiency when handling long queries, DeepSeek said on Sept. 29.

What is a sparse attention mechanism?

In generative AI, a sparse attention mechanism keeps the model from relating every token to every other token. Instead, each token attends to a smaller subset of tokens. Because the cost of full attention grows with the square of the sequence length, a sparse attention mechanism reduces the computation and memory needed to produce a response. The difference is most visible in queries containing thousands or hundreds of thousands of tokens.
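To make the idea concrete, here is a minimal sketch of one simple sparsity pattern, a causal sliding window, in which each token attends only to itself and a few tokens before it. This is an illustration under stated assumptions, not DeepSeek's method: the function name local_attention, the window parameter, and the toy inputs are all hypothetical, and DeepSeek Sparse Attention may choose its token subset quite differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_attention(queries, keys, values, window):
    """Causal sliding-window attention (illustrative, not DeepSeek's design).

    Token i attends only to tokens max(0, i - window) .. i.
    Returns the outputs and the number of query-key pairs that were scored.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    outputs, pairs = [], 0
    for i, q in enumerate(queries):
        idx = range(max(0, i - window), i + 1)  # the sparse subset for token i
        weights = softmax([dot(q, keys[j]) * scale for j in idx])
        pairs += len(idx)
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, idx)) for d in range(dim_v)]
        )
    return outputs, pairs


if __name__ == "__main__":
    # Toy sequence: 8 tokens with 4-dimensional embeddings (made-up values).
    tokens = [[math.sin(i + d) for d in range(4)] for i in range(8)]
    _, sparse_pairs = local_attention(tokens, tokens, tokens, window=2)
    _, causal_pairs = local_attention(tokens, tokens, tokens, window=7)
    print(sparse_pairs)  # 21
    print(causal_pairs)  # 36
```

With a window of 2, the eight-token toy sequence scores 21 query-key pairs instead of the 36 needed when every token looks at all earlier tokens. The gap widens as the sequence grows, since the windowed count rises roughly linearly with length while the full count rises quadratically.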

The weights and code for DeepSeek-V3.2-Exp, the model that uses sparse attention, are available on Hugging Face for local use. The model is also available on the web, in the DeepSeek app, and through the API.

The release is “an intermediate step toward our next-generation architecture,” DeepSeek wrote on the model card on Sept. 29.

Precision formats: FP8 today, BF16 in progress

DeepSeek has suggested its newest models support FP8, an 8-bit floating-point format, Bloomberg reported on Monday. FP8 is commonly used in AI training to improve efficiency through faster computation and less memory consumption. In addition, DeepSeek is working on supporting BF16, or Brain Floating Point 16, a 16-bit format that keeps the exponent range of 32-bit floats while storing fewer digits of precision.
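As a rough illustration of why narrower formats save memory, the sketch below compares storage per value and shows how a 32-bit float maps to BF16 by keeping only its top 16 bits (1 sign bit, 8 exponent bits, 7 mantissa bits). The helper names and the parameter count are hypothetical and say nothing about the size of DeepSeek's models. Real hardware typically rounds to nearest rather than truncating, so treat this as a simplification.

```python
import struct

# Storage per value for each format, in bytes.
BYTES_PER_VALUE = {"FP32": 4, "BF16": 2, "FP8": 1}


def to_bfloat16_bits(x):
    """Keep the top 16 bits of the float32 encoding of x (truncation, no rounding)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def from_bfloat16_bits(bits16):
    """Expand 16 BF16 bits back to a Python float by zero-filling the low bits."""
    (x,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return x


def weight_memory_gib(num_params, fmt):
    """Memory needed to store num_params values in the given format, in GiB."""
    return num_params * BYTES_PER_VALUE[fmt] / 2**30


if __name__ == "__main__":
    approx = from_bfloat16_bits(to_bfloat16_bits(3.14159))
    print(approx)  # 3.140625

    hypothetical_params = 2**30  # illustrative count, not a real model size
    for fmt in ("FP32", "BF16", "FP8"):
        print(fmt, weight_memory_gib(hypothetical_params, fmt))
```

The round trip shows the trade-off: the value keeps its magnitude but loses digits, while the memory needed to hold the same number of values halves from FP32 to BF16 and halves again from BF16 to FP8.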

Pricing and developer access

The company cut DeepSeek API prices by 50% or more on Monday. The lower prices could give prospective developers an easier on-ramp to the new models, a move DeepSeek may be betting will bring in more long-term customers.

Huawei supports DeepSeek, providing a Chinese Nvidia alternative

Huawei says its Ascend chips will support inference of the new model, Bloomberg said. Meanwhile, Huawei is ramping up production in China, presenting its chips as an alternative to the AI chips of US-based Nvidia.

Last week, Huawei detailed a three-year plan to use its new “SuperPoD” systems to link thousands of Ascend processors together.

Huawei competes with Nvidia, but US-China export rules create an uncertain regulatory environment for advanced AI chips.


Megan Crouse

Megan Crouse has a decade of experience in business-to-business news and feature writing, including as first a writer and then the editor of Manufacturing.net. Her news and feature stories have appeared in Military & Aerospace Electronics, Fierce Wireless, TechRepublic, and eWeek. She copyedited cybersecurity news and features at Security Intelligence. She holds a degree in English Literature and minored in Creative Writing at Fairleigh Dickinson University.