Mistral AI’s Voxtral Transcribe 2 Launch Breaks Sound Barrier

Written by eWEEK Staff
Feb 5, 2026

Speed of sound? Sounds good.

French firm Mistral AI has released Voxtral Transcribe 2, a new family of speech recognition models that it says transcribes “at the speed of sound.”

Voxtral Transcribe 2 consists of two speech-to-text models that Mistral pitches on transcription quality, speaker diarization, and ultra-low latency. The family includes Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications.

According to Mistral, Voxtral Realtime uses “a novel streaming architecture that transcribes audio as it arrives,” rather than adapting offline models. This enables latency “configurable down to sub-200ms,” a threshold that is critical for voice assistants, live captioning, and conversational AI.

Product strategy

The company also revealed that Voxtral Realtime is released as open weights under the Apache 2.0 license, allowing organizations to deploy it on their own infrastructure, including edge devices. This has significant implications for privacy-sensitive industries such as healthcare, finance, and government, where sending audio data to third-party clouds is often restricted.

This emphasis on open weights and self-hosted deployment is not incidental. As companies increasingly worry about vendor lock-in and data sovereignty, Mistral positions itself as an alternative to closed, US-based AI platforms.

Mistral addresses pricing candidly. It is usage-based, starting at around €5,000 per month (roughly $5,900). This signals that Mistral is targeting mid-to-large organizations rather than individual developers, while still framing its services as competitive on cost.

Performance

Mistral claims Voxtral Mini Transcribe V2 achieves “approximately 4% word error rate on FLEURS” at a price of “$0.003/min,” which it describes as “the best price-performance of any transcription API.”
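For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below shows the standard calculation. It is illustrative only: the function name and the simple lowercase-and-split normalization are assumptions made here, not Mistral's or FLEURS' evaluation code, and real benchmarks apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: text is normalized only by lowercasing and splitting on
    whitespace. Published benchmarks use stricter normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

On this definition, a WER of about 4% corresponds to roughly one wrong word in every 25 words of reference text.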

The company says the model outperforms GPT-4o mini Transcribe, Gemini 2.5 Flash, Assembly Universal, and Deepgram Nova on accuracy, while processing audio “approximately 3x faster than ElevenLabs’ Scribe v2” at “one-fifth the cost.”

If independently validated, these claims could disrupt a market where speech-to-text pricing has remained relatively high, particularly for multilingual and diarized transcription. Lower costs make it economically viable to transcribe large volumes of meetings, calls, and media archives that were previously too expensive to process.
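To put the quoted per-minute rate in perspective, the back-of-the-envelope Python sketch below estimates what a given volume of audio would cost. The only figure taken from Mistral is the “$0.003/min” rate; the function name and the example volumes are hypothetical, and the calculation ignores any minimums, tiers, or taxes that may apply in practice.

```python
# Rate quoted by Mistral for Voxtral Mini Transcribe V2, in US dollars.
PRICE_PER_MINUTE_USD = 0.003


def transcription_cost_usd(
    hours: float, price_per_minute: float = PRICE_PER_MINUTE_USD
) -> float:
    """Estimated cost of transcribing `hours` of audio at a flat rate.

    Assumption: simple linear pricing with no minimum charge or volume tiers.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * 60 * price_per_minute
```

At that rate, a single three-hour recording works out to about $0.54, and a 10,000-hour archive of calls or meetings to about $1,800.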

Enterprising ideas

Beyond raw transcription, Voxtral Mini Transcribe V2 introduces features aimed squarely at enterprise use. These include speaker diarization with precise timestamps, context biasing to handle domain-specific vocabulary, word-level timestamps, and improved robustness in noisy environments such as “factory floors” and “busy call centers.”

The model supports recordings up to three hours in a single request and operates across 13 languages, including English, Chinese, Hindi, Arabic, and several European and Asian languages. Mistral notes that “non-English performance significantly outpaces competitors,” addressing a long-standing weakness in speech AI dominated by English-centric training data.
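The three-hour ceiling means longer recordings, such as an all-day hearing or conference, would need to be split before submission. The Python sketch below shows one simple way to plan those splits. It is an illustration under stated assumptions: the function name is hypothetical, no Mistral API is called, and cutting at fixed time boundaries can land mid-word, so a production pipeline would likely cut at silences or overlap adjacent segments.

```python
# Per-request ceiling reported in the article: three hours, in seconds.
MAX_REQUEST_SECONDS = 3 * 60 * 60


def plan_requests(
    total_seconds: float, max_seconds: float = MAX_REQUEST_SECONDS
) -> list[tuple[float, float]]:
    """Split a recording into consecutive (start, end) segments, in seconds,
    each no longer than `max_seconds`.

    Assumption: fixed-boundary cuts with no overlap between segments.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")

    segments: list[tuple[float, float]] = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments
```

A 7.5-hour recording, for example, would be planned as three requests: two full three-hour segments and a final 90-minute segment.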

Mistral frames Voxtral as a foundational layer for multiple industries. Media and broadcast organizations can generate live multilingual subtitles, while regulated sectors can rely on diarization and timestamps for compliance and audit trails. Both Voxtral models support GDPR- and HIPAA-compliant deployments through on-premise or private cloud setups.

Where speech AI is headed

The Voxtral launch illustrates how speech AI is moving from novelty to infrastructure. The combination of open weights, aggressive pricing, and real-time performance suggests that competition is shifting away from who has the largest model toward who can deliver practical, deployable systems.

The success of Voxtral Transcribe 2 may hinge less on technical benchmarks and more on whether it delivers the cost savings and efficiency gains promised.

In December, Mistral had the wind in its sails with the launch of the Mistral 3 model family.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.