NVIDIA has released a new version of its Parakeet transcription tool, boasting a lower error rate than any of its competitors. In addition, the company made the code public on GitHub.

Parakeet TDT 0.6B is a 600-million-parameter automatic speech recognition (ASR) model. It can transcribe 60 minutes of audio per second, Hugging Face data scientist Vaibhav Srivastav said on X on May 5.
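To put the throughput figure in perspective, the short sketch below converts "60 minutes of audio per second" into a real-time factor and estimates how long a longer recording might take. The function names are illustrative, and the estimate assumes throughput stays constant, which may not hold across different hardware or batch sizes.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return seconds of audio transcribed per second of processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def estimated_processing_seconds(audio_seconds: float, factor: float) -> float:
    """Estimate processing time, assuming a constant real-time factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return audio_seconds / factor


# 60 minutes of audio handled in one second, as quoted in the article.
factor = real_time_factor(audio_seconds=60 * 60, processing_seconds=1.0)

# Under the constant-throughput assumption, an 8-hour archive:
eight_hours = estimated_processing_seconds(8 * 60 * 60, factor)
print(f"real-time factor: {factor:.0f}x, 8 hours of audio: ~{eight_hours:.0f} s")
```

By this rough estimate, the quoted speed is about 3,600 times faster than real time.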

The model is recommended for, but not limited to, “conversational AI, voice assistants, transcription services, subtitle generation, and voice analytics platforms.” Parakeet TDT 0.6B transcribes English only.

How to access the new Parakeet tool and what it can do

NVIDIA released Parakeet TDT 0.6B under a commercially permissive Creative Commons license, which means developers can incorporate its transcription into their own products for enterprise use or individual sale. NVIDIA said the model provides accurate transcriptions, including of song lyrics, with automatic punctuation and capitalization; it pays special attention to accurately transcribing spoken numbers.

Hugging Face’s Open ASR Leaderboard confirms that accuracy; in fact, version 2 of Parakeet TDT 0.6B sits at the top of the leaderboard, above products from Microsoft and OpenAI. Parakeet TDT 0.6B V2 also surpasses many of NVIDIA’s other transcription models. The exact performance of each instance may vary based on hardware.
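Speech recognition leaderboards of this kind typically rank models by word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that metric, not the leaderboard's own evaluation code. It normalizes text only by lowercasing and splitting on whitespace, whereas real evaluations usually apply more thorough normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"WER: {wer:.3f}")
```

A lower score is better; a WER of 0 means the output matched the reference word for word.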

Parakeet TDT 0.6B can be retrieved from Hugging Face and through NVIDIA’s NeMo toolkit.
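As a rough illustration of the NeMo route, the sketch below wraps the toolkit's pretrained-model loader in a small helper. It assumes the NeMo ASR collection is installed and that the checkpoint is published under the identifier shown; that identifier, the helper name, and the example file name are assumptions rather than details from NVIDIA's announcement.

```python
from typing import List

# Assumed Hugging Face identifier for the V2 checkpoint; not given in the article.
MODEL_ID = "nvidia/parakeet-tdt-0.6b-v2"


def transcribe_files(paths: List[str], model_id: str = MODEL_ID) -> List[str]:
    """Load the checkpoint through NeMo and return one transcript per file."""
    # Imported lazily so the helper can be defined without NeMo installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_id)
    results = model.transcribe(paths)
    # Recent NeMo releases return hypothesis objects with a .text attribute,
    # while older ones return plain strings; handle both cases.
    return [getattr(result, "text", result) for result in results]
```

Calling transcribe_files(["meeting.wav"]) with a hypothetical local recording would download the weights on first use and return a list containing that file's transcript.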

The model is based on the Fast Conformer encoder architecture, which is available in NVIDIA NeMo. It was trained on the Granary dataset, a corpus of about 120,000 hours of English speech data that includes both human-transcribed speech and auto-labeled speech from sources such as the YouTube-Commons dataset.

Parakeet’s place in NVIDIA’s portfolio and competitors

Releasing Parakeet TDT 0.6B as open source fits NVIDIA’s overall position in the generative AI industry. NVIDIA provides the infrastructure and tools behind today’s proliferation of AI, especially the GPUs that serve as the primary hardware. Parakeet TDT 0.6B is just one of the many AI-based tools and services the company offers.

The next-highest-scoring model on the leaderboard is Microsoft’s Phi-4-multimodal-instruct, which can transcribe speech in 23 languages.

