Nvidia launches open source transcription AI model Parakeet-TDT-0.6B-V2

Nvidia has become one of the most valuable companies in the world in recent years, thanks to the stock market recognizing how much demand there is for graphics processing units (GPUs), the powerful chips Nvidia makes that are used to render graphics in video games and, increasingly, to train large language and diffusion AI models.

But Nvidia does far more than make hardware and the software to run it. As the generative AI era wears on, the Santa Clara-based company has also been steadily releasing more of its own AI models, most of them open source and free for researchers and developers to download, modify and use. The latest of them is Parakeet-TDT-0.6B-V2, an automatic speech recognition (ASR) model that can, in the words of Hugging Face's “VB” Srivastav, “transcribe 60 minutes of audio in 1 second [mind blown emoji].”

This is the new generation of the Parakeet model, which Nvidia first unveiled in January 2024 and updated in April of that year. This version is so powerful that it currently tops the Hugging Face Open ASR Leaderboard with an average “word error rate” (the share of spoken words the model transcribes incorrectly) of just 6.05% (out of 100).

To put that in perspective, it comes close to proprietary transcription models such as OpenAI's GPT-4o-transcribe (with a word error rate of 2.46% in English) and ElevenLabs Scribe (3.3%).

And it offers all this while being freely available under a commercially permissive Creative Commons CC-BY-4.0 license, making it an attractive proposition for commercial enterprises and indie developers looking to build speech recognition and transcription services into their paid applications.
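For readers unfamiliar with the word error rate metric cited above, the following is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and the model's output, divided by the number of reference words. The function name `word_error_rate` is illustrative, and the simple lowercasing used here is an assumption; leaderboards typically apply more thorough text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Simplified normalization (an assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on a mat'):.2%}")
```

On this measure, an average of 6.05% corresponds to roughly six word-level errors per 100 reference words.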

Performance and benchmark standing

The model has 600 million parameters and combines a FastConformer encoder with a TDT decoder architecture.

It is capable of transcribing an hour of audio in just one second, provided it is running on Nvidia's GPU-accelerated hardware.

Its throughput is measured at an RTFx (inverse real-time factor, the ratio of audio duration to processing time) of 3386.02 with a batch size of 128, placing it at the top of the current ASR benchmarks maintained by Hugging Face.
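The "hour in a second" claim follows from the RTFx figure with simple arithmetic. As a rough sketch, assuming RTFx is the ratio of audio duration to processing time, the expected processing time is the audio length divided by RTFx. The function name below is illustrative, and because the figure was measured with a batch size of 128, it reflects batched throughput rather than the latency of a single file.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Estimated processing time, assuming RTFx = audio duration / processing time."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


one_hour = 60 * 60  # seconds of audio
estimate = processing_seconds(one_hour, rtfx=3386.02)
print(f"{estimate:.2f} seconds to process one hour of audio")
```

At the reported RTFx this comes to a little over one second per hour of audio.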

Use cases and availability

Released globally on May 1, 2025, Parakeet-TDT-0.6B-V2 is aimed at developers, researchers and industry teams building applications such as transcription services, voice assistants, subtitle generators and conversational AI platforms.

The model supports punctuation, capitalization and detailed word-level timestamping, offering a full transcription package for a wide range of speech-to-text needs.
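To illustrate why word-level timestamps matter for a use case like subtitle generation, here is a minimal sketch that groups timestamped words into SubRip (SRT) cues. The input shape, a list of dictionaries with `word`, `start` and `end` keys measured in seconds, is an assumption made for illustration, as are the helper names `srt_time` and `words_to_srt`. A real pipeline would likely also break cues on pauses and punctuation.

```python
from typing import Dict, List


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7) -> str:
    """Group timestamped words into numbered SRT cues of up to max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        start, end = chunk[0]["start"], chunk[-1]["end"]
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word timestamps, for illustration only.
sample = [
    {"word": "Hello,", "start": 0.0, "end": 0.4},
    {"word": "world.", "start": 0.5, "end": 1.25},
]
print(words_to_srt(sample))
```

Because the model already emits punctuation and capitalization, the cue text can be used as is.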

Access and deployment

Developers can deploy the model using Nvidia's NeMo toolkit. The setup process is compatible with Python and PyTorch, and the model can be used directly or fine-tuned for domain-specific tasks.
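As a rough sketch of what direct use through NeMo can look like, the snippet below loads the pretrained checkpoint and requests timestamps alongside the text. It assumes NeMo's ASR collection is installed (for example with `pip install -U "nemo_toolkit[asr]"`) and that the model ID is `nvidia/parakeet-tdt-0.6b-v2`. The calls mirror the usage shown on the model's Hugging Face card and may change between NeMo versions. `load_model`, `transcribe` and `segment_lines` are illustrative helper names, not part of NeMo.

```python
from typing import Dict, List

# Assumed Hugging Face model ID for this release.
MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v2"


def load_model(model_name: str = MODEL_NAME):
    """Download (on first use) and return the pretrained ASR model."""
    # Imported lazily so the helpers below work without NeMo installed.
    import nemo.collections.asr as nemo_asr
    return nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)


def transcribe(paths: List[str], model_name: str = MODEL_NAME):
    """Transcribe audio files, requesting timestamps alongside the text."""
    model = load_model(model_name)
    return model.transcribe(paths, timestamps=True)


def segment_lines(segments: List[Dict]) -> List[str]:
    """Format segment-level timestamps as 'start - end : text' lines."""
    return [
        f"{s['start']:.2f}s - {s['end']:.2f}s : {s['segment']}"
        for s in segments
    ]
```

Calling `transcribe(["meeting.wav"])` with a local audio file (the file name is a placeholder) should return one result per file, with the transcript in `.text` and segment-level timestamps in `.timestamp["segment"]`, which `segment_lines` turns into readable lines.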

The open-source license (CC-BY-4.0) also allows commercial use, making it appealing to startups and enterprises alike.

Training data and model development

Parakeet-TDT-0.6B-V2 was trained on a diverse, large-scale corpus called the Granary dataset. This includes about 120,000 hours of English audio, made up of 10,000 hours of high-quality human-transcribed data and 110,000 hours of pseudo-labeled speech.

Sources range from well-known datasets such as LibriSpeech and Mozilla Common Voice to YouTube-Commons and Librilight.

Nvidia plans to make the Granary dataset publicly available following its presentation at Interspeech 2025.

Evaluation and robustness

The model was evaluated on several English-language ASR benchmarks, including AMI, Earnings22, GigaSpeech and SPGISpeech, and showed strong generalization performance. It remains robust under various noise conditions and also performs well with telephony-style audio formats, with only a slight decline at lower signal-to-noise ratios.

Hardware compatibility and efficiency

Parakeet-TDT-0.6B-V2 is optimized for Nvidia GPU environments, supporting hardware such as the A100, H100, T4 and V100 boards.

While high-end GPUs maximize performance, the model can still be loaded on systems with as little as 2 GB of RAM, which allows for a wider range of deployment scenarios.

Ethical considerations and responsible use

Nvidia notes that the model was developed without the use of personal data and follows its responsible AI framework.

Although no specific measures were taken to mitigate demographic bias, the model passed internal quality standards and includes detailed documentation on its training process, dataset provenance and privacy compliance.

The release drew attention from the machine learning and open-source communities, especially after it was publicly highlighted on social media. Commentators noted the model's ability to outperform commercial ASR alternatives while remaining fully open source and commercially usable.

Developers interested in trying the model can access it through Hugging Face or via Nvidia's NeMo toolkit. Installation instructions, demo scripts and integration guidance are readily available to support experimentation and deployment.
