NVIDIA Unveils Fugatto: A Revolutionary 2.5B Parameter AI Audio Generator | eWeek

Written By
Sunny Yadav
Dec 15, 2024

NVIDIA has introduced Fugatto, a 2.5 billion-parameter AI audio generator. Developed by a team of generative AI researchers, Fugatto can produce and manipulate music, voices, and sounds from simple text and audio prompts. Billed as a “Swiss Army knife for sound,” the model is designed to let users generate sounds that have never been heard before.

The All-in-One Solution for Audio AI

Many AI models specialize in a single task, such as composing songs or modifying voices. Fugatto instead handles a range of audio generation and transformation tasks in one model, from creating music snippets based on text descriptions to adding or removing instruments in existing tracks. That versatility could be useful in several fields:

  • Music Production: Artists can prototype song ideas, experiment with styles, and fine-tune tracks.
  • Advertising: Agencies can tailor voiceovers with different accents and emotional tones for localized campaigns.
  • Education: Language learning tools could replicate a learner’s chosen voice, from a family member to a fictional character.
  • Gaming: Developers can dynamically modify or create audio assets based on in-game actions.

Emergent Capabilities in Generative AI Research

Fugatto shows emergent properties, meaning unexpected abilities that arise from its diverse training, which allow users to combine free-form instructions into complex, layered outputs. For instance, it can produce speech in a French accent with a sad tone, or generate a thunderstorm that transitions into birdsong. With fine-tuning, Fugatto can also perform tasks it wasn’t explicitly trained on, such as generating high-quality singing voices from text prompts.

Fugatto’s ComposableART feature enables real-time instruction blending, giving creators fine-grained control over attributes such as accent intensity or tonal shifts. The model’s temporal interpolation feature further lets users shape how a sound evolves over time, such as a thunderstorm crescendo that transitions into a serene dawn chorus.
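To make the idea of instruction blending and temporal interpolation more concrete, here is a minimal toy sketch in Python. It is not NVIDIA's code and does not use any Fugatto API. The functions `blend` and `crossfade_schedule`, the instruction names, and the two-dimensional vectors are all hypothetical stand-ins, and the linear schedule is an assumption chosen for simplicity.

```python
from typing import Dict, List


def blend(vectors: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-instruction vectors.

    The vectors are hypothetical stand-ins for whatever internal signal a
    model derives from each instruction; missing weights count as 0.
    """
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for name, vec in vectors.items():
        w = weights.get(name, 0.0)
        for i, x in enumerate(vec):
            out[i] += w * x
    return out


def crossfade_schedule(start: str, end: str, steps: int) -> List[Dict[str, float]]:
    """Linear schedule that moves all weight from `start` to `end`."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    schedule = []
    for t in range(steps):
        alpha = t / (steps - 1)  # 0.0 at the first step, 1.0 at the last
        schedule.append({start: 1.0 - alpha, end: alpha})
    return schedule


# Toy example: each "instruction" is a made-up 2-dimensional vector.
vectors = {"thunderstorm": [1.0, 0.0], "birdsong": [0.0, 1.0]}
frames = [blend(vectors, w) for w in crossfade_schedule("thunderstorm", "birdsong", 5)]

for frame in frames:
    print(frame)
```

Each printed frame is the blended vector at one time step: the output starts as pure "thunderstorm", passes through an even mix at the midpoint, and ends as pure "birdsong". Holding the weights constant instead of varying them over time would correspond to blending attributes, such as dialing accent intensity up or down.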

“In my tests,” said Rohan Badlani, an AI researcher who helped design the model, “Fugatto often made me feel like an artist.”

Building Fugatto was a substantial effort. Its 2.5 billion parameters were trained on NVIDIA DGX systems using 32 H100 Tensor Core GPUs. The development team, a global collaboration spanning Brazil, India, China, and beyond, spent over a year curating millions of diverse audio samples and uncovering new relationships in the data.
