Nari-labs releases open-source Dia2 streaming speech generator

2049.news · 25.11.2025, 13:20:01

Nari-labs published an open-source speech generation model called Dia2 that supports streaming and per-speaker voice samples.

Model overview

Dia2 provides a streaming mode that can begin producing audio from the first words without waiting for full text preprocessing, enabling lower perceived latency for interactive scenarios.
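The latency benefit of streaming can be illustrated with a toy model. The sketch below does not use Dia2's actual API: `synthesize`, `COST_PER_WORD_MS`, and the two-word chunk size are hypothetical stand-ins, and the fixed per-word cost is an assumption chosen only to make the comparison concrete. It compares how long a listener waits for the first audio when the whole text is processed at once versus when it is processed in small chunks.

```python
from typing import Iterator, List, Tuple

# Assumed, purely illustrative compute cost per word (not a Dia2 measurement).
COST_PER_WORD_MS = 50


def synthesize(words: List[str]) -> Tuple[bytes, int]:
    """Hypothetical stand-in for a speech model.

    Returns dummy audio bytes and the simulated compute time in milliseconds.
    """
    audio = b"\x00" * (4 * len(words))
    return audio, COST_PER_WORD_MS * len(words)


def batch_first_audio_ms(text: str) -> int:
    """Time until the first audio is available when the full text is processed at once."""
    _, cost = synthesize(text.split())
    return cost


def stream_chunks(text: str, chunk_words: int = 2) -> Iterator[Tuple[bytes, int]]:
    """Yield (audio, elapsed_ms) pairs, one per chunk of `chunk_words` words."""
    words = text.split()
    elapsed = 0
    for i in range(0, len(words), chunk_words):
        audio, cost = synthesize(words[i:i + chunk_words])
        elapsed += cost
        yield audio, elapsed


if __name__ == "__main__":
    text = "one two three four five six"
    _, first_chunk_ms = next(stream_chunks(text))
    print("batch, first audio after (ms):", batch_first_audio_ms(text))
    print("streaming, first audio after (ms):", first_chunk_ms)
```

In this model the total compute is the same in both modes; only the wait before the first audible output changes, which is what matters in interactive use.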

  • Variants: 1B and 2B model sizes.
  • Language: English only, with up to 2 minutes of audio per generation; Russian is not supported.
  • Hardware: designed to run on GPUs with about 8 GB of VRAM or less.
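
As a rough plausibility check on the hardware figure, the following back-of-the-envelope sketch estimates the memory taken by the model weights alone. It assumes that "1B" and "2B" mean exactly 1 and 2 billion parameters and that weights are stored in 16-bit or 32-bit floats; neither assumption is confirmed here, and activations, caches, and the audio codec are not counted.

```python
# Bytes per parameter for two common storage precisions (assumed, not confirmed for Dia2).
BYTES_PER_PARAM = {"16-bit": 2, "32-bit": 4}

# Assumed exact parameter counts for the "1B" and "2B" variants.
VARIANTS = {"1B": 1_000_000_000, "2B": 2_000_000_000}


def weight_gib(params: int, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    for name, params in VARIANTS.items():
        for precision, size in BYTES_PER_PARAM.items():
            print(f"{name} at {precision}: {weight_gib(params, size):.2f} GiB")
```

Under these assumptions the weights of both variants fit within 8 GB at 16-bit precision with room to spare, which is consistent with the stated target, though real memory use will be higher than the weights-only figure.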

Behavior and limitations

Early testing shows variable output: Dia2 may insert words that are not in the input text, vary in loudness, and speak at a rapid pace with few pauses.

Developers note that stable speaker rendering requires either supplying per-speaker audio samples as prefixes or fine-tuning the model on target voices.

License and ecosystem

The project is released under the Apache 2.0 license, a permissive license intended to facilitate commercial and community adoption of the code and models.

While Nari-labs positions Dia2 as less mature in quality than established commercial offerings, the maintainers expect community contributions to address current instability and improve naturalness over time.

