Nari-labs releases open-source Dia2 streaming speech generator

2049.news · 25.11.2025, 13:20:01

Nari-labs published an open-source speech generation model called Dia2 that supports streaming and per-speaker voice samples.

Model overview

Dia2 provides a streaming mode that can begin producing audio from the first words without waiting for full text preprocessing, enabling lower perceived latency for interactive scenarios.
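The difference between waiting for the full text and streaming can be illustrated with a toy sketch. This is not Dia2's API: `fake_synthesize`, `batch_tts`, and `streaming_tts` are hypothetical stand-ins that emit silent samples. The sketch only counts how many input words must arrive before the first audio chunk is available.

```python
from typing import Callable, Iterable, Iterator, List


def fake_synthesize(word: str) -> List[float]:
    """Hypothetical stand-in for a TTS step: one silent sample per character."""
    return [0.0] * len(word)


def batch_tts(words: Iterable[str]) -> List[List[float]]:
    """Read the entire input first, then return every audio chunk at once."""
    all_words = list(words)  # blocks until the whole text has arrived
    return [fake_synthesize(w) for w in all_words]


def streaming_tts(words: Iterable[str]) -> Iterator[List[float]]:
    """Yield an audio chunk as soon as each word arrives."""
    for w in words:
        yield fake_synthesize(w)


def words_consumed_before_first_audio(
    tts: Callable[[Iterable[str]], Iterable[List[float]]], text: str
) -> int:
    """Count how many words the TTS function reads before its first chunk is ready."""
    consumed = 0

    def source() -> Iterator[str]:
        nonlocal consumed
        for w in text.split():
            consumed += 1
            yield w

    chunks = tts(source())
    next(iter(chunks))  # wait for the first audio chunk only
    return consumed


if __name__ == "__main__":
    text = "hello there how are you"
    print("batch:", words_consumed_before_first_audio(batch_tts, text))
    print("streaming:", words_consumed_before_first_audio(streaming_tts, text))
```

In this toy setup the batch path reads all five words before any audio exists, while the streaming path needs only the first one, which is the source of the lower perceived latency.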

Variants: 1B and 2B model sizes.
Language: English only, with up to 2 minutes of audio per generation; Russian is not supported.
Hardware: designed to run on GPUs with about 8 GB of VRAM or less.
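A rough back-of-the-envelope check shows why both variants plausibly fit in that VRAM budget. The sketch below assumes the nominal parameter counts are exactly 1e9 and 2e9 and that weights are stored in 16 bits (2 bytes per parameter); the article states neither. It also ignores activations, caches, and any audio codec, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in GiB.

    bytes_per_param=2 assumes 16-bit weights (an assumption, not a stated fact).
    """
    return n_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Nominal sizes taken from the variant names; exact counts are assumed.
    for name, n_params in [("1B", 1e9), ("2B", 2e9)]:
        print(f"{name}: {weight_memory_gib(n_params):.2f} GiB of weights")
```

Under these assumptions the weights take roughly 1.86 GiB and 3.73 GiB respectively, leaving headroom within 8 GB for runtime overhead.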

Behavior and limitations

Early testing shows inconsistent output: Dia2 may insert words that are not in the input text, vary in loudness, and speak at a rapid pace with few pauses.

Developers note that keeping each speaker's voice consistent requires either supplying per-speaker audio samples as prefixes or fine-tuning the model on the target voices.

License and ecosystem

The project is released under the Apache 2.0 license, a permissive license intended to facilitate commercial and community adoption of the code and models.

While Nari-labs positions Dia2 as less mature in quality than established commercial offerings, the maintainers expect community contributions to address current instability and improve naturalness over time.
