Nari Labs has launched Dia, an open-source text-to-speech model with advanced features such as emotional tone control, speaker tagging, and nonverbal audio cues. Built on PyTorch 2.0+ with CUDA 12.6, Dia outperforms competitors like ElevenLabs and Sesame in handling natural timing, emotional range, and nonverbal expressions. The Apache 2.0-licensed model requires about 10GB of VRAM and generates roughly 40 tokens per second on an NVIDIA A4000 GPU. Developed by a two-person team with support from the Google TPU Research Cloud and Hugging Face, Dia is available via GitHub and Hugging Face, with a consumer version in development.
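
To illustrate how speaker tagging and nonverbal cues work in practice, here is a minimal sketch based on the project's published quickstart. The package name (`dia`), the `nari-labs/Dia-1.6B` checkpoint ID, and the `generate` call reflect the repo's README, but treat the exact signatures as assumptions and verify them against the current GitHub documentation.

```python
# Minimal sketch of Dia's scripting format (assumes the `dia` package from
# the nari-labs/dia GitHub repo and the nari-labs/Dia-1.6B checkpoint).
import soundfile as sf
from dia.model import Dia

# Load the weights from Hugging Face (needs roughly 10GB of VRAM).
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# [S1]/[S2] tags switch speakers; parenthesized cues such as (laughs)
# are rendered as nonverbal audio rather than read aloud.
script = (
    "[S1] Dia generates dialogue straight from a script. "
    "[S2] Wait, including the laughing? (laughs) "
    "[S1] Exactly, nonverbal cues are part of the text."
)

audio = model.generate(script)          # returns a raw audio array
sf.write("dialogue.wav", audio, 44100)  # write out 44.1 kHz audio
```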