Dia-1.6B TTS: Best Text-to-Dialogue Generation Model

Mounish V Last Updated : 12 May, 2025
5 min read

Looking for the right text-to-speech model? The 1.6-billion-parameter Dia might be the one for you. You may also be surprised to hear that this model was created by two undergraduates with zero funding! In this article, you'll learn what the model is, how to access and use it, and see the results so you know what it is really capable of. Before using the model, let's get acquainted with it.

What is Dia-1.6B?

Models trained to take text as input and produce natural speech as output are called text-to-speech models. Dia-1.6B, a 1.6-billion-parameter model developed by Nari Labs, belongs to this family. It is an interesting model capable of generating realistic dialogue from a transcript, and it can also produce non-verbal cues such as laughing, sneezing, and whistling. Exciting, isn't it?

How to Access Dia-1.6B?

There are two ways to access the Dia-1.6B model:

  1. Using Hugging Face API with Google Colab
  2. Using Hugging Face Spaces

The first requires getting an access token and integrating it into Google Colab with code. The second is a no-code option that lets us use Dia-1.6B interactively.

1. Using Hugging Face and Colab

The model is available on Hugging Face and needs about 10 GB of VRAM to run, which the T4 GPU in a Google Colab notebook provides. We'll demonstrate this with a mini conversation.

Before we begin, let's get our Hugging Face access token, which will be required to run the code. Go to https://huggingface.co/settings/tokens and generate a token if you don't have one already.

Make sure to enable the following permissions:

Enabling Permissions

Open a new notebook in Google Colab and add this token to the notebook's secrets (the name should be HF_Token):

Adding Secret Key
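If you want to use the token explicitly in your code, you can read it from Colab's secrets and log in to Hugging Face. This is a minimal sketch: it assumes the secret is named HF_Token as above, and that authentication is needed at all (the weights may also download without it):

from google.colab import userdata   # Colab's secrets store
from huggingface_hub import login

hf_token = userdata.get("HF_Token")  # must match the secret name created above
login(token=hf_token)                # authenticate this session with Hugging Face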

Note: Switch the runtime to a T4 GPU before running this notebook; only then will you have the roughly 10 GB of VRAM required to run this model.
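Before installing anything, you can sanity-check the runtime from a notebook cell. This small optional sketch only uses PyTorch's CUDA utilities, which come pre-installed in Colab:

import torch

# confirm a GPU runtime is active and see how much VRAM it offers
assert torch.cuda.is_available(), "No GPU found - switch the runtime to T4"
props = torch.cuda.get_device_properties(0)
print(props.name, round(props.total_memory / 1024**3, 1), "GB VRAM")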

Let's now get our hands on the model:

  1. First, clone Dia's Git repository:
!git clone https://github.com/nari-labs/dia.git
  2. Install the local package:
!pip install ./dia
  3. Install the soundfile audio library:
!pip install soundfile

After running the previous commands, restart the session before proceeding.

  4. After the installations, let's do the necessary imports and initialize the model:
import soundfile as sf          # for writing the generated audio to a file
import IPython.display as ipd   # for playing audio inside the notebook

from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B")  # download and load the model weights
  5. Define the text for the text-to-speech conversion:
text = "[S1] This is how Dia sounds. (laugh) [S2] Don't laugh too much. [S1] (clears throat) Do share your thoughts on the model."
  6. Run inference on the model:
output = model.generate(text)                  # run inference to get the audio waveform

sampling_rate = 44100                          # Dia uses a 44.1 kHz sampling rate
output_file = "dia_sample.mp3"

sf.write(output_file, output, sampling_rate)   # save the audio
ipd.Audio(output_file)                         # play the audio in the notebook
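If writing an MP3 file fails in your environment (soundfile's MP3 support depends on the underlying libsndfile build, so it is not guaranteed on every runtime), saving to WAV is a safe fallback:

output_file = "dia_sample.wav"                 # WAV is supported by every libsndfile build
sf.write(output_file, output, sampling_rate)   # save the audio as WAV instead
ipd.Audio(output_file)                         # play it back in the notebook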

Output:

The speech is very human-like, and the model handles non-verbal cues well. It's worth noting that the results aren't reproducible out of the box, as the model isn't tied to fixed voice templates.

Note: You can try fixing the seed of the model to reproduce the results.
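Dia doesn't document a dedicated seed argument (that's an assumption on my part), but you can try fixing the global random seeds before generating. This may still not make the output fully deterministic:

import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    # fix the Python, NumPy, and PyTorch RNGs so repeated runs start from the same state
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)
output = model.generate(text)  # generate again with the seeds fixed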

2. Using Hugging Face Spaces

Let's try to clone a voice using the model via Hugging Face Spaces. Here, we can use the model directly through the online interface: https://huggingface.co/spaces/nari-labs/Dia-1.6B

Here you can pass the input text, and you can also use the 'Audio Prompt' option to replicate a voice. I passed the audio we generated in the previous section.

The following text was passed as an input:

[S1] Dia is an open weights text to dialogue model. 
[S2] You get full control over scripts and voices. 
[S1] Wow. Amazing. (laughs) 
[S2] Try it now on Git hub or Hugging Face.

I'll let you be the judge: do you feel that the model has successfully captured and replicated the earlier voice?

Note: I got multiple errors while generating speech using Hugging Face Spaces; try changing the input text or audio prompt to get the model to work.

Things to remember while using Dia-1.6B

Here are a few things that you should keep in mind while using Dia-1.6B:

  • The model is not fine-tuned on a specific voice, so you'll get a different voice on every run. You can try fixing the seed to reproduce results.
  • Dia uses a 44.1 kHz sampling rate.
  • After installing the libraries, make sure to restart the Colab session.
  • I ran into multiple errors while generating speech on Hugging Face Spaces; try changing the input text or audio prompt to get the model to work.

Conclusion

The model's results are very promising, especially compared to the competition. Its biggest strength is its support for a wide range of non-verbal communication. The model has a distinct tone and the speech feels natural; on the other hand, since it isn't fine-tuned on specific voices, reproducing a particular voice may not be easy. Like any other generative AI tool, this model should be used responsibly.

Frequently Asked Questions

Q1. Can we use only two speakers in the conversation?

A. No, you're not limited to just two speakers. While using two speakers (e.g., [S1] and [S2]) is common for simplicity, you can include more by labeling them [S1], [S2], [S3], and so on. This is especially useful when simulating group dialogues, interviews, or multi-party conversations. Just be sure to clearly indicate who is speaking in your prompt so the model can keep each speaker's lines distinct. This flexibility allows for more dynamic and context-rich interactions.
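For instance, a hypothetical three-speaker script using the same tag format from the earlier examples might look like this (the dialogue content is purely illustrative):

text = (
    "[S1] Welcome to the panel. [S2] Thanks for having us. (laughs) "
    "[S3] Glad to be here. [S1] Great, let's get started."
)
output = model.generate(text)  # same inference call as before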

Q2. Is Dia 1.6B a paid model?

A. No, Dia 1.6B is entirely free to use. It's an open-weights text-to-dialogue model hosted on Hugging Face, which means there are no subscription fees or licensing costs involved. Whether you're a student, developer, or researcher, you can access it without any upfront payment. This makes it a great choice for experimentation, prototyping, or educational use.

Q3. How can I use this model without worrying about hardware or coding?

A. You can use Dia 1.6B directly through Hugging Face Spaces, which provides a web-based interface. This means you don't need to set up Python environments, install libraries, or worry about GPU availability. Simply visit the Dia-1.6B Space (linked in the Spaces section above), and you can interact with the model instantly in your browser.

Q4. Can I fine-tune Dia 1.6B for my own use case?

A. Yes, if you have specific data and want the model to perform better for your domain, Dia 1.6B can be fine-tuned. You’ll need some technical expertise and compute resources, or you can use Hugging Face’s training tools.

Q5. Is there a token or usage limit for Dia 1.6B?

A. No hard limits are enforced by default, but Hugging Face Spaces may have rate or session time restrictions to manage server load.

Passionate about technology and innovation, a graduate of Vellore Institute of Technology. Currently working as a Data Science Trainee, focusing on Data Science. Deeply interested in Deep Learning and Generative AI, eager to explore cutting-edge techniques to solve complex problems and create impactful solutions.
