Looking for the right text-to-speech model? The 1.6-billion-parameter Dia might be the one for you. You may also be surprised to hear that this model was created by two undergraduates with zero funding! In this article, you’ll learn about the model, how to access and use it, and see its results so you know exactly what it is capable of. Before using the model, let’s get acquainted with it.
Models trained to take text as input and produce natural speech as output are called text-to-speech models. Dia-1.6B, developed by Nari Labs, belongs to this family. It is an interesting model capable of generating realistic dialogue from a transcript, and it can also produce non-verbal cues such as laughs, sneezes, and whistles. Exciting, isn’t it?
There are two ways in which we can access the Dia-1.6B model:
The first requires a Hugging Face access token and a bit of code in Google Colab. The second is a no-code route that lets us use Dia-1.6B interactively through Hugging Face Spaces.
The model is available on Hugging Face and needs roughly 10 GB of VRAM to run, which the T4 GPU in a Google Colab notebook provides. We’ll demonstrate this with a mini conversation.
Before we begin, let’s get our Hugging Face access token, which will be required to run the code. Go to https://huggingface.co/settings/tokens and generate a token if you don’t have one already.
Make sure to enable the following permissions:
Open a new notebook in Google Colab and add this token in the secrets (the name should be HF_Token):
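To confirm the secret is wired up correctly, a minimal sketch like the one below (assuming you kept the name HF_Token) reads it inside the notebook and logs the session in to Hugging Face:
from google.colab import userdata
from huggingface_hub import login

hf_token = userdata.get("HF_Token")  # Read the token stored in Colab's secrets panel
login(token=hf_token)  # Authenticate this Colab session with Hugging Face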
Note: Switch the runtime to a T4 GPU before running this notebook. Only then will you have the roughly 10 GB of VRAM required to run this model.
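To quickly verify that the runtime switch worked, a small check like this can be run before loading the model:
import torch

# Should print True and the GPU name (e.g. Tesla T4) if the runtime is set correctly
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))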
Let’s now get our hands on the model:
!git clone https://github.com/nari-labs/dia.git
!pip install ./dia
!pip install soundfile
After running the previous commands, restart the session before proceeding.
import soundfile as sf
from dia.model import Dia
import IPython.display as ipd

# Download the model weights from Hugging Face and load them
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# [S1]/[S2] tag the speakers; parentheses add non-verbal cues like (laugh)
text = "[S1] This is how Dia sounds. (laugh) [S2] Don't laugh too much. [S1] (clears throat) Do share your thoughts on the model."

# Generate the dialogue as a waveform array
output = model.generate(text)

sampling_rate = 44100  # Dia uses a 44.1 kHz sampling rate
output_file = "dia_sample.mp3"  # Use a .wav name here if your libsndfile build lacks MP3 support
sf.write(output_file, output, sampling_rate)  # Saving the audio
ipd.Audio(output_file)  # Displaying the audio
Output:
The speech is very human-like, and the model handles non-verbal cues well. It’s worth noting that the results aren’t reproducible by default: the model isn’t conditioned on fixed voice templates, so each run can produce different voices.
Note: You can try fixing the seed of the model to reproduce the results.
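As a rough sketch of that idea, you can fix the common random seeds before calling generate; this assumes Dia’s sampling relies on Python’s, NumPy’s, and PyTorch’s default generators, so it may not pin down every source of randomness:
import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    # Fix the usual sources of randomness (an assumption about what Dia's sampling uses)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(42)
output = model.generate(text)  # With seeds fixed, repeated runs should be much closer to identical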
Let’s try to clone a voice using the model via Hugging Face Spaces. Here we have the option to use the model directly through the online interface: https://huggingface.co/spaces/nari-labs/Dia-1.6B
Here you can pass the input text, and you can additionally use the ‘Audio Prompt’ field to replicate a voice. I passed the audio we generated in the previous section.
The following text was passed as an input:
[S1] Dia is an open weights text to dialogue model.
[S2] You get full control over scripts and voices.
[S1] Wow. Amazing. (laughs)
[S2] Try it now on Git hub or Hugging Face.
I’ll let you be the judge: do you feel that the model has successfully captured and replicated the earlier voices?
Note: I got multiple errors while generating speech using Hugging Face Spaces; try changing the input text or audio prompt to get the model to work.
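If the web interface keeps failing, one alternative worth trying is calling the same Space programmatically with the gradio_client library. The sketch below only connects and lists the Space’s endpoints, since the exact endpoint names and parameters depend on how the Space’s interface is defined and may change:
!pip install gradio_client

from gradio_client import Client

client = Client("nari-labs/Dia-1.6B")  # Connect to the public Dia-1.6B Space
client.view_api()  # Inspect the available endpoints and their parameters before calling predict()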
Here are a few things that you should keep in mind while using Dia-1.6B:
The model’s results are very promising, especially compared to the competition. Its biggest strength is its support for a wide range of non-verbal communication. The model has a distinct tone and its speech feels natural, but since it isn’t fine-tuned on specific voices, reproducing a particular voice can be difficult. Like any other generative AI tool, this model should be used responsibly.
A. No, you’re not limited to just two speakers. While having two speakers (e.g., [S1] and [S2]) is common for simplicity, you can include more by labeling them as [S1], [S2], [S3], and so on. This is especially useful when simulating group dialogues, interviews, or multi-party conversations. Just be sure to clearly indicate who is speaking in your prompt so the model can correctly follow and generate coherent replies for each speaker. This flexibility allows for more dynamic and context-rich interactions.
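For illustration, and assuming the model honours speaker tags beyond [S2] as described above, a three-speaker prompt passed to the same generate call from earlier could look like this:
# Hypothetical three-speaker transcript; tags beyond [S2] are an assumption based on the answer above
text = "[S1] Welcome to the panel. [S2] Glad to be here. [S3] Same here, thanks for having us. (laughs)"
output = model.generate(text)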
A. No, Dia 1.6B is entirely free to use. It’s an open-access conversational model hosted on Hugging Face, which means there are no subscription fees or licensing costs involved. Whether you’re a student, developer, or researcher, you can access it without any upfront payment. This makes it a great choice for experimentation, prototyping, or educational use.
A. You can use Dia 1.6B directly through Hugging Face Spaces, which provides a web-based interface. This means you don’t need to set up Python environments, install libraries, or worry about GPU availability. Simply visit https://huggingface.co/spaces/nari-labs/Dia-1.6B, and you can interact with the model instantly in your browser.
A. Yes, if you have specific data and want the model to perform better for your domain, Dia 1.6B can be fine-tuned. You’ll need some technical expertise and compute resources, or you can use Hugging Face’s training tools.
A. No hard limits are enforced by default, but Hugging Face Spaces may have rate or session time restrictions to manage server load.