FireRedTTS Version Information

Learn about the development history and version features of FireRedTTS

FireRedTTS-2

The latest version, FireRedTTS-2, adds long dialogue generation. It currently supports 4-minute dialogues with 4 speakers, and can be extended to more speakers and longer dialogues by expanding the training corpus.

FireRedTTS-1

The first version, released in September 2024. FireRedTTS-1 is the foundational text-to-speech system, supporting zero-shot voice cloning and emotional speech generation.

2025

FireRedTTS-2 Release

To meet the more complex demands of multi-speaker dialogue generation, the team launched FireRedTTS-2. This version is designed for long-form streaming text-to-speech, with more natural-sounding speech and reliable speaker switching.

Core Features

Long-form streaming text-to-speech system for multi-speaker dialogue generation
Context-aware prosody control for more natural-sounding speech
Enhanced speaker switching capabilities
Improved system architecture for enhanced long-term stability
Optimized streaming synthesis performance

September 2024

FireRedTTS-1 Release

The Xiaohongshu FireRed team released the first version of FireRedTTS, a text-to-speech system based on large language models. This version supports zero-shot voice cloning and emotional speech generation, among other functions, and aims to give users high-quality speech synthesis.

Core Features

Text-to-speech system based on large language models
Zero-shot voice cloning with just a few seconds of reference audio
Rich emotional voice generation capabilities
Support for Chinese, English, and mixed Chinese-English text
Streaming decoder to reduce synthesis latency

Version Comparison

Feature	FireRedTTS-1	FireRedTTS-2
Main Application Scenario	Single-speaker voice synthesis	Multi-speaker dialogue generation
Synthesis Method	Batch processing synthesis	Streaming synthesis
Speaker Switching	Basic support	Optimized support
Context Awareness	Limited support	Deep support
Long Content Processing	Segmented processing	Continuous streaming processing
System Stability	Good	Enhanced
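
The difference between the batch and streaming rows in the comparison can be illustrated with a minimal Python sketch. This is not the FireRedTTS API: `fake_synthesize`, `synthesize_batch`, `synthesize_streaming`, and the `[S1]`/`[S2]` speaker labels are all hypothetical names chosen for illustration, and the "audio" is placeholder bytes. The only point is the shape of the two interfaces: batch returns everything at the end, while streaming hands back each turn as soon as it is ready.

```python
from typing import Callable, Iterator, List


def fake_synthesize(turn: str) -> bytes:
    """Hypothetical stand-in for a TTS call; returns placeholder bytes, not real audio."""
    return turn.encode("utf-8")


def synthesize_batch(
    turns: List[str], synthesize: Callable[[str], bytes] = fake_synthesize
) -> bytes:
    """Batch style: process every turn, then return one complete result."""
    return b"".join(synthesize(turn) for turn in turns)


def synthesize_streaming(
    turns: List[str], synthesize: Callable[[str], bytes] = fake_synthesize
) -> Iterator[bytes]:
    """Streaming style: yield each turn's chunk as soon as it is produced."""
    for turn in turns:
        yield synthesize(turn)


# Hypothetical two-speaker dialogue; the tag format is an assumption.
turns = ["[S1] Hello there.", "[S2] Hi, how are you?"]

batch_audio = synthesize_batch(turns)
chunks = list(synthesize_streaming(turns))
```

In this sketch both paths produce the same bytes overall; a caller of the streaming version can start playback after the first chunk instead of waiting for the whole dialogue.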