FireRedTTS Version Information
Learn about the development history and version features of FireRedTTS
FireRedTTS-2
The latest version, FireRedTTS-2, features long dialogue voice generation. It currently supports 4-minute dialogues with 4 speakers and can be extended to more speakers and longer dialogues by expanding the training corpus.
FireRedTTS-1
The first version, released in September 2024. FireRedTTS-1 is the foundational text-to-speech system, supporting zero-shot voice cloning and emotional voice generation.
2025
FireRedTTS-2 Release
To meet the needs of more complex multi-speaker dialogue generation, the team launched FireRedTTS-2. This version is designed for long-form streaming text-to-speech and provides more natural voice output and reliable speaker switching.
Core Features
- Long-form streaming text-to-speech system for multi-speaker dialogue generation
- Context-aware prosody control for more natural-sounding speech
- Enhanced speaker switching capabilities
- Improved system architecture for enhanced long-term stability
- Optimized streaming synthesis performance
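To make the multi-speaker dialogue idea concrete, here is a minimal sketch of how a dialogue script could be represented as speaker-tagged turns before synthesis. The `[S1]`-style tag format and the helper names `parse_turns` and `count_speaker_switches` are assumptions made for illustration; they are not taken from the FireRedTTS-2 API.

```python
import re
from typing import List, Tuple

# Hypothetical tag format: "[S1]", "[S2]", ... marks the speaker of each turn.
TURN_PATTERN = re.compile(r"\[(S\d+)\]\s*([^\[]+)")


def parse_turns(script: str) -> List[Tuple[str, str]]:
    """Split a tagged script into (speaker, text) turns, preserving order."""
    return [(speaker, text.strip()) for speaker, text in TURN_PATTERN.findall(script)]


def count_speaker_switches(turns: List[Tuple[str, str]]) -> int:
    """Count how often consecutive turns belong to different speakers."""
    return sum(1 for prev, cur in zip(turns, turns[1:]) if prev[0] != cur[0])


script = (
    "[S1]Hello, how was the trip? "
    "[S2]Long, but worth it. "
    "[S2]And you? "
    "[S1]Busy week."
)
turns = parse_turns(script)
switches = count_speaker_switches(turns)
```

Each speaker switch is a point where a dialogue system has to change voice while keeping the prosody consistent with the preceding context, which is why the turn order is preserved rather than grouping text by speaker.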
September 2024
FireRedTTS-1 Release
The Xiaohongshu FireRed team released the first FireRedTTS version, a text-to-speech system based on large language models. This version supports zero-shot voice cloning, emotional voice generation, and other functions, providing users with a high-quality voice synthesis experience.
Core Features
- Text-to-speech system based on large language models
- Zero-shot voice cloning with just a few seconds of reference audio
- Rich emotional voice generation capabilities
- Supports Chinese, English, and Chinese-English mixed text processing
- Streaming decoder to reduce synthesis latency
Version Comparison
Feature | FireRedTTS-1 | FireRedTTS-2 |
---|---|---|
Main Application Scenario | Single-speaker voice synthesis | Multi-speaker dialogue generation |
Synthesis Method | Batch processing synthesis | Streaming synthesis |
Speaker Switching | Basic support | Optimized support |
Context Awareness | Limited support | Deep support |
Long Content Processing | Segmented processing | Continuous streaming processing |
System Stability | Good | Enhanced |
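
The difference between batch and streaming synthesis in the table can be illustrated with a small sketch. The `fake_synthesize` function below is a hypothetical stand-in that returns placeholder bytes rather than audio, and none of these names come from FireRedTTS; the point is only the control flow.

```python
from typing import Iterator, List


def fake_synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for a TTS call: returns placeholder bytes, not audio."""
    return sentence.encode("utf-8")


def batch_synthesis(sentences: List[str]) -> bytes:
    """Synthesize every sentence first, then return a single buffer."""
    return b"".join(fake_synthesize(s) for s in sentences)


def streaming_synthesis(sentences: List[str]) -> Iterator[bytes]:
    """Yield each chunk as soon as it is ready, so playback can begin early."""
    for s in sentences:
        yield fake_synthesize(s)


sentences = ["Hello.", "This is a long dialogue.", "Goodbye."]
full_audio = batch_synthesis(sentences)
chunks = list(streaming_synthesis(sentences))
first_chunk = next(streaming_synthesis(sentences))
```

Both paths produce the same bytes overall. With streaming, the first chunk is available after one synthesis call instead of after all of them, which is what matters for latency in long dialogues.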