FireRedTTS Version Information

Learn about the development history and version features of FireRedTTS

FireRedTTS-2

FireRedTTS-2, the latest version, adds long-dialogue voice generation. It currently supports 4-minute dialogues with 4 speakers, and can be extended to more speakers and longer dialogues by expanding the training corpus.

FireRedTTS-1

FireRedTTS-1, the first version, was released in September 2024. It is the base text-to-speech system and supports zero-shot voice cloning and emotional voice generation.

2025

FireRedTTS-2 Release

To support more complex multi-speaker dialogue generation, the team launched FireRedTTS-2. This version is designed for long-form streaming text-to-speech and provides more natural voice output and more reliable speaker switching.

Core Features

  • Long-form streaming text-to-speech system for multi-speaker dialogue generation
  • Context-aware prosody control for generating more natural voices
  • Enhanced speaker switching capabilities
  • Improved system architecture for enhanced long-term stability
  • Optimized streaming synthesis performance
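
As an illustration of the kind of multi-speaker input such a system works with, the sketch below represents a dialogue as a list of speaker-tagged turns and checks it against the four-speaker figure quoted for the current release. The `Turn` class, the `validate_dialogue` helper, the `MAX_SPEAKERS` constant, and the `S1`/`S2` labels are hypothetical names chosen for this example; they are not taken from the FireRedTTS-2 API.

```python
from dataclasses import dataclass

# Assumption: limit taken from the "4-character dialogue" figure quoted above.
MAX_SPEAKERS = 4


@dataclass(frozen=True)
class Turn:
    speaker: str  # hypothetical speaker label, e.g. "S1"
    text: str     # what this speaker says in the turn


def validate_dialogue(turns: list[Turn], max_speakers: int = MAX_SPEAKERS) -> list[str]:
    """Return the distinct speakers in order of first appearance.

    Raises ValueError if the script is empty, a turn has no text,
    or more than `max_speakers` distinct speakers appear.
    """
    if not turns:
        raise ValueError("dialogue has no turns")
    speakers: list[str] = []
    for i, turn in enumerate(turns):
        if not turn.text.strip():
            raise ValueError(f"turn {i} has empty text")
        if turn.speaker not in speakers:
            speakers.append(turn.speaker)
    if len(speakers) > max_speakers:
        raise ValueError(f"{len(speakers)} speakers exceeds limit of {max_speakers}")
    return speakers


dialogue = [
    Turn("S1", "Did you try the new release?"),
    Turn("S2", "Yes, the speaker switching sounds smoother."),
    Turn("S1", "Good to hear."),
]
speakers = validate_dialogue(dialogue)
```

Checking the script before synthesis keeps an over-limit dialogue from reaching the model at all; the limit is a parameter so it can be raised if a model trained on a larger corpus supports more speakers.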

September 2024

FireRedTTS-1 Release

The Xiaohongshu FireRed team released the first FireRedTTS version, a text-to-speech system based on large language models. This version supports zero-shot voice cloning and emotional voice generation, among other functions, and provides high-quality voice synthesis.

Core Features

  • Text-to-speech system based on large language models
  • Zero-shot voice cloning with just a few seconds of reference audio
  • Rich emotional voice generation capabilities
  • Supports Chinese, English, and Chinese-English mixed text processing
  • Streaming decoder to reduce synthesis latency

Version Comparison

Feature                    FireRedTTS-1                    FireRedTTS-2
Main Application Scenario  Single-speaker voice synthesis  Multi-speaker dialogue generation
Synthesis Method           Batch processing synthesis      Streaming synthesis
Speaker Switching          Basic support                   Optimized support
Context Awareness          Limited support                 Deep support
Long Content Processing    Segmented processing            Continuous streaming processing
System Stability           Good                            Enhanced
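
The difference between batch synthesis and streaming synthesis in the comparison above can be sketched with plain Python generators. Here `fake_synthesize` is a stand-in that returns placeholder bytes rather than audio, and every name is hypothetical; the sketch shows only the control flow, not how either FireRedTTS version is implemented.

```python
from typing import Iterable, Iterator


def fake_synthesize(segment: str) -> bytes:
    """Stand-in for a TTS call: returns placeholder bytes, not real audio."""
    return segment.encode("utf-8")


def synthesize_batch(segments: Iterable[str]) -> bytes:
    """Batch style: process every segment, then return one complete result."""
    return b"".join(fake_synthesize(s) for s in segments)


def synthesize_streaming(segments: Iterable[str]) -> Iterator[bytes]:
    """Streaming style: yield each chunk as soon as it is ready."""
    for s in segments:
        yield fake_synthesize(s)


segments = ["Hello. ", "How are you? ", "Goodbye."]

# Batch: nothing is available until every segment has been processed.
full_audio = synthesize_batch(segments)

# Streaming: the first chunk is available after the first segment alone.
stream = synthesize_streaming(segments)
first_chunk = next(stream)
remaining = b"".join(stream)
```

In the streaming form a consumer could begin playback with `first_chunk` while later chunks are still being produced, which is the latency benefit usually associated with streaming synthesis.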