According to the official announcement from Tongyi Qianwen (Qwen), on December 24, 2025, Alibaba Cloud (Aliyun) released two new speech synthesis models in the Qwen3-TTS series: Qwen3-TTS-VD-Flash, a timbre creation model, and Qwen3-TTS-VC-Flash, a timbre cloning model. Qwen3-TTS-VD-Flash lets users specify timbre, rhythm, emotion, and even persona through natural language instructions, giving fine-grained control over the generated voice. On the InstructTTS-Eval benchmark, it outperformed competitors such as GPT-4o-mini-tts. Qwen3-TTS-VC-Flash clones a voice from as little as 3 seconds of audio and can generate speech in 10 languages: Chinese, English, Japanese, Korean, French, German, Spanish, Italian, Portuguese, and Russian. In multilingual tests, it achieved a lower word error rate than leading products such as MiniMax and ElevenLabs. Both models are currently available as Flash-version APIs on Alibaba Cloud's Bailian platform, targeting industrial-scale speech synthesis workloads.
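
For readers unfamiliar with the word error rate (WER) metric cited above, the following is a minimal sketch of how it is commonly computed: the word-level edit distance between a reference transcript and a recognized transcript, divided by the number of reference words. It is not the evaluation code behind the announcement. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; languages written without spaces, such as Chinese or Japanese, are usually scored at the character level instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

In a TTS evaluation, the hypothesis is typically obtained by running a speech recognizer on the synthesized audio and comparing the result with the input text, so a lower WER suggests more intelligible and faithful speech.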
