Alibaba Unveils New Qwen Models That Clone Voices from Just Three Seconds of Audio
1 day ago
Author: Editor

Alibaba Cloud's Qwen team has introduced two new artificial intelligence models for generating and cloning voices from text instructions. The first, Qwen3-TTS-VD-Flash, creates voices from detailed text descriptions and outperforms OpenAI's recently released GPT-4o mini-tts API at this task. The second, Qwen3-TTS-VC-Flash, clones a voice from an audio clip only three seconds long. It supports voice reproduction in ten languages and achieves a lower error rate than competing models.

Both models can process complex text, imitate animal sounds, and capture the distinctive features of a voice. They are currently available through Alibaba Cloud's API, and demo versions can be tried on Hugging Face.