On March 2, 2026, Tongyi Lab, a research unit under Alibaba Cloud, announced two voice generation models: Fun-CosyVoice3.5 and Fun-AudioGen-VD. Both can be controlled directly through natural-language instructions, giving users fine-grained control over the generated audio. Fun-CosyVoice3.5 offers multilingual voice cloning and refined expression control, and adds support for four more languages, including Thai. Its error rate on rare-character pronunciation fell from 15.2% to 5.3%, which makes long-text reading more stable and fluent. First-packet latency also decreased by 35%, enabling faster feedback in real-time interaction. Fun-AudioGen-VD, by contrast, specializes in sound design and scenario-based audio generation. Users can craft target timbres, emotional expressions, and complete auditory scenes, going beyond conventional voice generation. Both models are available to developers via API.
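To put the reported Fun-CosyVoice3.5 figures in perspective, the short Python sketch below works through the arithmetic. The error rates (15.2% and 5.3%) and the 35% latency decrease come from the announcement; the 200 ms baseline latency and the function names are hypothetical, chosen only for illustration, since no absolute latency was reported.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional decrease from `before` to `after`."""
    return (before - after) / before


def apply_reduction(baseline: float, fraction: float) -> float:
    """Return `baseline` after decreasing it by `fraction` (0.35 means 35%)."""
    return baseline * (1.0 - fraction)


# Figures stated in the announcement.
error_before = 15.2  # rare-character error rate, percent
error_after = 5.3    # rare-character error rate, percent
latency_cut = 0.35   # first-packet latency decrease

# Assumption: a 200 ms baseline, NOT from the announcement.
hypothetical_baseline_ms = 200.0

drop_points = error_before - error_after
drop_relative = relative_reduction(error_before, error_after)
new_latency_ms = apply_reduction(hypothetical_baseline_ms, latency_cut)

print(f"Absolute drop: {drop_points:.1f} percentage points")
print(f"Relative drop: {drop_relative:.1%}")
print(f"Latency at hypothetical baseline: {new_latency_ms:.0f} ms")
```

In other words, the drop of 9.9 percentage points corresponds to a relative reduction of roughly 65% in rare-character errors, while the 35% latency figure is already a relative value and only becomes a concrete number once a baseline is known.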
