Alibaba Unveils Qwen3-Omni-Flash: A Full-Modal Large Model with Real-Time Streaming Response Capabilities
2025-12-11
Author: Site editor

On December 9, 2025, Alibaba's Qwen team introduced Qwen3-Omni-Flash-2025-12-01, a new-generation native full-modal large model. It accepts text, image, audio, and video inputs seamlessly, and generates high-quality text and natural-sounding speech through real-time streaming responses, with speech quality that comes remarkably close to a real human voice.

The model is also linguistically versatile, supporting 119 text languages, speech recognition in 19 languages, and speech synthesis in 10 languages, so users can expect accurate responses even in complex cross-language scenarios.

Performance has improved across several metrics as well: logical reasoning, code generation, and multidisciplinary visual question answering all show significant gains. Alibaba has additionally opened up system-prompt customization, letting users fine-tune the model's behavioral patterns and even define specific persona styles to suit their needs.

The model is accessible via API and has been integrated into the Qwen Chat Demo, where users can generate videos with real-time on-screen narration, opening up possibilities for content creators, educators, and businesses alike.
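Since the announcement highlights both API access and system-prompt customization, here is a minimal sketch of what a call might look like, assuming the model is exposed through an OpenAI-compatible chat-completions endpoint (Alibaba Cloud Model Studio provides such endpoints for other Qwen models). The endpoint URL and the model identifier `qwen3-omni-flash` are assumptions for illustration; consult the official documentation for the exact values.

```python
import json
import urllib.request

# Assumed OpenAI-compatible endpoint; verify against official docs.
API_URL = "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions"

def build_request(system_prompt, user_text, model="qwen3-omni-flash", stream=True):
    """Build an OpenAI-compatible chat payload.

    The system prompt carries the custom persona the article describes;
    stream=True requests incremental (real-time) token delivery.
    """
    return {
        "model": model,  # assumed model identifier
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "stream": stream,
    }

def send(payload, api_key):
    """POST the payload and print streamed text deltas as they arrive.

    OpenAI-compatible streaming responses are server-sent events:
    one "data: {...}" JSON line per chunk, terminated by "data: [DONE]".
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode("utf-8").strip()
            if line.startswith("data: ") and line != "data: [DONE]":
                chunk = json.loads(line[len("data: "):])
                delta = chunk["choices"][0]["delta"].get("content")
                if delta:
                    print(delta, end="", flush=True)
```

Under these assumptions, `send(build_request("You are a concise, friendly assistant.", "Introduce yourself in one sentence."), api_key)` would print the reply token by token, mirroring the real-time streaming behavior the model advertises.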