On March 30, 2026, Alibaba's Qwen team officially released its full-modal large model, Qwen3.5-Omni, available in three Instruct versions: Plus, Flash, and Light. The model supports 256K-token contexts, audio inputs longer than 10 hours, and 720P (1 FPS) audio-video inputs exceeding 400 seconds. Across 215 tasks spanning audio-video understanding, recognition, and interaction, Qwen3.5-Omni achieves state-of-the-art (SOTA) results, surpassing Gemini-3.1 Pro and ranking among the strongest full-modal large models available. It supports speech recognition in 113 languages and dialects and speech generation in 36 languages and dialects.

The model is accessible today through both an Offline API and a Realtime API. It also demonstrates strong audio-video Vibe Coding: users can generate product prototype interfaces with intricate UIs from simple spoken instructions. General users can try it free of charge on Qwen Chat, while developers and enterprises can call it through Alibaba Cloud's BaiLian platform, with input pricing below 0.8 yuan per million tokens.
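As a rough illustration of what calling such a model through an OpenAI-compatible chat endpoint might look like, the sketch below assembles a multimodal request payload mixing text and audio input. The endpoint URL, model id (`qwen3.5-omni-plus`), and the exact field shapes are assumptions for illustration only, not documented values from the announcement.

```python
import json

# Assumed OpenAI-compatible endpoint; the real base URL for Qwen3.5-Omni
# on BaiLian may differ.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"

def build_omni_request(prompt: str, audio_b64: str, audio_format: str = "wav") -> dict:
    """Assemble a hypothetical multimodal chat payload (text + audio input)."""
    return {
        "model": "qwen3.5-omni-plus",  # hypothetical model id for the Plus version
        "messages": [
            {
                "role": "user",
                "content": [
                    # Base64-encoded audio clip as one content part ...
                    {"type": "input_audio",
                     "input_audio": {"data": audio_b64, "format": audio_format}},
                    # ... followed by the text instruction.
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        # Request both text and spoken output, mirroring the model's
        # speech-generation capability described above.
        "modalities": ["text", "audio"],
    }

payload = build_omni_request("Summarize this recording.", "UklGRg==")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the chat-completions route of the Offline API with an API key; the Realtime API would instead stream audio frames over a persistent connection.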
