On September 24, 2025, Alibaba’s Tongyi Large Model Team unveiled the upgraded Qwen3-VL series and open-sourced its flagship version, the Qwen3-VL-235B-A22B series. As the most capable vision-language model in the Qwen family to date, Qwen3-VL not only perceives images and video but can also interpret the world, understand events, and take action.

In official demonstrations, the model showed strong visually driven reasoning and execution: it can operate devices such as smartphones and computers from natural language commands, carrying out tasks like launching applications, tapping buttons, and entering text, and it can complete end-to-end workflows such as searching for and booking flights. Qwen3-VL is also adept at recognizing a wide range of objects, with knowledge spanning celebrities, dishes, plants and animals, car brands, anime characters, and more.

In a comprehensive evaluation across ten capability categories, Qwen3-VL-235B-A22B-Instruct led on most metrics among non-reasoning models, outperforming proprietary models such as Gemini 2.5 Pro and GPT-5 and setting a new standard for open-source multimodal models. Both Qwen3-VL-235B-A22B-Instruct and Qwen3-VL-235B-A22B-Thinking are now available as open source on GitHub, Hugging Face, and ModelScope.
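For readers who want to try the open weights, the sketch below shows one plausible way to query the Instruct checkpoint through Hugging Face Transformers. It is a minimal sketch rather than the official quickstart: it assumes a recent Transformers release whose generic `AutoModelForImageTextToText` class and multimodal chat templates cover Qwen3-VL, and the image URL and question are placeholders. In practice, a 235B-parameter mixture-of-experts checkpoint will require multiple GPUs or a hosted endpoint rather than a single consumer machine.

```python
# A minimal sketch of loading the open-weights Instruct checkpoint with Hugging
# Face Transformers. The class names follow the generic image-text-to-text
# interface of recent Transformers releases; whether Qwen3-VL maps onto them is
# an assumption here, and the image URL is a placeholder.
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-235B-A22B-Instruct"

processor = AutoProcessor.from_pretrained(model_id)
# device_map="auto" shards the weights across available GPUs (requires the
# accelerate package); a model of this size will not fit on one consumer GPU.
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# A chat turn mixing an image with a text question, in the format that
# multimodal processor chat templates accept.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/street_scene.jpg"},
            {"type": "text", "text": "What brand is the car in this photo?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, dropping the echoed prompt.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

The same chat-message format extends to multi-image and video inputs, which is how the recognition and reasoning capabilities described above are typically exercised; the ModelScope and GitHub releases ship their own usage examples that should be treated as authoritative.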