On November 26, the latest leaderboard for the spatial reasoning benchmark SpatialBench was released. Alibaba's QianWen visual understanding models, Qwen3-VL and Qwen2.5-VL, took first and second place, respectively, ahead of leading international models including Gemini 3, GPT-5.1, and Claude Sonnet 4.5. SpatialBench focuses on how well multimodal models reason about space, structure, and paths, and it has become a new reference point for measuring progress in 'embodied intelligence.' Qwen3-VL-235B and Qwen2.5-VL-72B scored 13.5 and 12.9 points, respectively, compared with 9.6 points for Gemini 3.0 Pro Preview and 7.5 points for GPT-5.1. Qwen3-VL brings significant improvements in visual perception and multimodal reasoning. It supports features such as 'reasoning with images' and 'visual programming,' and it strengthens 3D detection to support precise robotic grasping. Several versions of Qwen3-VL have been open-sourced so far, including dense models with 2B, 4B, 8B, and 32B parameters, as well as MoE models. The QianWen app is also available for users to try free of charge.
