On April 3, CodeArena, the AI-programming leaderboard of LMArena, the globally recognized blind-testing platform for large models, released its latest rankings. Alibaba's new-generation large language model, Qwen3.6-Plus, claimed second place globally, outperforming offerings from international heavyweights such as OpenAI, Google, and xAI, and became the highest-ranked Chinese large model on the list. The leaderboard combines blind testing by real users with a real-time competitive ranking mechanism, and is widely regarded as one of the most impartial and authoritative performance benchmarks in the AI field.
Qwen3.6-Plus performed especially well on the React specialized list, which assesses a model's autonomous coding ability in complex web-development scenarios, an evaluation that demands comprehensive engineering thinking and end-to-end development skills. It secured second place with a score of 1452, behind Anthropic's Claude-Opus-4.6-Thinking (1540 points) and narrowly ahead of OpenAI's GPT-5.0-High (1448 points) and Google's Gemini 3.1 Pro Preview (1440 points).
Qwen3.6-Plus also claimed the top spot among Chinese models on the broader Code Arena list, which offers a comprehensive assessment of AI programming capability. With this result, Alibaba rose to fourth place in the global AI lab rankings, behind only Anthropic, OpenAI, and Google.
Qwen3.6-Plus is the first model in Alibaba's Qwen3.6 series. Other model sizes in the series are set to be open-sourced in the future, and a more powerful flagship model, Qwen3.6-Max, is slated for release soon.
