Qwen3.7-Max Wrote Its Own Chip's Software in 35-Hour Run: Alibaba's Full-Stack Bet
Source:TechTimes

Alibaba used the May 20–21 Alibaba Cloud Summit in Hangzhou, China, to launch not one product but three that together form what the company calls a complete "AI factory" stack: Qwen3.7-Max, its new flagship large language model; the Zhenwu M890, a purpose-built AI accelerator developed by its semiconductor subsidiary T-Head; and the Panjiu AL128, a rack-scale server that links 128 M890 accelerators into a single deployable unit. The simultaneous release of all three products, confirmed by reporting from the South China Morning Post and Reuters, is the story — not any individual component.

The framing comes from Alibaba's own senior vice-president of cloud computing, Liu Weiguang, who told summit attendees: "What we're building is China's AI factory," and described Alibaba as the only AI and cloud company in the country operating "all five layers of the full AI stack" — chips, agentic cloud, AI models, model service platforms, and agentic applications. That claim is worth testing, and the central proof point Alibaba offered is an unusually demanding one.

35 Hours, 1,158 Tool Calls, One Chip Alibaba Built Itself

According to Alibaba-linked reporting reconstructed by TrendForce, Qwen3.7-Max ran autonomously for approximately 35 hours on the Zhenwu M890 platform, making roughly 1,158 tool calls and 432 kernel evaluations across five architectural redesigns. Alibaba reports a geometric-mean speedup of 10x for the resulting Extend Attention kernel over the reference Triton implementation. The model carried out the task with no access to existing chip-architecture documentation or performance-analysis data.
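
For readers unfamiliar with the metric, a geometric-mean speedup averages per-case speed ratios multiplicatively, so a single outlier case cannot dominate the headline number the way it would in an arithmetic mean. The sketch below illustrates the calculation only; the timings and the function name `geomean_speedup` are invented for illustration and are not Alibaba's measurements.

```python
import math

def geomean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-case speedups (baseline time / optimized time)."""
    if not baseline_ms or len(baseline_ms) != len(optimized_ms):
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    # exp(mean(log r)) equals the n-th root of the product of the ratios.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical timings (ms) for three input shapes; not Alibaba's data.
baseline = [40.0, 90.0, 250.0]
optimized = [10.0, 9.0, 10.0]  # per-case speedups: 4x, 10x, 25x

geo = geomean_speedup(baseline, optimized)
arith = sum(b / o for b, o in zip(baseline, optimized)) / len(baseline)
print(f"geometric mean: {geo:.1f}x, arithmetic mean: {arith:.1f}x")
# prints "geometric mean: 10.0x, arithmetic mean: 13.0x"
```

In this toy case a 10x geometric mean coexists with individual cases as low as 4x, which is why per-case results matter when a figure like this is independently reproduced.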

What Alibaba is claiming here is more than raw endurance. The demonstration was designed to show that Qwen3.7-Max can autonomously write and iteratively optimize performance-critical software for Alibaba's own previously undocumented chip — effectively producing the software stack that makes the chip run AI workloads efficiently. The model then ran on the chip it had just optimized. That recursive loop is what gives Alibaba's "full-stack" description its operational meaning.

These figures carry a label that matters: they are Alibaba's first-party results, not third-party reproductions. Developers and enterprises evaluating the model should treat them as Alibaba's characterization of its intended performance strengths — long-horizon, tool-heavy autonomous work — until external benchmarks independently replicate the run.

Third-Party Rankings Place Qwen3.7-Max in Frontier Tier, Below GPT-5.5 and Claude

Independent assessment from Artificial Analysis, which runs a composite Intelligence Index across ten evaluations including Terminal-Bench Hard, SciCode, and Humanity's Last Exam, places Qwen3.7-Max at a score of 57 — making it the highest-ranked Chinese model on that leaderboard. GPT-5.5 leads at 60.2, with Claude Opus 4.7 at 57.3 and Gemini 3.1 Pro Preview at 57.2. On the LM Arena text leaderboard, the Qwen3.7-Max preview reached 13th overall, with category rankings of 7th in mathematics and 9th in software and IT.

The index gains over Qwen's prior flagship, Qwen3.6-Max-Preview, are concentrated in scientific reasoning, agentic capability, and coding. But one element of the improvement is less straightforward. Qwen3.7-Max's raw accuracy on the AA-Omniscience factual-recall benchmark actually dropped 7.6 percentage points compared with its predecessor, while its hallucination rate fell sharply — a result of the model choosing to abstain from answering rather than recalling more facts. Its attempt rate on that benchmark fell to 48%, the lowest among the frontier models in the comparison. The model now posts the lowest hallucination rate in the frontier tier, but partly by answering fewer questions. That distinction matters for agentic use cases that require confident fact retrieval under load.
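
The interaction between abstention, accuracy, and hallucination rate can be made concrete with a small worked example. The sketch below assumes a simplified scoring scheme in which every response is correct, incorrect, or an abstention, and in which the hallucination rate is the share of non-correct responses that were wrong answers rather than abstentions; the benchmark's exact definitions may differ. The tallies are invented. Only the 48% attempt rate and the 7.6-point accuracy drop are taken from the figures above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tally:
    correct: int
    incorrect: int
    abstained: int

    @property
    def total(self) -> int:
        return self.correct + self.incorrect + self.abstained

    def accuracy(self) -> float:
        return self.correct / self.total

    def attempt_rate(self) -> float:
        return (self.correct + self.incorrect) / self.total

    def hallucination_rate(self) -> float:
        # Assumed definition: wrong answers as a share of all non-correct responses.
        not_correct = self.incorrect + self.abstained
        return self.incorrect / not_correct if not_correct else 0.0

# Invented tallies over 1,000 questions (hypothetical, for illustration only).
older = Tally(correct=400, incorrect=450, abstained=150)
newer = Tally(correct=324, incorrect=156, abstained=520)

for name, t in (("older", older), ("newer", newer)):
    print(f"{name}: accuracy={t.accuracy():.1%} "
          f"attempts={t.attempt_rate():.1%} "
          f"hallucination={t.hallucination_rate():.1%}")
```

In this toy tally, accuracy falls from 40.0% to 32.4% while the hallucination rate falls from 75.0% to about 23.1%, purely because the newer model declines to answer far more often.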

No direct, audited comparison between Qwen3.7-Max and Claude Opus 4.7 or GPT-5.5 on agentic tasks of the kind Alibaba demonstrated has yet been published.

As of publication, Alibaba had announced API access for Qwen3.7-Max through its Model Studio platform with availability "coming soon" for developers and enterprises worldwide. Open weights for the 3.7 generation have not been released; QwenLM's GitHub organization and Hugging Face host 3.5 and 3.6 variants but not 3.7. The context window is 1 million tokens, up from 256,000 on Qwen3.6-Max-Preview. Pricing for the 3.7-Max API had not been announced at time of publication; Qwen3.6-Max-Preview was priced at $1.30 per million input tokens and $7.80 per million output tokens on Alibaba Cloud.
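
Until 3.7-Max pricing is published, the Qwen3.6-Max-Preview rates quoted above are the only basis for a budget estimate. A minimal sketch of the arithmetic follows; the function name and the example request are hypothetical, and the calculation ignores any tiered pricing, caching discounts, or free quotas that may apply.

```python
# Qwen3.6-Max-Preview list prices cited in this article (USD per million tokens).
INPUT_USD_PER_MILLION = 1.30
OUTPUT_USD_PER_MILLION = 7.80

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at flat per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000

# Hypothetical request: a 250,000-token prompt and a 4,000-token reply.
cost = request_cost_usd(250_000, 4_000)
print(f"${cost:.4f}")  # prints "$0.3562"
```

At these rates output tokens cost six times as much as input tokens, so long agentic runs that generate heavily will be priced mainly by their output volume.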

Zhenwu M890: Purpose-Built for Agents, Behind Western Flagships on Raw Specs

The Zhenwu M890, developed by T-Head, carries 144 GB of HBM3 memory, 50% more than its predecessor, the Zhenwu 810E, and delivers inter-chip bandwidth of 800 GB/s. Alibaba claims it delivers three times the performance of the 810E. The 810E was itself widely regarded as comparable to Nvidia's H20, the reduced-performance processor Nvidia designed for the Chinese market to comply with U.S. export controls.

Myron Xie, an AI accelerator analyst at semiconductor research firm SemiAnalysis, told CNBC that Alibaba-designed chips have become a popular platform among Chinese enterprise buyers, but noted that the M890's advertised memory capacity and bandwidth still trail the benchmarks set by leading Western chipmakers, and that key compute performance metrics had not been publicly disclosed. Brady Wang, director at Counterpoint Research, was more direct: "M890 is a small but real contribution to China's AI self-sufficiency. On raw silicon power, M890 is not a true competitor to H200. But it does not need to be. In the China market, it is a believable replacement for H200."

The M890 is paired with ICN Switch 1.0, a new interconnect chip delivering 25.6 Tbps of aggregate bandwidth across clusters of 64 accelerators. It is packaged inside the Panjiu AL128 Supernode Server, which links 128 of the chips in a single rack with chip-to-chip communication latency under 150 nanoseconds. The system is available immediately to Chinese enterprise customers through Alibaba Cloud's Bailian platform. T-Head reports shipping more than 560,000 Zhenwu chips to date across more than 400 customers in 20 industries, including China Telecom, FAW Group, and Shanghai Pudong Development Bank.

Alibaba also outlined a chip roadmap through 2028: the V900, targeted for Q3 2027, is projected to triple the M890's performance again, with 216 GB of memory and 1,200 GB/s bandwidth; the J900 follows in Q3 2028.
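
The hardware figures quoted in this section imply a few numbers that Alibaba did not state directly. The sketch below derives them from the article's figures alone. It assumes every accelerator in an AL128 rack carries the full 144 GB, and that the V900's 1,200 GB/s figure measures the same inter-chip bandwidth as the M890's 800 GB/s; neither assumption is confirmed by Alibaba.

```python
# Figures as reported in this article.
M890_HBM_GB = 144
M890_BANDWIDTH_GB_S = 800
M890_MEMORY_GAIN_OVER_810E = 1.5  # "50% more than its predecessor"
CHIPS_PER_AL128_RACK = 128
V900_HBM_GB = 216
V900_BANDWIDTH_GB_S = 1200

# Derived values (arithmetic only, not vendor-confirmed specifications).
implied_810e_hbm_gb = M890_HBM_GB / M890_MEMORY_GAIN_OVER_810E
rack_hbm_gb = CHIPS_PER_AL128_RACK * M890_HBM_GB
v900_memory_ratio = V900_HBM_GB / M890_HBM_GB
v900_bandwidth_ratio = V900_BANDWIDTH_GB_S / M890_BANDWIDTH_GB_S

print(f"implied 810E memory: {implied_810e_hbm_gb:.0f} GB")
print(f"HBM per AL128 rack: {rack_hbm_gb:,} GB")
print(f"V900 vs M890: {v900_memory_ratio:.1f}x memory, "
      f"{v900_bandwidth_ratio:.1f}x bandwidth")
```

On these figures, the V900's projected tripling of performance is paired with a 1.5x increase in both memory and bandwidth.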

Why Vertical Integration, Not Export Controls Alone, Drives This

The conventional framing around Chinese AI hardware — that domestic chip development is a forced response to U.S. export controls — is accurate but incomplete in Alibaba's case. The Trump administration lifted the ban on Nvidia H20 chip sales to China in 2025, creating a licensing pathway that technically allows Chinese companies to purchase some advanced U.S. processors again. Alibaba is building its domestic stack anyway.

That choice reflects something more durable than emergency procurement. A company that controls its own chip architecture can tune its model to that architecture in ways that would be impossible with a third-party chip, and the kernel-optimization demonstration is the tangible expression of that strategy. Zhang Guobin, founder of Chinese semiconductor publication eetrend.com, noted that the timing of Alibaba's launch was "extremely precise," falling in a window when broader access to Nvidia's processors in the Chinese market remains uncertain despite the licensing change.

Alibaba has committed more than 380 billion yuan ($53 billion) to cloud and AI infrastructure over three years — its largest such investment — and is reportedly planning a T-Head initial public offering to fund continued chip development. The competitive context inside China is also relevant: Huawei's Ascend chip line, which analysts generally place at least two generations behind Nvidia's cutting-edge offerings, has already secured more than $12 billion in orders for 2026, a 60% increase from 2025 levels, and remains the primary domestic alternative Alibaba's T-Head must displace to capture enterprise deployments.

Framework Compatibility and Developer Access

Alibaba confirmed that Qwen3.7-Max is natively optimized for the major command-line interface agent frameworks developers currently use: OpenClaw, Hermes Agent, Claude Code, Qwen Paw, and Qoder. The design intent is to let a developer run the same model across rapid front-end prototyping, complex multi-file refactoring, and production debugging without switching scaffolding between tasks.

The honest read on developer access right now: the model is announced but not yet fully accessible. The API is rolling out through Alibaba's Model Studio; open weights have not been released for the 3.7 generation; and pricing is unconfirmed. Developers on production timelines should stay on Qwen3.6, which ships with open weights under Apache 2.0 licensing and has confirmed API pricing. The migration path to 3.7-Max begins once the API is live and pricing is published.

China's National Intelligence Law Applies to Alibaba Cloud

Any developer or enterprise considering Qwen3.7-Max through Alibaba's cloud API should be aware of a structural legal condition. Alibaba is a Chinese company subject to China's National Intelligence Law of 2017, Article 7 of which states that any Chinese organization shall "support, assist, and cooperate with national intelligence efforts in accordance with law." Legal scholars disagree on how broadly that obligation can be enforced in practice — some argue it lacks a clear enforcement mechanism, while others contend it shifts obligations from intelligence defense to active cooperation — but the text of the law applies to all Chinese companies without exception. No confirmed incident of government-compelled access to Qwen API user data has been documented. Alibaba has not made a specific public statement addressing this obligation in the context of its international API customers.

What Gets Validated Next

The 35-hour kernel-optimization run is currently self-reported. Independent reproduction of a demonstration of that scope (1,158 tool calls and five architectural redesigns on a previously undocumented chip) is a nontrivial task and is likely weeks away at minimum. Without open weights for Qwen3.7-Max, the broader research community cannot yet probe the model independently, and will be able to test it only through the API once that is fully live.

The strategic implication does not depend on benchmark replication. Alibaba has publicly demonstrated a frontier-class large language model running autonomously on a domestically developed, mass-produced AI accelerator to write the optimized software that chip needs to run AI workloads. Whether the 35-hour number holds up to external scrutiny or not, the product exists, the chip is shipping, and the integrated stack is now in the hands of Chinese enterprise customers. That combination — model plus chip plus rack-scale server, all from one vendor — is the development that matters regardless of what the benchmarks ultimately show.