Huawei Cloud CEO Zhang Ping'an: CloudMatrix384 AI Token Service Now Fully Operational
2 days ago
Author: Editor

On September 19, at Huawei Connect 2025 (HC 2025), Zhang Ping'an, Executive Director at Huawei and CEO of Huawei Cloud Computing, announced that Huawei Cloud's CloudMatrix384 AI Token service is now fully available. Building on Huawei's AI server architecture, the CloudMatrix cloud super-node specification is set to be upgraded from the current 384 cards to 8,192 cards in the near future. The larger super-nodes are intended to support clusters of 500,000 to 1 million cards, providing AI computing power at that scale.

The service reaches a single-card inference throughput of 2,300 tokens per second. According to the announcement, this comes from resource pooling, a fully peer-to-peer interconnect architecture, and the xDeepServe inference framework, and amounts to 3 to 4 times the performance of NVIDIA's H20. The service also supports training large models with trillions of parameters. It already provides token services for large models such as Pangu, DeepSeek, and Qwen, with the aim of speeding up AI adoption across industries.
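
As a rough check on the scale implied by the figures quoted in this article, the arithmetic can be sketched in a few lines of Python. This is only a back-of-envelope illustration: the function names are hypothetical, and the aggregate figure assumes throughput scales perfectly linearly with card count, which the article does not claim.

```python
# Figures quoted in the article.
CARDS_PER_SUPERNODE_NOW = 384        # current CloudMatrix super-node size
CARDS_PER_SUPERNODE_PLANNED = 8192   # planned super-node size
TOKENS_PER_SEC_PER_CARD = 2300       # quoted single-card inference throughput


def supernodes_needed(cluster_cards: int, cards_per_supernode: int) -> int:
    """Smallest number of super-nodes that holds `cluster_cards` cards."""
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-cluster_cards // cards_per_supernode)


def ideal_aggregate_throughput(cards: int,
                               per_card: int = TOKENS_PER_SEC_PER_CARD) -> int:
    """Upper-bound tokens/s, ASSUMING perfectly linear scaling (an assumption)."""
    return cards * per_card


if __name__ == "__main__":
    for cluster in (500_000, 1_000_000):
        now = supernodes_needed(cluster, CARDS_PER_SUPERNODE_NOW)
        planned = supernodes_needed(cluster, CARDS_PER_SUPERNODE_PLANNED)
        print(f"{cluster:,} cards: {now:,} super-nodes at 384 cards, "
              f"{planned:,} at 8,192 cards")
    print("Ideal tokens/s for one 384-card super-node:",
          f"{ideal_aggregate_throughput(CARDS_PER_SUPERNODE_NOW):,}")
```

Under these assumptions, a 500,000-card cluster would take about 1,303 of today's 384-card super-nodes but only 62 of the planned 8,192-card ones, which is why the larger super-node matters for clusters of this size.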