Moore Threads MTT S5000 Successfully Adapts to DeepSeek-V4-Flash
Author: Editorial Staff

On April 24, Moore Threads, in partnership with the Zhiyuan Zhongzhi FlagOS community, completed Day-0 adaptation of the new large model DeepSeek-V4-Flash on its flagship AI training and inference GPU, the MTT S5000, including in-depth optimization and deployment support for all core operators.

DeepSeek-V4-Flash is built on a Mixture of Experts (MoE) architecture with 284 billion parameters, supports context windows of up to one million tokens, and pioneers the use of FP4+FP8 mixed-precision technology. As China's first full-function GPU with native FP8 support, the MTT S5000 uses hardware-level FP8 Tensor Cores to halve memory pressure and double computational throughput.

During the adaptation, the team prioritized the FP8 and Sparse Attention operators. Through compilation optimization and auto-tuning, they reduced time to first token (TTFT) by 16.5%, cut inter-token latency (ITL) by 39.7%, and increased throughput by 65.7%. The two parties are now working on migrating and adapting the 1.6-trillion-parameter flagship model, DeepSeek-V4-Pro, to the MTT S5000.
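To put the cited figures in context, the short Python sketch below estimates weight memory at FP16 versus FP8 for a model of this parameter count, and shows how serving metrics like TTFT, ITL, and throughput are commonly computed. Only the 284-billion-parameter figure comes from the article; the byte-per-parameter arithmetic ignores KV cache, activations, and expert sharding, and the request timings are hypothetical placeholders rather than Moore Threads' measurements.

```python
# Illustrative sketch only: rough memory estimate for FP8 vs. FP16 weights,
# plus the serving metrics mentioned in the article (TTFT, ITL, throughput).

PARAMS = 284e9  # DeepSeek-V4-Flash total parameter count, per the article

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB, ignoring KV cache and activations."""
    return num_params * bytes_per_param / 1024**3

fp16_gib = weight_memory_gib(PARAMS, 2.0)  # 16-bit: 2 bytes per parameter
fp8_gib  = weight_memory_gib(PARAMS, 1.0)  # 8-bit: 1 byte per parameter
print(f"FP16 weights ~ {fp16_gib:.0f} GiB, FP8 weights ~ {fp8_gib:.0f} GiB "
      f"({1 - fp8_gib / fp16_gib:.0%} less memory)")

def serving_metrics(first_token_s: float, total_s: float, output_tokens: int):
    """TTFT = time to first token; ITL = mean gap between later tokens;
    throughput = output tokens per second over the whole request."""
    ttft = first_token_s
    itl = (total_s - first_token_s) / max(output_tokens - 1, 1)
    throughput = output_tokens / total_s
    return ttft, itl, throughput

# Hypothetical request: first token after 0.8 s, 512 tokens in 12 s total.
ttft, itl, tput = serving_metrics(0.8, 12.0, 512)
print(f"TTFT = {ttft:.2f} s, ITL = {itl * 1000:.1f} ms, throughput = {tput:.1f} tok/s")
```

Under these assumptions the FP8 figure comes out at roughly half the FP16 footprint, which is consistent with the 50% memory-pressure reduction the article attributes to the hardware FP8 Tensor Cores.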