Moore Threads MTT S5000 Successfully Adapts to DeepSeek-V4-Flash
Author: Editorial Staff

On April 24, Moore Threads, in partnership with the Zhiyuan Zhongzhi FlagOS community, completed Day-0 adaptation of the new large model DeepSeek-V4-Flash on its flagship AI training and inference GPU, the MTT S5000, including in-depth optimization and deployment support for all core operators.

DeepSeek-V4-Flash is built on a Mixture of Experts (MoE) architecture with 284 billion parameters, supports context windows of up to one million tokens, and pioneers the use of FP4+FP8 mixed-precision technology. As China's first full-function GPU with native FP8 support, the MTT S5000 uses hardware-level FP8 Tensor Cores to halve memory pressure and double computational throughput.

During the adaptation, the team prioritized the FP8 and Sparse Attention operators. Through compilation optimization and auto-tuning, they reduced time to first token (TTFT) by 16.5%, cut inter-token latency (ITL) by 39.7%, and increased throughput by 65.7%. The two parties are now working on migrating and adapting the 1.6-trillion-parameter flagship model, DeepSeek-V4-Pro, to the MTT S5000.
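To put the cited figures in context, the short Python sketch below estimates weight memory at FP16 versus FP8 for a model of this parameter count, and shows how serving metrics like TTFT, ITL, and throughput are commonly computed. Only the 284-billion-parameter figure comes from the article; the byte-per-parameter arithmetic ignores KV cache, activations, and expert sharding, and the request timings are hypothetical placeholders rather than Moore Threads' measurements.

```python
# Illustrative sketch only: rough memory estimate for FP8 vs. FP16 weights,
# plus the serving metrics mentioned in the article (TTFT, ITL, throughput).

PARAMS = 284e9  # DeepSeek-V4-Flash total parameter count, per the article

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GiB, ignoring KV cache and activations."""
    return num_params * bytes_per_param / 1024**3

fp16_gib = weight_memory_gib(PARAMS, 2.0)  # 16-bit: 2 bytes per parameter
fp8_gib  = weight_memory_gib(PARAMS, 1.0)  # 8-bit: 1 byte per parameter
print(f"FP16 weights ~ {fp16_gib:.0f} GiB, FP8 weights ~ {fp8_gib:.0f} GiB "
      f"({1 - fp8_gib / fp16_gib:.0%} less memory)")

def serving_metrics(first_token_s: float, total_s: float, output_tokens: int):
    """TTFT = time to first token; ITL = mean gap between later tokens;
    throughput = output tokens per second over the whole request."""
    ttft = first_token_s
    itl = (total_s - first_token_s) / max(output_tokens - 1, 1)
    throughput = output_tokens / total_s
    return ttft, itl, throughput

# Hypothetical request: first token after 0.8 s, 512 tokens in 12 s total.
ttft, itl, tput = serving_metrics(0.8, 12.0, 512)
print(f"TTFT = {ttft:.2f} s, ITL = {itl * 1000:.1f} ms, throughput = {tput:.1f} tok/s")
```

Under these assumptions the FP8 figure comes out at roughly half the FP16 footprint, which is consistent with the 50% memory-pressure reduction the article attributes to the hardware FP8 Tensor Cores.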