DeepSeek has officially released its DeepSeek-V3.2-Exp model. The model introduces a sparse attention architecture designed to reduce computational cost and improve inference efficiency. Although it is an experimental release, it has already drawn attention from numerous chip manufacturers, all of whom have announced Day-0 support.
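To make the efficiency claim concrete, here is a minimal sketch of the general top-k sparse attention idea: each query attends only to the k highest-scoring keys instead of the full sequence, so the softmax and value aggregation run over k tokens rather than the whole context. This is an illustrative toy, not DeepSeek's actual DSA kernel, and all names in it are made up for the example.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k=8):
    """Toy single-query sparse attention: score all keys, keep only the
    top-k, and run softmax attention over that subset.
    Illustrative only -- not DeepSeek's actual implementation."""
    scores = K @ q / np.sqrt(q.shape[-1])   # similarity of q to every key
    idx = np.argsort(scores)[-k:]           # indices of the k best keys
    sub = scores[idx]
    w = np.exp(sub - sub.max())             # numerically stable softmax
    w /= w.sum()
    return w @ V[idx]                       # aggregate only k value rows

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
K = rng.standard_normal((64, 8))
V = rng.standard_normal((64, 8))
out = topk_sparse_attention(q, K, V, k=8)
print(out.shape)
```

With k fixed, the attention cost per query no longer grows with sequence length (aside from the initial scoring pass), which is where the long-context savings come from; setting k equal to the sequence length recovers ordinary dense attention.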
Among them, Huawei Ascend moved quickly, completing adaptation and deployment on inference frameworks such as vLLM and SGLang. It achieved Day-0 support and has open-sourced all of its inference code and operator implementations for developers to use.
Cambricon kept pace as well, completing its adaptation in parallel and open-sourcing its large-model inference engine, vLLM-MLU. By leveraging the DeepSeek Sparse Attention mechanism, Cambricon can substantially reduce training and inference costs in long-sequence scenarios.
Hygon Information has also announced a seamless adaptation and in-depth tuning of the model on its DCU. DeepSeek-V3.2-Exp runs with strong performance on Hygon's DCU, highlighting the platform's versatility, broad ecosystem compatibility, and independently controllable technology stack.
In addition, the official DeepSeek app, web version, and mini-program have all been upgraded to DeepSeek-V3.2-Exp. API prices have also been cut sharply, with developer call costs reduced by more than 50%.