BAAI: FlagOS Achieves Seamless Day0 Compatibility with Eight AI Chips for DeepSeek V4, Marking Triple Technological Leaps
Author: Editor

DeepSeek has recently released its flagship model, DeepSeek-V4-Pro, with 1.6 trillion parameters, alongside a lighter counterpart, DeepSeek-V4-Flash, with 284 billion parameters. FlagOS, developed under the leadership of the Beijing Academy of Artificial Intelligence (BAAI), is compatible with both models and has completed the adaptation and inference deployment of DeepSeek-V4-Flash on more than eight different AI chips. The team is still working to streamline the migration and adaptation process for DeepSeek-V4-Pro.

DeepSeek-V4-Flash uses a Mixture of Experts (MoE) architecture and supports context lengths of up to 1 million tokens, with advances in both architectural design and pre-training methodology. To make the model run across multiple chips, FlagOS introduced three technical changes: full operator substitution via FlagGems, an independent tensor parallelism strategy for o-group, and precision conversion from "FP4+FP8 mixed precision" to BF16. FlagGems has also open-sourced a set of high-performance operators that it reports outperform the native operators. After adaptation by FlagOS, the models' core capabilities remain on par with the native versions, while deployment is simpler.

The FlagOS 2.0 technology stack provides end-to-end support for cross-chip adaptation of large models. It comprises a high-performance operator library (FlagGems), a unified AI compiler (FlagTree), a tool for cross-chip model migration and deployment (FlagRelease), and a unified multi-chip access plugin (vLLM-plugin-FL). Together these form an open-source ecosystem that developers can use for cross-chip adaptation.
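To make the precision conversion step more concrete, here is a minimal Python sketch of how a block of FP8 weights might be dequantized and rounded to BF16. It is an illustration only, not the FlagOS implementation: the article does not say which FP8 variant or scaling scheme is used, so the sketch assumes the E4M3 encoding (the variant without infinities) and a single per-block scale factor, and it leaves out FP4 entirely. All function names are hypothetical.

```python
import struct


def fp8_e4m3_to_float(byte: int) -> float:
    """Decode one FP8 E4M3 byte (assumed format: 1 sign, 4 exponent, 3 mantissa bits)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")  # this variant has NaN but no infinities
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal values
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)  # exponent bias is 7


def float_to_bf16_bits(x: float) -> int:
    """Round a float to BF16 with round-to-nearest-even; return the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # keep NaN a NaN after truncation
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(pattern: int) -> float:
    """Expand a BF16 bit pattern back to a Python float for inspection."""
    return struct.unpack("<f", struct.pack("<I", (pattern & 0xFFFF) << 16))[0]


def dequantize_block_to_bf16(fp8_bytes, scale):
    """Assumed scheme: real value = decoded FP8 value * per-block scale, stored as BF16."""
    return [float_to_bf16_bits(fp8_e4m3_to_float(b) * scale) for b in fp8_bytes]


block = dequantize_block_to_bf16([0x38, 0xC0, 0x7E], scale=0.5)
print([bf16_bits_to_float(p) for p in block])
```

BF16 keeps the 8-bit exponent of FP32, so every scaled FP8 value in this sketch fits without overflow; only the mantissa is rounded. A production conversion would run on whole tensors on the accelerator rather than byte by byte.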
