After a period of silence, DeepSeek has made new moves, updating its DeepGEMM codebase and launching a new project called Mega MoE. Contributed by DeepSeek's infrastructure team, the project fuses the previously separate stages of MoE computation into a single mega-kernel, allowing data communication and computation to run in parallel and thereby improving GPU utilization. The gains are especially pronounced in multi-GPU, large-scale MoE scenarios. DeepSeek is also exploring techniques such as mixed precision and is developing an FP4 indexer to further improve MoE efficiency. Mega MoE is still under development, and performance data has not yet been released. This update represents DeepSeek's attempt to restructure its infrastructure layer, with the aim of pushing MoE towards large-scale, highly efficient operation. Mega MoE may be the first step in that direction, and it could also hint that DeepSeek is training on NVIDIA's latest top-tier B-series cards.
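To illustrate the general idea of overlapping communication with computation in an MoE layer, here is a minimal sketch, not DeepSeek's implementation: instead of fusing everything into one mega-kernel, it approximates the same scheduling principle with two CUDA streams in PyTorch, launching the next chunk's token dispatch (all-to-all) while the current chunk's expert GEMM runs. The chunking scheme, the function name `moe_forward_overlapped`, and the plain matrix multiply standing in for a grouped expert GEMM are all hypothetical simplifications.

```python
import torch
import torch.distributed as dist

def moe_forward_overlapped(token_chunks, expert_weights):
    """Sketch: overlap token dispatch (communication) for chunk i+1
    with the expert computation for chunk i, so the GPU's compute units
    stay busy while data moves between ranks.
    Assumes dist.init_process_group() has already been called."""
    comm_stream = torch.cuda.Stream()                      # side stream for all-to-all
    ready = [torch.cuda.Event() for _ in token_chunks]     # per-chunk arrival markers
    received = [torch.empty_like(c) for c in token_chunks]
    outputs = []

    # Issue the first dispatch on the communication stream.
    with torch.cuda.stream(comm_stream):
        dist.all_to_all_single(received[0], token_chunks[0])
        ready[0].record()

    for i in range(len(token_chunks)):
        # Kick off the next chunk's all-to-all so it overlaps with this chunk's GEMM.
        if i + 1 < len(token_chunks):
            with torch.cuda.stream(comm_stream):
                dist.all_to_all_single(received[i + 1], token_chunks[i + 1])
                ready[i + 1].record()

        # The compute stream waits only for chunk i's data, not for later chunks.
        torch.cuda.current_stream().wait_event(ready[i])
        outputs.append(received[i] @ expert_weights)        # stand-in for grouped expert GEMM

    return torch.cat(outputs)
```

In the approach the article describes, this kind of overlap is achieved inside a single fused kernel rather than with separate streams and launches, but the goal is the same: keep the GPU computing while tokens are being exchanged across devices.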
