On January 21st, the first anniversary of DeepSeek-R1's launch, details of a new DeepSeek model, MODEL1, emerged. DeepSeek updated its FlashMLA code repository on GitHub, and the identifier "MODEL1" appears 28 times across 114 files. The code treats MODEL1 as distinct from V32, which is understood to be DeepSeek-V3.2, suggesting that MODEL1 may be a new architecture rather than a variant of the existing one. The differences between the two show up mainly in the KV cache layout, sparsity handling, and FP8 decoding. There are also numerous differences in memory optimization, which hints at further changes beyond those three areas.
