DeepSeek Unleashes Another Major Move, Boosting Inference Speed by Up to 85%: How Did They Do It?


Author: Editor

On June 27, the DeepSeek team, in collaboration with Peking University, released a technical report introducing the DSpark framework and the DeepSpec full-stack codebase. The update adds DSpark, a server-side speculative decoding module, on top of the existing DeepSeek-V4-Pro and DeepSeek-V4-Flash models, with a strong focus on engineering deployment efficiency.

DSpark uses a semi-autoregressive generation architecture that combines a parallel backbone network with lightweight serial modules. This design addresses the drop in acceptance rate that parallel draft models suffer during long-sequence generation. DSpark also introduces a confidence-based verification scheduling mechanism that dynamically adjusts the verification length according to hardware status and concurrency pressure, so that computational resources are allocated efficiently.

The framework has been deployed in the DeepSeek-V4 online service system. At equivalent system throughput, single-user generation speed rises by 60%-85% for V4-Flash and by 57%-78% for V4-Pro, without compromising output quality.

The accompanying open-source DeepSpec codebase, released under the MIT license, provides end-to-end tools for data preparation, model training, and evaluation. It includes three draft model algorithms (DSpark, DFlash, and Eagle3) and is compatible with mainstream foundation models such as Qwen3 and Gemma. The release lowers the barrier to private deployment and online serving of large models, which is expected to accelerate large-scale adoption of intelligent agents, industrial code generation, financial sentiment analysis, and other applications.
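For readers unfamiliar with speculative decoding, the draft-then-verify idea and a confidence-based verification length can be illustrated with a toy sketch. This is not DSpark's actual algorithm, which the summary above does not spell out: it is generic greedy speculative decoding, and `toy_target`, `toy_draft`, `choose_verify_length`, the `budget` cap (a stand-in for hardware load), and the `threshold` rule are all hypothetical names and assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

Token = int
TargetFn = Callable[[List[Token]], Token]               # large model: next token
DraftFn = Callable[[List[Token]], Tuple[Token, float]]  # small model: (token, confidence)


def choose_verify_length(confidences: List[float], budget: int, threshold: float) -> int:
    """Hypothetical rule: keep the leading drafted tokens whose running
    confidence product stays at or above `threshold`, capped by `budget`."""
    running, length = 1.0, 0
    for conf in confidences[:budget]:
        running *= conf
        if running < threshold:
            break
        length += 1
    return length


def speculative_decode(target: TargetFn, draft: DraftFn, prompt: List[Token],
                       max_new_tokens: int, draft_len: int = 4,
                       budget: int = 4, threshold: float = 0.3) -> Tuple[List[Token], int]:
    out = list(prompt)
    rounds = 0  # verification rounds; a real server scores each round in one batched pass
    while len(out) - len(prompt) < max_new_tokens:
        # 1) The draft model cheaply proposes a short continuation.
        ctx, drafted, confs = list(out), [], []
        for _ in range(draft_len):
            tok, conf = draft(ctx)
            drafted.append(tok)
            confs.append(conf)
            ctx.append(tok)
        # 2) Decide how many drafted tokens are worth verifying.
        n = choose_verify_length(confs, budget, threshold)
        # 3) Verify: keep the target's token at each position and stop at the
        #    first disagreement, so the output always matches the target model.
        rounds += 1
        for i in range(n):
            expected = target(out)  # simulated per position here for simplicity
            out.append(expected)
            if drafted[i] != expected:
                break
        else:
            out.append(target(out))  # all n accepted: one extra token from the target
    return out[:len(prompt) + max_new_tokens], rounds


def plain_decode(target: TargetFn, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target(out))
    return out


def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + len(ctx)) % 11


def toy_draft(ctx: List[Token]) -> Tuple[Token, float]:
    guess = toy_target(ctx)
    if len(ctx) % 4 == 0:  # the toy draft is wrong, and unsure, at every 4th position
        return (guess + 1) % 11, 0.2
    return guess, 0.9


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_draft, [1], 12)
    print(tokens == plain_decode(toy_target, [1], 12), rounds)
```

In this sketch every emitted token comes from the target model, which is why the result is identical to plain decoding. Any speedup comes only from needing fewer verification rounds than generated tokens. Lowering `budget` under heavy load shortens each verification at the cost of more rounds.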
