DeepSeek, together with Peking University and Tsinghua University, has published a paper on arXiv introducing DualPath, a new inference framework for LLM agents. DualPath targets the I/O bottleneck that arises during long-context agent inference. It adds a 'storage-to-decode' path alongside the conventional single loading path, which lets the system pool storage bandwidth across the whole cluster and balance load dynamically. In tests on a 660B-scale model, DualPath raised offline inference throughput by 1.87x and online serving throughput by 1.96x on average, while improving time to first token (TTFT) without slowing token generation. The dual-path design consists of an inference engine, a traffic manager, and a central scheduler, and it comes with two optimizations: compute-NIC-centric traffic management and an adaptive request scheduler. The experiments indicate that DualPath can effectively overcome I/O limits in large-model inference and improve the efficiency of LLM inference systems for agents. The paper's first author is Wu Yongtong, a Ph.D. candidate at Peking University whose research covers system software and large-model infrastructure.
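To make the idea of balancing load across two loading paths concrete, here is a minimal sketch in Python. It is not the scheduler described in the paper: the `Path` class, the `pick_path` function, the path names, and the bandwidth figures are all hypothetical, and the policy (send each load to whichever path would finish it soonest) is a simple stand-in chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One route for loading cached data from storage (hypothetical model)."""
    name: str
    bandwidth_gb_per_s: float  # assumed usable read bandwidth on this path
    queued_gb: float = 0.0     # data already waiting on this path

    def eta(self, size_gb: float) -> float:
        """Estimated seconds until a new load of size_gb would finish."""
        return (self.queued_gb + size_gb) / self.bandwidth_gb_per_s


def pick_path(paths: list[Path], size_gb: float) -> str:
    """Assign a load to the path with the smallest estimated finish time."""
    best = min(paths, key=lambda p: p.eta(size_gb))
    best.queued_gb += size_gb  # account for the newly assigned load
    return best.name


# Illustrative numbers only; not taken from the paper.
paths = [Path("conventional", 10.0), Path("storage-to-decode", 10.0)]
assignments = [pick_path(paths, 5.0) for _ in range(3)]
print(assignments)
```

With a single path, every load would queue behind the previous one. With two paths of equal bandwidth, the sketch alternates between them, which is the intuition behind pooling bandwidth that would otherwise sit idle.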
