SGLang Natively Supports Ascend, Enabling One-Click Launch of New Models Without Code Changes
Author: Editorial Staff

On December 20th, the SGLang AI Financial π Event concluded successfully in Hangzhou. The gathering focused on the efficiency of large-scale model inference, examining in depth the engineering challenges that inference systems face under real-world business workloads. Organized by SGLang together with the AtomGit community, the event drew broad participation from frontline engineering teams.

As Agent applications place higher demands on inference systems, SGLang presented a series of engineering practice solutions, including integrating the HiCache system to reduce memory consumption and using Mooncake to shorten weight loading and model startup times. Notably, these technologies have already been implemented on the Ascend platform.

The event also showcased SGLang's latest progress on the Ascend platform, covering model optimization, system features, and quantization capabilities, and highlighted optimization work on models such as DeepSeek and Qwen. Since July, Ascend has been working closely with SGLang on adaptation, aiming to fully embrace the open-source ecosystem and drive its development. It has now completed grayscale testing for DeepSeek V3.2.

Looking ahead, Ascend plans to increase its systematic engineering investment in inference systems to meet the demands of real-world business scenarios that require high concurrency and low latency.