JD.com has open-sourced xLLM, its in-house large-scale model inference engine, which runs on domestically manufactured chips. The engine is designed to help enterprises deploy AI applications with high performance at lower cost, accelerating the intelligent transformation of industries.
xLLM introduces several technical advances, including a dynamic scheduler that prioritizes requests, a dynamically adaptive prefill-decode (PD) separation architecture, and support for multimodal scenarios. Under the hood, xLLM combines a multi-level pipeline execution engine, a suite of computational optimizations, and a global multi-level KV cache management system.
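To make the request-prioritizing scheduler idea concrete, here is a minimal sketch of priority-based batch scheduling. All class and parameter names (`PriorityScheduler`, `submit`, `next_batch`) are illustrative assumptions for this example, not xLLM's actual API.

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    priority: int                      # lower value = served first
    seq: int                           # arrival counter breaks ties FIFO
    prompt: str = field(compare=False) # payload is excluded from ordering

class PriorityScheduler:
    """Toy dynamic scheduler: dequeues requests by priority, then arrival order.
    Names are hypothetical; xLLM's real scheduler is far more sophisticated."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, prompt, priority):
        heapq.heappush(self._heap, Request(priority, next(self._counter), prompt))

    def next_batch(self, max_batch=4):
        batch = []
        while self._heap and len(batch) < max_batch:
            batch.append(heapq.heappop(self._heap).prompt)
        return batch

sched = PriorityScheduler()
sched.submit("low-priority analytics job", priority=5)
sched.submit("interactive chat turn", priority=0)
sched.submit("search query rewrite", priority=1)
print(sched.next_batch())  # interactive request is batched first
```

A real inference scheduler would also weigh factors such as KV cache occupancy and sequence length when forming batches; the heap here only captures the core priority-ordering idea.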
Rooted in JD.com's core retail operations, the engine has already been deployed in numerous scenarios, yielding a more than fivefold efficiency gain and a 90% reduction in machine costs. JD.com says it will continue to expand the engine's capabilities in response to community demand, working with industry, academic, and research partners to drive innovation across China's AI infrastructure landscape.
Following the open-source release, developers are invited to explore and build on the engine, fostering the growth of China's AI technology ecosystem.