Moonshot AI and Tsinghua University Jointly Launch PrfaaS Architecture, Boosting Large Model Inference Throughput by 54%

4 hours ago

Author: Editor

According to MarkTechPost, Moonshot AI and a research team from Tsinghua University have jointly released the Prefill-as-a-Service (PrfaaS) architecture. By decoupling the prefill and decode phases of large model inference, the architecture eases the hardware deployment constraints that apply when both phases run together. PrfaaS offloads long-context prefill to a separate, compute-dense cluster and sends the resulting KVCache over standard Ethernet to the local decode cluster, which makes cross-data-center collaboration possible. It introduces dynamic threshold routing and a dual-timescale scheduler, which allocate resources and optimize KVCache transfer according to request length. In reported tests, the architecture raised serving throughput by 54% over a homogeneous baseline and by 32% over a naive heterogeneous configuration, while cutting time to first token (TTFT) by 50%.
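To make the routing idea concrete, here is a minimal Python sketch of length-based threshold routing. It is not the PrfaaS implementation: the report does not give the actual routing rule, so the names (`Request`, `choose_prefill_target`, `adjust_threshold`), the use of link utilization as the adjustment signal, and all numeric values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: str
    prompt_tokens: int  # length of the prompt to be prefilled


def choose_prefill_target(req: Request, threshold_tokens: int) -> str:
    """Send long prompts to the remote prefill cluster, short ones stay local.

    Hypothetical rule: the report only says routing depends on request length.
    """
    if req.prompt_tokens >= threshold_tokens:
        return "remote_prefill"
    return "local_prefill"


def adjust_threshold(
    threshold_tokens: int,
    link_utilization: float,
    high: float = 0.9,        # assumed congestion watermark
    low: float = 0.5,         # assumed idle watermark
    step: int = 1024,         # assumed adjustment step, in tokens
    min_tokens: int = 1024,
    max_tokens: int = 131072,
) -> int:
    """Move the threshold in response to inter-cluster link load (assumption).

    A congested link raises the threshold so fewer requests ship KVCache
    across clusters; an idle link lowers it so more prefill is offloaded.
    """
    if link_utilization > high:
        return min(threshold_tokens + step, max_tokens)
    if link_utilization < low:
        return max(threshold_tokens - step, min_tokens)
    return threshold_tokens


if __name__ == "__main__":
    threshold = 8192
    for req in (Request("short", 500), Request("long", 32000)):
        print(req.request_id, choose_prefill_target(req, threshold))
    # Simulate a congested link, then re-evaluate the threshold.
    threshold = adjust_threshold(threshold, link_utilization=0.95)
    print("new threshold:", threshold)
```

In this sketch the per-request decision is a single comparison, while the threshold itself changes on a slower loop. That split is one plausible reading of a "dual-timescale" design, not a description of the released scheduler.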
