Zhipu has introduced the GLM-5.1 high-speed API, which generates output at up to 400 tokens per second. This breaks with the usual industry pattern, in which high-speed models are lightweight ones, and Zhipu describes it as the first large-scale model in China to pair flagship-level capability with low latency. The speed comes from the TileRT high-performance inference engine, the result of joint system-level optimization by the Zhipu GLM team and the TileRT team. In hands-on tests, the model performed strongly on AI programming, 3D game, and interactive interface tasks. The GLM-5.1 high-speed API is currently aimed at latency-sensitive scenarios and is available to a limited set of enterprise customers through the Zhipu MaaS platform.
