Xiwang Unveils Next-Gen Inference GPU Chip, Qiwang S3: Slashing Inference Cost per Token by 90%
1 week ago
Author: Editor

Chinese GPU maker Xiwang has introduced its next-generation inference GPU chip, the Qiwang S3. The launch is the company's first major public appearance since it raised approximately RMB 3 billion in strategic funding over the past year. The Qiwang S3 is a custom GPGPU (general-purpose graphics processing unit) chip designed specifically for large-model inference, and it delivers more than ten times the overall cost-efficiency of its predecessor in typical inference scenarios.

The chip supports a range of numeric precisions, from FP16 (16-bit floating point) down to FP4 (4-bit floating point), and has four times the memory capacity of its predecessor. In mainstream large-model inference scenarios, this translates to a reduction of approximately 90% in the cost per token.
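The two figures quoted above are consistent with each other: a tenfold gain in cost-efficiency means each token costs one tenth as much, which is a 90% reduction. The short Python sketch below works through that arithmetic and also shows how weight storage scales with precision. It is only an illustration: the function names and the 70-billion-parameter model are hypothetical and do not come from Xiwang, and the memory estimate counts raw weight bits only, ignoring scaling factors, activations, and KV cache.

```python
# Illustrative arithmetic only; names and the example model size are assumptions.

BITS_PER_WEIGHT = {"FP16": 16, "FP4": 4}


def cost_reduction(efficiency_gain: float) -> float:
    """Fractional drop in cost per token for a given cost-efficiency multiple."""
    if efficiency_gain <= 0:
        raise ValueError("efficiency_gain must be positive")
    # Cost per token is inversely proportional to cost-efficiency.
    return 1.0 - 1.0 / efficiency_gain


def weight_bytes(num_params: int, precision: str) -> float:
    """Bytes needed to store raw weights at the given precision."""
    return num_params * BITS_PER_WEIGHT[precision] / 8


if __name__ == "__main__":
    print(f"Cost reduction at 10x efficiency: {cost_reduction(10):.0%}")

    hypothetical_params = 70_000_000_000  # assumed model size, not from the article
    for precision in ("FP16", "FP4"):
        gigabytes = weight_bytes(hypothetical_params, precision) / 1e9
        print(f"{precision}: {gigabytes:.0f} GB of weights")
```

Under these assumptions, moving from FP16 to FP4 cuts weight storage to one quarter, so lower precision and larger memory both let more of a model fit on a single chip.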

Alongside the Qiwang S3, Xiwang introduced the Huanwang SC3-256 super-node solution. The company also launched an inference cost optimization initiative with ecosystem partners and signed a strategic cooperation agreement with Zhejiang University. Formerly the large-chip division of SenseTime, Xiwang had delivered more than 10,000 chips by 2025.
