Tencent Hunyuan AI Infra New Open-Source Release: Comprehensive Upgrade of HPC-Ops Inference Core Operators

10 hours ago

Author: Editor

HPC-Ops has released an open-source upgrade featuring five key operators. The upgrade is designed to help inference systems adapt to dynamic workloads and to meet core modules' needs for complex precision requirements and high-performance fused operators. It addresses several engineering bottlenecks on mainstream inference platforms, including long-tail latency in Attention, memory transfer overhead, and cross-GPU communication, and it surpasses existing open-source baselines on several performance metrics.

Key improvements include: the Attention operator uses dynamic workload scheduling to achieve up to a 2.95x speedup in long-text processing and a 17% improvement in end-to-end QPM (queries per minute); Router GEMM combines two BF16 GEMMs to reach FP32-level precision, delivering a 3.22x speedup over cuBLAS FP32; FusedMoE builds a pipeline across the full module, improving performance by 1.2x to 1.6x compared with vLLM and SGLang; Fused AllReduce+Norm fuses cross-GPU communication with computation, achieving a 1.04x to 1.68x speedup over NCCL and FlashInfer; and Sampler consolidates sampling computation into two CUDA kernels, delivering a 4.0x to 7.5x speedup over vLLM and a 1.9x to 4.7x speedup over FlashInfer.
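The announcement does not describe how Router GEMM combines its two BF16 GEMMs. One common way to recover near-FP32 accuracy from BF16 math is to split each FP32 weight into a BF16 high part and a BF16 residual, run one GEMM per part, and add the results. The sketch below illustrates that splitting idea in plain Python under stated assumptions: the shapes, the function names, and the choice to split the weights (rather than the activations) are all hypothetical and are not taken from HPC-Ops.

```python
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (ties to even), returned as a float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # rounding bias for the 16 dropped bits
    bits &= 0xFFFF0000                   # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def matmul(a, b):
    """Naive GEMM on lists of lists; Python doubles stand in for the accumulator."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def max_abs_diff(p, q):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for rp, rq in zip(p, q) for x, y in zip(rp, rq))


def compare_errors(tokens=8, hidden=64, experts=16, seed=0):
    """Return (single_bf16_error, dual_bf16_error) against a full-precision GEMM."""
    rng = random.Random(seed)
    # Assumed setup: BF16 activations multiplied by FP32 router weights.
    acts = [[to_bf16(rng.uniform(-1, 1)) for _ in range(hidden)]
            for _ in range(tokens)]
    w = [[to_f32(rng.uniform(-1, 1)) for _ in range(experts)]
         for _ in range(hidden)]

    # Split each weight: w ~= w_hi + w_lo, with both parts representable in BF16.
    w_hi = [[to_bf16(v) for v in row] for row in w]
    w_lo = [[to_bf16(v - h) for v, h in zip(row, hrow)]
            for row, hrow in zip(w, w_hi)]

    reference = matmul(acts, w)      # full-precision weights
    single = matmul(acts, w_hi)      # one BF16 GEMM
    lo_part = matmul(acts, w_lo)     # second BF16 GEMM on the residual
    dual = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(single, lo_part)]
    return max_abs_diff(reference, single), max_abs_diff(reference, dual)


if __name__ == "__main__":
    single_err, dual_err = compare_errors()
    print(f"single BF16 GEMM max error: {single_err:.3e}")
    print(f"dual BF16 GEMM max error:   {dual_err:.3e}")
```

Because the residual is itself rounded to BF16, the two-GEMM result carries roughly twice as many significant bits of each weight as a single BF16 GEMM, which is why its error is far smaller. A real kernel would run both products on tensor cores with FP32 accumulation; the double-precision accumulation here is only a stand-in.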