Intellifusion: Integrating 3D Stacked Memory Architecture into Its Inference Chip Under Development
Author: Editor

On May 12, Intellifusion disclosed in its investor relations activity record that the inference chip it is currently developing is built on the GPNPU architecture, which brings four key technical advances.

First, general-purpose programmability on par with GPGPUs. To address the "ease of use" problem facing domestically produced chips, the GPNPU architecture is designed to be compatible with, and support migration from, mainstream ecosystems such as CUDA, significantly lowering the barrier for customers deploying and migrating models.

Second, a highly energy-efficient NPU core. The core has been deeply optimized to raise both inference efficiency and the energy-efficiency ratio, improving the cost-effectiveness of inference workloads.

Third, a 3D stacked memory architecture. This design delivers higher bandwidth and lower access latency, breaking through the "memory wall" and boosting inference efficiency.

Fourth, a computing-power building-block architecture. Building on five years of work with domestic processes, the next-generation chips can be assembled into rack-level Scale-up supernodes capable of serving MoE models at the trillion- or even ten-trillion-parameter scale.

Through this approach, the company aims to dramatically reduce token costs, making large-model applications affordable to deploy at scale.
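The "memory wall" and Scale-up supernode points are, at heart, bandwidth and capacity arithmetic. The minimal Python sketch below uses assumed, illustrative figures (a hypothetical 70B dense model, FP8 weights, generic bandwidth numbers; none of these come from Intellifusion's disclosure) to show why per-token decode throughput is bounded by memory bandwidth and why a trillion-parameter MoE model's weights alone call for pooling many chips into a rack-level Scale-up domain.

```python
# Illustrative back-of-the-envelope estimates only. All numbers are generic
# assumptions, not figures from Intellifusion's announcement; they show how
# memory bandwidth ("the memory wall") and aggregate memory capacity bound
# large-model inference.

def decode_tokens_per_second(param_count: float, bytes_per_param: float,
                             memory_bandwidth_gbps: float) -> float:
    """Memory-bound decode rate for a dense model: each generated token must
    stream roughly all weights from memory once, so throughput is capped at
    bandwidth / model size."""
    model_bytes = param_count * bytes_per_param
    return memory_bandwidth_gbps * 1e9 / model_bytes


def moe_weight_footprint_tb(total_params: float, bytes_per_param: float) -> float:
    """Weight storage an MoE model needs across a Scale-up domain,
    ignoring KV cache and activations."""
    return total_params * bytes_per_param / 1e12


if __name__ == "__main__":
    # Assumed dense 70B-parameter model in FP8 (1 byte per parameter).
    for bw in (1_000, 3_000):  # GB/s: conventional vs. higher-bandwidth stacked memory
        tps = decode_tokens_per_second(70e9, 1.0, bw)
        print(f"{bw} GB/s -> ~{tps:.0f} tokens/s per request (memory-bound ceiling)")

    # Assumed 1-trillion-parameter MoE in FP8: weights alone are ~1 TB,
    # far beyond a single accelerator's memory.
    print(f"1T-param MoE weights: ~{moe_weight_footprint_tb(1e12, 1.0):.1f} TB")
```

Under these assumptions, tripling effective memory bandwidth roughly triples the memory-bound decode ceiling, which is the sense in which 3D stacked memory addresses the memory wall; likewise, a weight footprint on the order of a terabyte is why trillion-parameter MoE inference is framed as a rack-level Scale-up problem rather than a single-chip one.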