On April 2 (local time), Google officially announced the release of the Gemma 4 family of open-source large models. The family comes in four variants: a 2-billion-parameter efficient version (E2B), a 4-billion-parameter efficient version (E4B), a 26-billion-parameter Mixture-of-Experts (MoE) model, and a 31-billion-parameter dense model (31B). On the industry-standard Arena AI text benchmark, the 31B model ranks third among open-source models worldwide, and the 26B model ranks sixth.

The entire Gemma 4 family is designed for complex reasoning and agent workflows, with capabilities spanning reasoning, code generation, multimodal vision, and audio processing. The on-device models in the family provide a context window of up to 256K tokens and native support for more than 140 languages. Thanks to careful optimization of compute and memory efficiency, the number of parameters actually activated during inference is substantially lower than the nominal parameter count, which conserves device battery and allows the models to run on consumer-grade GPUs. Google has also partnered with chip makers such as Qualcomm and MediaTek so that Gemma 4 can run offline with near-zero latency on devices such as smartphones and Raspberry Pis.
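The gap between nominal and active parameter counts follows from how Mixture-of-Experts routing works: each token passes through the shared weights plus only a few routed experts. The sketch below illustrates the arithmetic with entirely hypothetical numbers (a 2B shared trunk, 16 experts of 1.5B each, 2 experts routed per token); these are illustrative assumptions, not Gemma 4's published architecture.

```python
# Illustrative MoE parameter arithmetic. All sizes are assumptions
# for the sketch, not Gemma 4's actual configuration.

def active_params(shared_b: float, k: int, expert_size_b: float) -> float:
    """Parameters (in billions) touched per token:
    the shared trunk plus the k experts the router selects."""
    return shared_b + k * expert_size_b

# Hypothetical layout: 2B shared + 16 experts x 1.5B = 26B nominal.
nominal = 2.0 + 16 * 1.5
# With top-2 routing, only 2 of the 16 experts fire per token.
active = active_params(shared_b=2.0, k=2, expert_size_b=1.5)

print(f"nominal: {nominal:.0f}B, active per token: {active:.0f}B")
```

Under these assumed numbers, a 26B nominal model would activate only 5B parameters per token, which is the kind of ratio that makes consumer-GPU and on-device inference plausible.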
