Google Unveils and Open-Sources Gemma 4 12B Multimodal Model, Functional on Devices with 16GB RAM/VRAM

9 hours ago

Author: Editor

Google has recently released and open-sourced Gemma 4 12B, a multimodal model designed for consumer-grade hardware so that AI models can run locally. It runs on laptops and desktops with 16GB of RAM or VRAM. Despite having only 12 billion parameters, its capability is reported to rival that of the Gemma 26B model.

The Gemma 4 12B model offers several notable advantages:

It uses a unified architecture that processes text, image, video, and audio inputs directly, with no separate multimodal encoder.
It offers strong reasoning ability, with benchmark performance close to that of the Gemma 26B Mixture-of-Experts (MoE) model, which makes local multi-step reasoning practical.
It has relatively modest memory requirements: it needs only 16GB of RAM or VRAM to run, and performance improves as memory capacity increases.
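
To put the 16GB figure in context, the sketch below estimates how much memory the weights of a 12-billion-parameter model would occupy at different numeric precisions. The article does not say which precision or quantization scheme the model uses, so the bit widths here are assumptions for illustration, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB.

    Ignores activations, caches, and runtime overhead, so real usage is higher.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / (1024 ** 3)


PARAMS = 12e9  # 12 billion parameters, as stated in the article

# The bit widths below are illustrative assumptions, not published settings.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions, 16-bit weights alone would exceed 16GB, while 8-bit or 4-bit weights would fit with room to spare, which suggests that some form of reduced precision is likely involved on 16GB devices.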

The model is open-sourced under the Apache 2.0 license, and Google and the community are working together to provide broad ecosystem support for developers. The model also includes several token prediction selectors to reduce latency.

For visual processing, Gemma 4 12B replaces the visual encoder with a lightweight embedding module consisting of just one matrix multiplication, a positional embedding, and a normalization operation. This design lets the model's backbone network process visual information directly.
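
The three operations named above can be sketched in a few lines of plain Python. This is a toy illustration, not Google's implementation: the function name `embed_patches`, the toy sizes, the order of the steps, and the choice of RMS normalization are all assumptions, since the article only names the three operations.

```python
import math
import random


def embed_patches(patches, weight, pos_emb, eps=1e-6):
    """Turn flattened image patches into backbone-sized vectors.

    patches: N vectors of length patch_dim
    weight:  patch_dim x d_model projection matrix
    pos_emb: N vectors of length d_model
    """
    d_model = len(weight[0])
    tokens = []
    for i, patch in enumerate(patches):
        # Step 1: one matrix multiplication, (patch_dim,) x (patch_dim, d_model).
        vec = [sum(p * weight[k][d] for k, p in enumerate(patch))
               for d in range(d_model)]
        # Step 2: add the positional embedding for patch i.
        vec = [v + pe for v, pe in zip(vec, pos_emb[i])]
        # Step 3: normalize (RMS normalization is an assumption here).
        rms = math.sqrt(sum(v * v for v in vec) / d_model + eps)
        tokens.append([v / rms for v in vec])
    return tokens


random.seed(0)
NUM_PATCHES, PATCH_DIM, D_MODEL = 4, 12, 8  # toy sizes, not the real model's
patches = [[random.random() for _ in range(PATCH_DIM)] for _ in range(NUM_PATCHES)]
weight = [[random.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(PATCH_DIM)]
pos_emb = [[random.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(NUM_PATCHES)]

tokens = embed_patches(patches, weight, pos_emb)
print(len(tokens), len(tokens[0]))  # one d_model-sized vector per patch
```

The point of the sketch is that no stack of encoder layers sits between the image and the backbone: each patch becomes a token-sized vector after a single projection, and the backbone does the rest.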

For audio processing, the audio encoder is omitted entirely; raw audio signals are projected into the same dimensional space as text tokens.
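
A rough way to picture this is to cut the waveform into fixed-length frames and map each frame to a token-sized vector with one linear projection. The sketch below does exactly that; the framing scheme, the frame length, the sample rate, and the helper names `frame_audio` and `project_frames` are hypothetical, as the article does not describe how the projection is actually performed.

```python
import math
import random


def frame_audio(samples, frame_len):
    """Split a 1-D waveform into non-overlapping frames, dropping any remainder."""
    num_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(num_frames)]


def project_frames(frames, weight):
    """Linearly project each frame (length frame_len) to a d_model-sized vector."""
    d_model = len(weight[0])
    return [[sum(s * weight[k][d] for k, s in enumerate(frame))
             for d in range(d_model)]
            for frame in frames]


random.seed(0)
SAMPLE_RATE, FRAME_LEN, D_MODEL = 16000, 160, 8  # assumed toy values

# 0.1 seconds of a 440 Hz sine wave as stand-in raw audio.
samples = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1600)]
weight = [[random.gauss(0, 0.05) for _ in range(D_MODEL)] for _ in range(FRAME_LEN)]

audio_tokens = project_frames(frame_audio(samples, FRAME_LEN), weight)
print(len(audio_tokens), len(audio_tokens[0]))  # frames x d_model
```

Because the resulting vectors have the same width as text token embeddings, the backbone could in principle consume them in the same sequence as text.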

The model is currently available on multiple platforms. Developers can try it directly on platforms such as Ollama, download the model weights from Hugging Face or Kaggle, or use Unsloth for efficient fine-tuning to build customized versions.
