Google has released quantized versions of its Gemma 3 models, built with Quantization-Aware Training (QAT) and stored at 4-bit integer (int4) precision, making them practical to run on consumer-grade GPUs. This removes a long-standing constraint: models of this scale were previously confined to high-end data-center accelerators. With the QAT release, Google delivers on its earlier commitment to reduce model size and computational demands.
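The core idea behind QAT is to simulate low-precision arithmetic during training, so the model learns weights that remain accurate after quantization. Below is a minimal NumPy sketch of symmetric int4 "fake quantization" (quantize, then immediately dequantize), the kind of operation inserted into the forward pass during QAT. This is an illustrative assumption about the general technique, not Google's actual Gemma 3 implementation, and the function name `fake_quant_int4` is hypothetical:

```python
import numpy as np

def fake_quant_int4(w: np.ndarray) -> np.ndarray:
    """Simulate signed 4-bit symmetric quantization: map weights onto a
    16-level integer grid, then scale back to float. Sketch only; real
    QAT pipelines also handle per-channel scales and gradient pass-through."""
    qmin, qmax = -8, 7                      # representable signed int4 range
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), qmin, qmax)  # integer representation
    return q * scale                              # dequantized approximation

weights = np.array([0.9, -0.4, 0.05, -1.2])
print(fake_quant_int4(weights))
```

During QAT, the training loss is computed on these dequantized values, so the optimizer adapts the weights to the coarse int4 grid; at deployment time only the 4-bit integers and the scale need to be stored, which is what shrinks the memory footprint enough for consumer GPUs.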
