Microsoft has released BitNet b1.58 2B4T, an open-source large language model trained natively at low precision rather than quantized after full-precision training. The name encodes its scale: 2 billion parameters, pre-trained on 4 trillion tokens. Despite that size, the model requires only about 0.4GB of memory, because each weight is restricted to the ternary values {-1, 0, +1} (roughly 1.58 bits per weight) while activations are quantized to 8 bits, a scheme referred to as W1.58A8. Its benchmark performance is competitive with comparable full-precision models while storage and computational demands drop sharply. Training proceeded in three stages: large-scale pre-training, supervised fine-tuning, and preference optimization. Across benchmark tests the model performs strongly, with especially large advantages in energy consumption and decoding latency. Realizing those efficiency gains, however, requires a dedicated inference framework with ternary-aware kernels. Microsoft plans to broaden hardware support and expand functionality going forward, and the model is already available for community use.
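
To build intuition for the W1.58A8 idea, the sketch below implements the absmean ternary weight quantizer and a per-token absmax 8-bit activation quantizer in the spirit of the BitNet papers. This is a minimal NumPy illustration, not Microsoft's kernel code: the function names, shapes, and epsilon handling are this article's own assumptions.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to ternary codes {-1, 0, +1}.

    Sketch of the absmean scheme: scale by the mean absolute
    value of the matrix, then round and clip into [-1, 1].
    Dequantize as w_q * gamma.
    """
    gamma = np.abs(w).mean() + eps                 # absmean scaling factor
    w_q = np.clip(np.rint(w / gamma), -1, 1)       # ternary codes
    return w_q.astype(np.int8), gamma

def absmax_int8_quantize(x: np.ndarray, eps: float = 1e-5):
    """Quantize activations to 8 bits, per token (absmax scheme).

    Each row (token) is scaled so its largest absolute value
    maps to 127, then rounded and clipped to int8 range.
    """
    s = 127.0 / (np.abs(x).max(axis=-1, keepdims=True) + eps)  # per-token scale
    x_q = np.clip(np.rint(x * s), -128, 127).astype(np.int8)
    return x_q, s

# Illustrative usage on random data (not real model weights):
w = np.random.randn(4, 8).astype(np.float32)
w_q, gamma = absmean_ternary_quantize(w)
w_hat = w_q.astype(np.float32) * gamma             # dequantized approximation
```

The payoff of the ternary constraint is that matrix multiplications against the weights reduce to additions and subtractions (the zero entries are skipped entirely), which is where the reported energy and decoding-latency savings come from; standard GPU kernels do not exploit this, hence the need for a dedicated inference framework.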
