Microsoft Research has introduced the first FP4 training framework for large language models. Under identical hyperparameters, it matches the training results of FP8 and BF16 while requiring substantially less memory and compute. Because current hardware lacks native FP4 arithmetic, the framework uses FP8 to emulate FP4, and it has been validated on models of up to 13 billion parameters trained on 100 billion tokens.
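
The core idea of emulating a low-precision format inside a higher-precision one can be illustrated with "fake quantization": round each value to the nearest number representable in FP4 (the E2M1 layout: 1 sign bit, 2 exponent bits, 1 mantissa bit) while keeping all arithmetic in ordinary floats. The sketch below is an illustration of this general technique, not Microsoft's actual implementation; the per-tensor max-based scaling is an assumption.

```python
# Representable magnitudes of the E2M1 (FP4) format:
# 1 sign bit, 2 exponent bits, 1 mantissa bit.
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def fake_quant_fp4(xs):
    """Simulate FP4 quantization in higher precision ("fake quantization").

    Scales the input so its largest magnitude maps onto the largest FP4
    value (6.0), rounds each element to the nearest representable FP4
    value, then rescales back. All arithmetic stays in ordinary floats,
    mirroring how FP4 can be emulated inside FP8/BF16 kernels.
    This per-tensor max scaling is an illustrative assumption.
    """
    amax = max(abs(x) for x in xs)
    if amax == 0.0:
        return list(xs)
    scale = FP4_E2M1_VALUES[-1] / amax  # map max |x| onto 6.0
    out = []
    for x in xs:
        # Round the scaled magnitude to the nearest FP4 grid point.
        nearest = min(FP4_E2M1_VALUES, key=lambda v: abs(abs(x) * scale - v))
        out.append((nearest / scale) * (1.0 if x >= 0 else -1.0))
    return out

quantized = fake_quant_fp4([0.01, -0.3, 0.7, -1.0])
print(quantized)
```

Note how small values collapse to zero and nearby values snap to the same grid point: with only 16 representable numbers, the rounding error this introduces is exactly what an FP4 training recipe must compensate for.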
