Ant Group Releases Its First 100-Billion-Parameter Diffusion Language Model, LLaDA2.0, as Open Source
2025-12-12
Author: Editorial Staff

On December 12, 2025, Ant Technology Research Institute officially released the LLaDA2.0 series of discrete diffusion large language models (dLLMs), together with a technical report. The series comprises two variants, a 16-billion-parameter model (mini) and a 100-billion-parameter model (flash), both built on a Mixture of Experts (MoE) architecture. This marks the first time Ant Group has scaled a diffusion language model to the 100-billion-parameter level.

LLaDA2.0 inherits knowledge from existing autoregressive models through a Warmup-Stable-Decay continual pre-training approach, avoiding the high cost of training from scratch. By incorporating confidence-aware parallel training and a diffusion-based variant of DPO (Direct Preference Optimization), it maintains high-quality generation while exploiting the parallel decoding ability of diffusion models, delivering a 2.1x speedup in inference over autoregressive models.

Across 47 benchmarks spanning knowledge, reasoning, coding, mathematics, agent capabilities, and alignment, the 100-billion-parameter version achieved an average score of 73.18, on par with the strong autoregressive model Qwen3-30B-A3B-Instruct-2507 (73.60), and showed notable advantages on complex tasks such as coding and agent workloads. The model weights and training code have been open-sourced on Hugging Face.
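For readers unfamiliar with how diffusion language models decode many tokens at once, the sketch below illustrates the general idea behind confidence-aware parallel decoding: at each denoising step the model scores every masked position simultaneously and commits only the predictions whose confidence clears a threshold. This is a minimal, generic illustration rather than Ant's actual implementation; the `predict_fn`, `MASK_ID`, threshold, and step budget are assumptions made for the example.

```python
import numpy as np

MASK_ID = -1  # placeholder id for a still-masked position (assumption for this sketch)

def parallel_decode(predict_fn, seq_len, threshold=0.9, max_steps=32):
    """Generic confidence-aware parallel decoding loop for a masked diffusion LM.
    `predict_fn(tokens)` is assumed to return a (seq_len, vocab_size) array of
    per-position token probabilities."""
    tokens = np.full(seq_len, MASK_ID, dtype=np.int64)        # start fully masked
    for _ in range(max_steps):
        masked = tokens == MASK_ID
        if not masked.any():
            break                                             # every position decoded
        probs = predict_fn(tokens)                            # score all positions at once
        conf = probs.max(axis=-1)                             # confidence per position
        best = probs.argmax(axis=-1)                          # most likely token per position
        commit = masked & (conf >= threshold)                 # masked positions confident enough
        if not commit.any():                                  # guarantee progress: commit the
            idx = np.where(masked)[0][np.argmax(conf[masked])]  # single most confident position
            tokens[idx] = best[idx]
        else:
            tokens[commit] = best[commit]                     # unmask many tokens in one step
    return tokens
```

Because several tokens can be committed per step, the number of model calls can be far smaller than the sequence length, which is the general mechanism behind the inference speedups reported for diffusion models over strictly left-to-right autoregressive decoding.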