ByteDance's VAPO Framework Breaks AIME24 Record, Significantly Boosting Large Language Models' Reasoning Capabilities
2025-04-12
Author: Editor

ByteDance has unveiled VAPO, a reinforcement learning training framework designed to strengthen the reasoning ability of large language models on complex, long-horizon tasks. Built on top of PPO, VAPO adds techniques such as value-model pretraining and length-adaptive generalized advantage estimation, and leans on the synergy among these components. After training with VAPO, the Qwen2.5-32B model's score on the AIME24 benchmark jumped from 5 points to 60.4 points, surpassing both the DeepSeek R1 and DAPO approaches. VAPO is particularly strong on mathematical reasoning and long-sequence tasks, and its training process is more stable and efficient. The coordinated integration of these techniques is what underpins VAPO's performance.
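To make the "length-adaptive generalized advantage estimation" ingredient more concrete, here is a minimal sketch of GAE in which the decay parameter λ grows with response length, so that credit from a sparse final reward propagates across long generations. The rule λ = 1 − 1/(α·L) and the parameter α are illustrative assumptions, not a confirmed detail of ByteDance's implementation.

```python
import numpy as np

def length_adaptive_gae(rewards, values, gamma=1.0, alpha=0.05):
    """Sketch of GAE with a length-adaptive lambda.

    Assumes lambda = 1 - 1/(alpha * L), where L is the response length,
    so longer responses use a lambda closer to 1. Illustrative only.
    """
    L = len(rewards)
    lam = max(0.0, 1.0 - 1.0 / (alpha * L))   # longer response -> lambda nearer 1
    values = np.append(values, 0.0)           # bootstrap value after the last token
    advantages = np.zeros(L)
    gae = 0.0
    for t in reversed(range(L)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error at token t
        gae = delta + gamma * lam * gae                         # running discounted sum
        advantages[t] = gae
    return advantages

# Toy usage: a sparse reward placed only on the final token of a 512-token response.
rewards = np.zeros(512); rewards[-1] = 1.0
values = np.zeros(512)
adv = length_adaptive_gae(rewards, values)
```

With a fixed small λ, the advantage signal from the final-answer reward would decay quickly and barely reach early tokens of a long chain of thought; letting λ scale with length keeps the effective credit-assignment horizon roughly proportional to the response, which is the intuition behind this component.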