The Qwen Pilot team from Ali Tongyi Laboratory has unveiled FIPO, a new algorithm designed to overcome the reasoning limitations of large-scale models. Conventional reinforcement learning techniques often fail to identify the tokens that matter most. FIPO tackles the problem of 'reasoning length stagnation' with a Future-KL mechanism that rewards tokens exerting a strong influence on subsequent reasoning, and it uses the sign of log-probability differences to determine the optimization direction. Experiments show that, in a 32B-scale pure reinforcement learning setup, FIPO outperforms models of comparable size. It breaks through the reasoning-length bottleneck of zero-shot models, pushing the average reasoning length beyond 10,000 tokens. This markedly improves reasoning accuracy and underscores the method's promise on complex mathematical reasoning tasks.
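
The two ideas described above can be sketched roughly as follows. This is an illustrative approximation, not the published FIPO implementation: the function names, the suffix-sum form of the future-influence weight, and the way the sign and weight are combined are all assumptions made for exposition, since the paragraph only describes the mechanism at a high level.

```python
import math

def future_kl_weights(per_token_kl):
    """Credit each token by the total per-token KL of the positions *after* it.

    This mirrors the Future-KL intuition: a token matters insofar as the
    reasoning that follows it shifts the policy's distribution. The suffix-sum
    form here is an illustrative assumption, not the exact FIPO formulation.
    """
    weights, suffix = [], 0.0
    for kl in reversed(per_token_kl):
        weights.append(suffix)   # influence attributed to everything after this token
        suffix += kl
    return weights[::-1]

def sign_of_logprob_gap(logp_cur, logp_ref):
    """Per-token optimization direction from the sign of the log-prob gap
    between the current policy and a reference policy."""
    return [math.copysign(1.0, c - r) if c != r else 0.0
            for c, r in zip(logp_cur, logp_ref)]

def token_scores(logp_cur, logp_ref, per_token_kl):
    """Combine the future-influence weight with the sign-based direction
    to get a per-token update signal (illustrative combination)."""
    w = future_kl_weights(per_token_kl)
    s = sign_of_logprob_gap(logp_cur, logp_ref)
    return [wi * si for wi, si in zip(w, s)]
```

For example, `token_scores([-1.0, -2.0, -1.0], [-1.5, -1.5, -1.0], [1.0, 2.0, 3.0])` yields `[5.0, -3.0, 0.0]`: the first token gets a large positive signal because the tokens after it carry high KL and its probability rose relative to the reference, while the last token, with no future influence, gets none.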
