Microsoft Partners with Tsinghua and Peking Universities to Introduce Reward Reasoning Models: Adaptively Allocating Computing Resources According to AI Task Complexity
2025-05-27
Author: Editor

Microsoft Research, together with Tsinghua University and Peking University, has introduced Reward Reasoning Models (RRMs). These models improve the evaluation of complex tasks by carrying out an explicit reasoning process before producing a reward, which lets them allocate computing resources adaptively according to task difficulty. RRMs are built on the Qwen2 model with a Transformer-decoder architecture and recast reward modeling as a text completion task. In benchmark tests on RewardBench and PandaLM Test, RRMs outperformed baseline models by a significant margin, particularly on complex queries, where they made effective use of additional test-time compute. The research also suggests that the accuracy of RRMs improves further as model size and reasoning time increase.
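
To make the idea of "reward modeling as a text completion task" more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a pairwise setup in which the model is prompted with a query and two candidate responses, writes out its reasoning, and ends with a verdict that is parsed from the generated text. The prompt wording, the `Verdict:` marker, and the function names `build_prompt`, `parse_verdict`, and `majority_verdict` are all hypothetical. The majority vote over several sampled completions is one simple, assumed way to spend more test-time compute on a harder query.

```python
import re
from collections import Counter

# Hypothetical prompt template; the wording actually used by RRMs is not given in this article.
PROMPT_TEMPLATE = (
    "Query:\n{query}\n\n"
    "Response A:\n{response_a}\n\n"
    "Response B:\n{response_b}\n\n"
    "Reason step by step about which response is better, "
    "then end with 'Verdict: A' or 'Verdict: B'."
)

# Assumed output convention: the completion ends with "Verdict: A" or "Verdict: B".
VERDICT_RE = re.compile(r"Verdict:\s*([AB])\b")


def build_prompt(query: str, response_a: str, response_b: str) -> str:
    """Format a pairwise comparison as plain text for a decoder-only model to complete."""
    return PROMPT_TEMPLATE.format(
        query=query, response_a=response_a, response_b=response_b
    )


def parse_verdict(completion: str):
    """Return 'A' or 'B' from the last verdict marker in the completion, or None if absent."""
    matches = VERDICT_RE.findall(completion)
    return matches[-1] if matches else None


def majority_verdict(completions):
    """Aggregate several sampled completions by majority vote.

    Sampling more completions for a hard query is one way to use extra
    test-time compute. Completions without a parsable verdict are ignored.
    """
    votes = Counter(
        verdict
        for verdict in (parse_verdict(c) for c in completions)
        if verdict is not None
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

In use, the string returned by `build_prompt` would be sent to the language model, and each generated completion would be passed to `parse_verdict`. For queries where a single completion is unreliable, several completions could be sampled and combined with `majority_verdict`.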