Microsoft Partners with Tsinghua and Peking Universities to Introduce Reward Reasoning Models: Adaptively Allocating Computing Resources According to AI Task Complexity

2025-05-27

Author: Staff Editor

Microsoft Research, working with Tsinghua University and Peking University, has introduced Reward Reasoning Models (RRMs). Before issuing a judgment, an RRM works through an explicit reasoning process. This lets it spend more compute on complex evaluation tasks and less on simple ones. RRMs are built on the Qwen2 model and use a Transformer-decoder architecture, which turns reward modeling into a text completion task. On the RewardBench and PandaLM Test benchmarks, RRMs outperformed baseline models by a significant margin. The gains were largest on complex queries, where the models made effective use of additional compute at test time. The researchers report that RRM accuracy is likely to improve further as model size and reasoning time increase.
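Treating reward modeling as text completion means the judge model reads the query and the candidate responses as a prompt, then writes its reasoning and verdict as ordinary generated text. The Python sketch below illustrates that framing for a pairwise comparison. It makes several assumptions that the article does not confirm. The prompt template, the `Verdict:` output convention, and the helper names (`build_judge_prompt`, `parse_verdict`, `preferred_response`) are all hypothetical and are not the RRM authors' actual format. The call to the language model is also replaced with a hard-coded completion string.

```python
import re
from typing import Optional

# Hypothetical judging template; the real RRM prompt format is not given in the article.
JUDGE_TEMPLATE = (
    "Query:\n{query}\n\n"
    "Response A:\n{response_a}\n\n"
    "Response B:\n{response_b}\n\n"
    "Think step by step about which response better answers the query, "
    "then finish with a line of the form 'Verdict: A' or 'Verdict: B'.\n"
)


def build_judge_prompt(query: str, response_a: str, response_b: str) -> str:
    """Frame a pairwise reward judgment as a plain text-completion prompt."""
    return JUDGE_TEMPLATE.format(
        query=query, response_a=response_a, response_b=response_b
    )


def parse_verdict(completion: str) -> Optional[str]:
    """Return 'A' or 'B' from the last 'Verdict:' marker in the generated text.

    The last match is used because the reasoning may mention tentative
    verdicts before the final one. Returns None if no verdict is found.
    """
    matches = re.findall(r"Verdict:\s*([AB])\b", completion)
    return matches[-1] if matches else None


def preferred_response(
    completion: str, response_a: str, response_b: str
) -> Optional[str]:
    """Map the parsed verdict back to the preferred response text."""
    verdict = parse_verdict(completion)
    if verdict == "A":
        return response_a
    if verdict == "B":
        return response_b
    return None


if __name__ == "__main__":
    prompt = build_judge_prompt("What is 2 + 2?", "4", "5")
    # A real system would send `prompt` to a decoder-only model and read back
    # its completion; here the completion is hard-coded for illustration.
    fake_completion = "Response A gives the correct sum; B does not.\nVerdict: A"
    print(parse_verdict(fake_completion))  # A
    print(preferred_response(fake_completion, "4", "5"))  # 4
```

In this framing, a longer reasoning section before the `Verdict:` line corresponds to spending more test-time compute on a harder comparison. The parser deliberately ignores everything except the final verdict, so the amount of reasoning the model writes does not change how the output is read.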
