Microsoft has released rStar2-Agent, an open-source mathematical reasoning model. Despite having only 14 billion parameters, it matches the performance of models with 671 billion parameters on mathematical reasoning, thanks to the way it reasons. rStar2-Agent can plan its own reasoning steps, call code tools, and check its ideas against the feedback those tools return. Three components underpin this ability: the GRPO-RoC algorithm, an efficient reinforcement learning infrastructure, and a multi-stage training approach. Together they allow rStar2-Agent to train efficiently with minimal resources and to generalize well across a variety of tasks. The result offers a fresh perspective on the evolution of large models and suggests that future models may increasingly prioritize reasoning ability and tool use.
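The summary names GRPO-RoC without spelling it out. As a rough illustration of the group-relative idea that GRPO-style methods build on, the sketch below takes the rewards of several sampled answers to the same problem and normalizes each one against the group's mean and standard deviation. The function name, the 0/1 rewards, and the epsilon value are illustrative assumptions, and the RoC part of the algorithm is not modeled here; this is not rStar2-Agent's actual training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each rollout relative to the other rollouts for the same prompt.

    Hypothetical helper for illustration only. `rewards` holds one scalar
    reward per sampled answer; `eps` avoids division by zero when every
    answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to one math problem: 1.0 = verified correct,
    # 0.0 = incorrect (an assumed reward scheme).
    rewards = [1.0, 0.0, 0.0, 1.0]
    for reward, advantage in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

Answers that beat the group average receive a positive advantage and the rest a negative one, so the policy is pushed toward the better answers without a separate value model. When every answer in a group earns the same reward, all advantages are zero and that group contributes no learning signal.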