On January 14, 2026, AI company MiniMax released OctoCodingBench, an open-source, systematic evaluation benchmark for code agents, describing it as the industry's first comprehensive assessment framework built specifically for coding agents. The benchmark measures how well an agent follows instructions in a code-repository setting. The results show that while every evaluated model achieved a check-level success rate (CSR) above 80%, instance-level success rates (ISR) ranged from only 10% to 30%. Following procedural instructions remained a persistent weakness across models, and open-source models were rapidly closing the performance gap with their proprietary counterparts.
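The wide gap between CSR and ISR is less surprising than it may appear: if an instance only counts as a success when every one of its checks passes, a high per-check pass rate still compounds into a low instance-level rate. The sketch below illustrates this arithmetic; the metric definitions here are an assumption based on the metric names alone (the benchmark's published definitions may differ), and the data is purely illustrative.

```python
from typing import Dict, List, Tuple

def csr_and_isr(results: Dict[str, List[bool]]) -> Tuple[float, float]:
    """Compute check-level and instance-level success rates.

    `results` maps each benchmark instance to the pass/fail outcomes of
    its individual checks. Assumed definitions (not the official spec):
      - CSR: fraction of all checks, pooled across instances, that pass.
      - ISR: fraction of instances in which ALL checks pass.
    """
    all_checks = [c for checks in results.values() for c in checks]
    csr = sum(all_checks) / len(all_checks)
    isr = sum(all(checks) for checks in results.values()) / len(results)
    return csr, isr

# Toy data: most checks pass in every instance, but only one instance
# passes all of its checks, so ISR lands far below CSR.
toy = {
    "task-1": [True, True, True, False, True],  # one failed check
    "task-2": [True, True, True, True, True],   # fully passing
    "task-3": [True, False, True, True, True],  # one failed check
}
print(csr_and_isr(toy))  # -> (0.8666..., 0.3333...)
```

Under these assumed definitions, a model passing roughly 87% of checks still completes only a third of instances end to end, which mirrors the reported pattern of 80%+ CSR alongside 10-30% ISR.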
