MiniMax Releases Open-Source Programming Agent Instruction-Following Benchmark: OctoCodingBench
2026-01-15
Author: Editor

AI large-model company MiniMax has open-sourced OctoCodingBench, a benchmark for instruction following in programming agents. The benchmark is designed to assess scaffold-aware instruction following in code repository settings. Most current benchmarks focus on task completion and do not check whether the agent follows the rules while carrying out the task. In real-world programming, however, agents must comply with system-level behavioral constraints as well as project coding standards. OctoCodingBench evaluates an agent's adherence to seven heterogeneous types of instruction sources. Its core features include separating task completion from rule compliance and handling constraints that come from multiple heterogeneous sources, among others. The release comprises 72 curated instances, including task specifications, system prompts, and other related content. All task environments are packaged as publicly available Docker images that users can pull and inspect directly. For more information, visit: https://huggingface.co/datasets/MiniMaxAI/OctoCodingBench.
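To make the distinction between task completion and rule compliance concrete, here is a minimal Python sketch. It is not OctoCodingBench's actual schema or scoring code: the class names, the source labels (`system_prompt`, `project_conventions`), and the example rules are all hypothetical, chosen only to illustrate how a run could finish its task while still violating a rule from one instruction source.

```python
from dataclasses import dataclass, field


@dataclass
class RuleCheck:
    """One rule evaluated against an agent run (hypothetical structure)."""
    source: str   # label of the instruction source the rule came from
    rule: str     # human-readable rule text
    passed: bool  # whether the agent's run satisfied the rule


@dataclass
class RunResult:
    """Keeps task completion and rule compliance as separate signals."""
    task_completed: bool
    checks: list[RuleCheck] = field(default_factory=list)

    def compliance_rate(self) -> float:
        """Fraction of rule checks passed (1.0 when there are no checks)."""
        if not self.checks:
            return 1.0
        return sum(c.passed for c in self.checks) / len(self.checks)

    def fully_compliant(self) -> bool:
        """True only if every rule from every source was followed."""
        return all(c.passed for c in self.checks)

    def violations_by_source(self) -> dict[str, list[str]]:
        """Group the violated rules by the source that imposed them."""
        grouped: dict[str, list[str]] = {}
        for c in self.checks:
            if not c.passed:
                grouped.setdefault(c.source, []).append(c.rule)
        return grouped


# Illustrative data only: the task succeeds, but one project rule is broken.
result = RunResult(
    task_completed=True,
    checks=[
        RuleCheck("system_prompt", "Do not run destructive shell commands", True),
        RuleCheck("project_conventions", "Use snake_case for function names", False),
        RuleCheck("project_conventions", "Add tests for new functions", True),
    ],
)

print("task completed:", result.task_completed)
print("fully compliant:", result.fully_compliant())
print("violations:", result.violations_by_source())
```

A benchmark that reported only `task_completed` would count this run as a success. Tracking compliance separately, and per source, exposes the violated project convention.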