On Saturday, OpenAI researcher Alexander Wei announced that an experimental AI language model the company is developing has achieved gold medal-level performance on the International Mathematical Olympiad (IMO), matching a standard that fewer than 9 percent of human contestants reach each year. The announcement came despite an embargo request from IMO organizers asking AI companies to wait until July 28 to share their results.
The experimental model reportedly tackled the contest's six proof-based problems under the same constraints as human competitors: 4.5 hours per session, with no Internet access or calculators allowed. However, because OpenAI graded its own IMO results, several sources with inside knowledge of the process say the legitimacy of the company's claim may be in question. OpenAI plans to publish the proofs and grading rubrics for public review.
According to OpenAI, its achievement marks a departure from previous AI attempts at mathematical Olympiad problems, which relied on specialized theorem-proving systems that often exceeded human time limits. OpenAI says its model processed problems as plain text and generated natural-language proofs, operating like a standard language model rather than a purpose-built mathematical system.
The announcement follows Google's July 2024 claim that its AlphaProof and AlphaGeometry 2 models earned a silver medal equivalent at the IMO—though Google's systems required up to three days per problem rather than the 4.5-hour human time limit and needed human assistance to translate problems into formal mathematical language.
"Math is a proving ground for reasoning—structured, rigorous, and hard to fake," the company wrote in a statement sent to Ars Technica. "This shows that scalable, general-purpose methods can now outperform hand-tuned systems in tasks long seen as out of reach."
While the company confirmed that its next major AI model, GPT-5, is "coming soon," it clarified that this current model is experimental. "The techniques will carry forward, but nothing with this level of capability will be released for a while," OpenAI said. It's likely that OpenAI devoted a great deal of computational resources (which means high cost) to this particular experiment, and that level of computation won't be typical of consumer-facing AI models in the near future.
OpenAI says that the research team behind the experimental AI model, led by Alexander Wei with support from Sheryl Hsu and Noam Brown, hadn't initially planned to enter the competition but decided to evaluate the model on the IMO problems after observing promising results in testing.
"This wasn’t a system built for math. It’s the same kind of LLM we train for language, coding, and science—solving full proof-based problems under standard IMO constraints: 4.5 hours, no internet, no calculators," OpenAI said in a statement.
OpenAI received problems that were freshly written by IMO organizers and shared with several AI companies simultaneously. To validate the results, each solution reportedly underwent blind grading by a panel of three former IMO medalists organized by OpenAI, with unanimous agreement required for acceptance.
However, in addition to the controversy over self-grading the results, OpenAI also annoyed the IMO community because its Saturday announcement appears to have broken the embargo requested by the International Mathematical Olympiad. Harmonic, another AI company that participated in the competition, revealed in an X post on July 20 that "the IMO Board has asked us, along with the other leading AI companies that participated, to hold on releasing our results until Jul 28th."
The early announcement has prompted Google DeepMind, which had prepared its IMO results for the agreed-upon date, to move up its own announcement to later today. Harmonic plans to share its results as originally scheduled on July 28.
In response to the controversy, OpenAI research scientist Noam Brown posted on X, "We weren't in touch with IMO. I spoke with one organizer before the post to let him know. He requested we wait until after the closing ceremony ends to respect the kids, and we did."
However, an IMO coordinator told X user Mikhail Samin that OpenAI actually announced before the closing ceremony, contradicting Brown's claim. The coordinator called OpenAI's actions "rude and inappropriate," noting that OpenAI "wasn't one of the AI companies that cooperated with the IMO on testing their models."
The International Mathematical Olympiad, which has been running since 1959, represents one of the most challenging tests of mathematical reasoning. More than 100 countries send teams of up to six contestants each, with participants facing six proof-based problems across two 4.5-hour sessions. The problems typically require deep mathematical insight and creativity rather than raw computational power. You can see the exact problems from the 2025 Olympiad, which are posted online.
For example, problem one asks students to imagine a triangular grid of dots (like a triangular pegboard) and figure out how to cover all the dots using exactly n straight lines. The twist is that some lines are called "sunny": these are the lines that don't run horizontally, vertically, or parallel to the triangle's slanted edge (a 45-degree line sloping down to the right). The challenge is to prove that no matter how big the triangle is, the covering can include exactly 0, 1, or 3 sunny lines, never 2, never 4, never any other number.
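For readers who want the precise setup, here is a rough formalization of the problem as we read it from the posted problem set (our paraphrase, not the official wording):

Grid: for a fixed integer n ≥ 3, all points (a, b) where a and b are positive integers and a + b ≤ n + 1.
Sunny line: a line that is parallel to none of the x-axis, the y-axis, or the line x + y = 0.
Task: determine every nonnegative integer k such that n distinct lines can cover every grid point while exactly k of those lines are sunny.
Claim to prove: the achievable values are exactly k = 0, 1, and 3.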
The timing of the OpenAI results surprised participants in some prediction markets, which had assigned around an 18 percent probability to any AI system winning IMO gold by 2025. However, depending on what Google says this afternoon (and what others like Harmonic may release on July 28), OpenAI may not be the only AI company to have achieved these unexpected results.