OpenAI's o3 AI Model Falls Short in Benchmark Tests Compared to Promotional Claims
2025-04-21
Author: Editor

Notable discrepancies between the performance of OpenAI's o3 AI model in first-party and third-party benchmark tests have raised widespread public concern about the company's transparency and its model-testing methodology. When OpenAI introduced o3 in December of last year, it claimed the model could solve slightly over a quarter of the problems in FrontierMath, a collection of exceptionally challenging mathematical problems. That figure represented a commanding lead over competitors: the second-ranked model reportedly achieved an accuracy of only around 2%. Independent tests, however, have fallen well short of the advertised results, fueling doubts about the validity of the original benchmarks and about OpenAI's commitment to transparency.