OpenAI's o3 AI Model Falls Short in Benchmark Tests Compared to Promotional Claims
2025-04-21
Author: Editor

Notable discrepancies between the performance of OpenAI's o3 AI model in first-party and third-party benchmark tests have raised widespread public concern about the company's transparency and its model-testing methodology. When OpenAI introduced o3 in December of last year, it claimed the model could solve slightly over a quarter of the problems in FrontierMath, a collection of exceptionally challenging mathematical problems. That figure represented a commanding lead over competitors: the second-ranked model reportedly achieved an accuracy of only around 2%. Independent tests, however, have fallen well short of the advertised results, fueling doubts about the validity of the original benchmarks and about OpenAI's commitment to transparency.