OpenAI Unveils PaperBench: A New Benchmark for Evaluating AI Agents' Research Reproduction Capabilities

2025-04-03

Author: Editor

OpenAI has announced PaperBench, a benchmark that measures how well AI agents can replicate cutting-edge research. Agents must reproduce 20 papers from ICML 2024 from scratch, which means understanding each paper, rebuilding its codebase, and running its experiments. In preliminary tests, Claude 3.5 Sonnet, the best-performing agent, achieved an average reproduction score of 21.0%, which remains below the human baseline.
