Meta Launches Intelligent Agent Evaluation Platform ARE
Author: Editor

Meta has launched a new evaluation platform, the Agents Research Environment (ARE), together with a new benchmark, Gaia2, designed to assess how intelligent agents perform in real-world applications. ARE simulates realistic environments in which tasks execute asynchronously and time flows continuously, so agents must adapt and complete tasks under dynamic constraints.

Gaia2, the core benchmark built on ARE, focuses on an agent's adaptability in complex environments. Unlike its predecessor, Gaia1, it evaluates not only an agent's ability to find answers but also how it performs in the face of changing conditions, deadlines, API failures, and ambiguous instructions. Gaia2 also supports protocols such as Agent2Agent to assess collaboration among agents. Its evaluation process runs asynchronously: time keeps passing even while an agent is idle, which tests its responsiveness to new events. So far, OpenAI's GPT-5 has demonstrated strong performance on the Gaia2 benchmark.
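The asynchronous evaluation idea can be sketched in a few lines of Python. This is an illustrative toy only, not ARE's actual API: every name below (`Environment`, `Agent`, the event payloads) is hypothetical. It shows the core mechanism the article describes, an environment whose clock keeps running independently of the agent, so events like an API failure or a moved deadline can arrive even while the agent is idle.

```python
import asyncio

# Hypothetical sketch of asynchronous agent evaluation; ARE's real
# interfaces are not described in the article, so all names here are
# illustrative. The environment emits events on its own schedule,
# regardless of whether the agent is busy or idle.

class Environment:
    """Emits scheduled events on its own clock, independent of the agent."""
    def __init__(self, schedule):
        self.queue = asyncio.Queue()
        self._schedule = schedule  # list of (delay_seconds, payload)

    async def run(self):
        for delay, payload in self._schedule:
            await asyncio.sleep(delay)   # environment time keeps flowing
            await self.queue.put(payload)

class Agent:
    """Waits for events and records each one it handles."""
    def __init__(self):
        self.handled = []

    async def run(self, env, until):
        loop = asyncio.get_running_loop()
        while loop.time() < until:
            try:
                event = await asyncio.wait_for(env.queue.get(), timeout=0.05)
            except asyncio.TimeoutError:
                continue  # agent is idle, but the clock does not stop
            self.handled.append(event)

async def main():
    # Two mid-run disruptions of the kind Gaia2 is said to test for.
    env = Environment([(0.01, "api_failure"), (0.05, "deadline_moved")])
    agent = Agent()
    until = asyncio.get_running_loop().time() + 0.2
    await asyncio.gather(env.run(), agent.run(env, until))
    return agent.handled

handled = asyncio.run(main())
print(handled)
```

Running the sketch, the agent picks up both disruptions in arrival order even though it was idle when each one fired, which is the behavior an asynchronous benchmark rewards.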