Google Unveils Gemini 2.5 Model with Computer-Using Capabilities
2 days ago
Author: Editor

At the Google I/O developer conference earlier this year, Google announced that it would bring computer-use capabilities to the Gemini API. Today, Google officially launched the Gemini 2.5 Computer Use model, a version of Gemini 2.5 with computer-use capabilities. The new model is built on Gemini 2.5 Pro and is designed to let AI agents interact with user interfaces (UIs).

Google says the new model outperforms competing models on several web and mobile control benchmarks while responding with significantly lower latency. Its core functionality is delivered through the newly introduced 'Computer Use' tool in the Gemini API. The tool supports 13 interface operations, including page navigation, web search, and cursor hovering.

Developers supply the model with the user's request, a screenshot of the current screen, and a history of recent operations. From these inputs, the model generates function calls for interface operations such as clicking or typing. Certain high-risk operations require user confirmation before they are carried out.
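The loop described above can be sketched roughly as follows. This is a minimal illustration, not Google's SDK: `Action`, `run_agent`, and the `propose`, `take_screenshot`, `execute`, and `confirm` callbacks are hypothetical names invented for this example, and a real integration would call the Gemini API's Computer Use tool where `propose` appears here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """A UI operation proposed by the model (hypothetical structure)."""
    name: str                         # illustrative operation name, e.g. "navigate"
    args: dict
    needs_confirmation: bool = False  # flag for high-risk operations


History = List[Tuple[Action, str]]


def run_agent(
    request: str,
    propose: Callable[[str, bytes, History], Optional[Action]],
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    max_steps: int = 10,
) -> History:
    """Run the request -> screenshot -> action loop until the model stops."""
    history: History = []
    for _ in range(max_steps):
        screenshot = take_screenshot()
        # The model sees the request, the current screen, and past operations.
        action = propose(request, screenshot, history)
        if action is None:            # the model has nothing more to do
            break
        # High-risk operations are executed only after the user agrees.
        if action.needs_confirmation and not confirm(action):
            history.append((action, "declined"))
            break
        execute(action)
        history.append((action, "done"))
    return history
```

In this sketch, the confirmation check sits between the model's proposal and its execution, so a declined operation is recorded in the history but never reaches the interface.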

Developers can currently access the Gemini 2.5 model with computer-use capabilities through Google AI Studio and Vertex AI. An online demo is also available on the Browserbase platform.
