Google has released a preview of its latest AI model, Gemini 2.5 Computer Use. The model can browse the web and interact with browser interfaces, which lets it interpret a user's request and carry out tasks such as filling in and submitting forms. It does this using its "visual understanding and reasoning capabilities". The model is suited to scenarios such as user interface testing and has already been used in Google's AI Mode and Project Mariner.
The launch comes shortly after OpenAI announced new apps for ChatGPT. OpenAI has also been pushing ahead with its "ChatGPT agent" feature, and Anthropic released a version of its Claude model with "computer use" capabilities last year.
Google has published a demo video and claims that the model outperforms competing solutions on multiple benchmarks. For now, however, the model can operate only within a browser environment, and it supports 13 types of actions.
Developers can currently access the model through Google AI Studio and Vertex AI, and an online demo is available on the Browserbase platform.