
ZDNET Highlights
- Google’s new AI model can interact directly with website UIs.
- It joins similar tools from OpenAI and Anthropic.
- The company also acknowledged its weaknesses, including hallucinations.
Google-owned DeepMind has launched a new AI model in public preview that’s designed to navigate a web browser like a human.
Built on top of Gemini 2.5 Pro, the company’s new Computer Use model can perform tasks such as clicking, typing, and scrolling directly within a web page.
Users simply give it a prompt in natural language, such as, “Open Wikipedia, search ‘Atlantis’, and summarize the history of the myth in Western thought.” The model then automatically retrieves the URL and screenshots of the requested site to analyze the user interface it needs to work with, and performs the requested task step by step, outlining its reasoning and actions in text boxes that are easily visible to users. If it has been instructed to perform a sensitive task, such as making a purchase, it may also respond by asking for confirmation.
The Gemini 2.5 Computer Use preview follows the release of similar web-browsing models from OpenAI and Anthropic. Google previously launched an experimental Chrome extension called Project Mariner, which can also take actions on behalf of users within web pages.
How it works
Gemini 2.5 Computer Use runs in an iterative loop that lets it keep a record of all its recent actions within a particular user interface and determine its next action accordingly. So the more actions it performs on a particular site, the more context it has and the more seamlessly it functions.
Google posted demo videos (sped up 3x) that showed the model automatically updating a customer relationship management site and rearranging notes on Google’s Jamboard platform, which was discontinued late last year.
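The loop described above can be illustrated with a minimal Python sketch. It is not Google's implementation and does not call the Gemini API; `Action`, `run_loop`, `scripted_policy`, and the `choose_action` and `execute` callbacks are hypothetical names introduced here to show how a record of past actions can feed the choice of the next one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str          # hypothetical action types: "click", "type", "scroll", "done"
    detail: str = ""   # e.g. the element to click or the text to type


def run_loop(
    goal: str,
    choose_action: Callable[[str, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeatedly pick the next action from the goal plus the action history."""
    history: List[Action] = []
    for _ in range(max_steps):
        # The chooser sees everything done so far, so context grows each step.
        action = choose_action(goal, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


def scripted_policy(goal: str, history: List[Action]) -> Action:
    """Stand-in for a model: replays a fixed script based on history length."""
    script = [
        Action("click", "search box"),
        Action("type", "Atlantis"),
        Action("scroll", "down"),
    ]
    return script[len(history)] if len(history) < len(script) else Action("done")


if __name__ == "__main__":
    for step in run_loop("Search Wikipedia for Atlantis", scripted_policy, lambda a: None):
        print(step.kind, step.detail)
```

In a real agent, `choose_action` would be a model call that also receives a fresh screenshot, and `execute` would drive the browser. Both are stubbed out here so the control flow stays visible.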
According to a blog post published by Google on Tuesday, the new model outperformed similar tools from Anthropic and OpenAI in terms of both accuracy and latency in “multiple web and mobile control benchmarks,” including Online-Mind2Web, an evaluation framework for testing the performance of web-browsing agents.
How to try it
Google said the new model is primarily for web browsers, but it shows “strong promise” on mobile as well. It is available now through the Gemini API in Google AI Studio and Vertex AI. A demo version is also available through Browserbase.
Security considerations
The new model also comes with a set of security controls that Google says developers can use to prevent it from performing unwanted actions like bypassing CAPTCHAs, compromising data security, or gaining control over medical devices. For example, developers can instruct the model to request user confirmation before performing certain specified actions.
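A confirmation requirement of this kind can be sketched in a few lines of Python. This is an illustrative assumption about how a developer might wrap an agent's actions, not Google's actual control mechanism; `SENSITIVE_KINDS`, `guarded_execute`, and the `execute` and `confirm` callbacks are hypothetical names.

```python
from typing import Callable

# Hypothetical set of action types the developer wants a human to approve.
SENSITIVE_KINDS = {"purchase", "submit_payment"}


def guarded_execute(
    kind: str,
    detail: str,
    execute: Callable[[str, str], None],
    confirm: Callable[[str], bool],
) -> bool:
    """Run the action, asking for confirmation first if it is sensitive.

    Returns True if the action was executed, False if the user declined.
    """
    if kind in SENSITIVE_KINDS and not confirm(f"Allow {kind}: {detail}?"):
        return False
    execute(kind, detail)
    return True


if __name__ == "__main__":
    log = []
    record = lambda kind, detail: log.append((kind, detail))
    always_no = lambda message: False

    guarded_execute("click", "Add to cart", record, always_no)      # runs without asking
    guarded_execute("purchase", "1 x notebook", record, always_no)  # blocked by the user
    print(log)
```

Routing every action through a single gate keeps the policy in one place, so the list of sensitive actions can change without touching the rest of the agent.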
The company also noted in the new model’s system card that it “may exhibit some of the common limitations of foundation models, as it is based on Gemini 2.5 Pro, such as hallucinations, and limitations around causal understanding, complex logical deduction, and counterfactual reasoning.”
Those limitations hold true for most models. Earlier this week, Anthropic published new research showing that many frontier AI models tended to interpret information as unethical or illegal in test scenarios, even when the allegedly objectionable information was actually harmless.

