Nextwebworlds

Gemini 3.5 Flash can now see and control your screen, and Google wants enterprises to trust it

TL;DR

Computer use is now a built-in tool in Gemini 3.5 Flash, replacing the standalone Gemini 2.5 computer use model and adding optional enterprise safeguards.

Google has made computer use a built-in tool inside Gemini 3.5 Flash, the model it launched at I/O 2026 as its fastest agentic AI model. The capability, which lets AI agents see screens, click, type, and scroll across browsers, mobile devices, and desktops, previously required a separate standalone model and is now available as a native tool through the Gemini API and the Gemini Enterprise Agent Platform, the renamed version of Vertex AI.

The update means developers no longer need to call a dedicated computer use model to build agents that interact with graphical interfaces. Instead, they can activate computer use as one of several tools within Flash, alongside code execution, search, and function calling. Product manager Mateo Quiros described the integration as giving Flash the ability to see, reason about, and take action on screens.

Google first released a standalone Gemini computer use model in October 2025, designed specifically for browser-based agent workflows. That model achieved roughly 70 percent accuracy on the Online-Mind2Web benchmark and was built around a screenshot-action loop where developers fed it a screen capture, received a structured command, executed it, and sent back the updated view. Folding the capability into Flash consolidates what was a two-model workflow into one.
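
For readers who want to picture the screenshot-action loop, here is a minimal Python sketch. It does not call the Gemini API: the `Action` type, `run_agent_loop`, `FakeScreen`, and `scripted_model` are all hypothetical names invented for illustration, and the "model" is a fixed script standing in for whatever a real model would return.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """A structured command, e.g. click, type, scroll, or done (hypothetical shape)."""
    kind: str
    payload: dict = field(default_factory=dict)


def run_agent_loop(
    capture: Callable[[], str],
    propose: Callable[[str, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> List[Action]:
    """Capture the screen, ask for an action, execute it, repeat until 'done'."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                 # 1. capture the current view
        action = propose(screenshot, history)  # 2. receive a structured command
        if action.kind == "done":
            break
        execute(action)                        # 3. execute it
        history.append(action)                 # 4. next iteration sends the updated view
    return history


class FakeScreen:
    """Stand-in for a browser or desktop; a real one would return image bytes."""

    def __init__(self) -> None:
        self.focused = False
        self.text = ""

    def capture(self) -> str:
        return f"focused={self.focused} text={self.text!r}"

    def execute(self, action: Action) -> None:
        if action.kind == "click":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.text += action.payload["text"]


def scripted_model(screenshot: str, history: List[Action]) -> Action:
    """Stand-in for the model: replays a fixed plan, one step per call."""
    script = [
        Action("click", {"x": 10, "y": 20}),
        Action("type", {"text": "hello"}),
        Action("done"),
    ]
    return script[len(history)]


if __name__ == "__main__":
    screen = FakeScreen()
    steps = run_agent_loop(screen.capture, scripted_model, screen.execute)
    print([a.kind for a in steps], screen.text)
```

The `max_steps` cap is a design choice of this sketch rather than something Google describes: it keeps a confused agent from looping forever.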

The enterprise pitch centres on automation that goes beyond chatbots. Google says the tool enables continuous software testing, where agents navigate applications and verify functionality without human testers stepping through each screen. Knowledge workers could use agents to complete multi-step browser tasks, fill forms, extract data from dashboards, or navigate internal tools.

The safety architecture is where Google is drawing the sharpest lines. The company says it applied targeted adversarial training specifically for prompt injection, the attack where malicious instructions embedded in a webpage or document trick an AI agent into performing unintended actions. The threat is not theoretical, as researchers have repeatedly demonstrated that AI agents can be manipulated through content they encounter while carrying out tasks.

Google is offering two optional enterprise safeguards on top of the base model. The first requires explicit user confirmation before the agent executes any action flagged as sensitive or irreversible, such as submitting a form, making a purchase, or deleting data. The second automatically halts the agent if it detects an indirect prompt injection attempt, stopping execution rather than risking a compromised action.
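
To make the two safeguards concrete, here is a rough Python sketch of how a developer might wrap action execution on their own side. It is an assumption-laden illustration, not Google's implementation: `ProposedAction`, `guarded_execute`, the `injection_suspected` flag, and the `SENSITIVE_KINDS` set are all hypothetical, and a real detector for prompt injection is far harder than reading a boolean.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of action kinds treated as sensitive or irreversible.
SENSITIVE_KINDS = {"submit_form", "purchase", "delete"}


class InjectionSuspected(Exception):
    """Raised to halt the agent instead of risking a compromised action."""


@dataclass
class ProposedAction:
    kind: str
    target: str
    # Hypothetical signal; assumed to come from some upstream detector.
    injection_suspected: bool = False


def guarded_execute(
    action: ProposedAction,
    execute: Callable[[ProposedAction], None],
    confirm: Callable[[ProposedAction], bool],
) -> bool:
    """Run one action behind both guards. Returns True if it was executed."""
    # Safeguard 2: stop the whole run on a suspected indirect prompt injection.
    if action.injection_suspected:
        raise InjectionSuspected(f"halting before {action.kind} on {action.target}")
    # Safeguard 1: sensitive actions need explicit user confirmation.
    if action.kind in SENSITIVE_KINDS and not confirm(action):
        return False
    execute(action)
    return True


if __name__ == "__main__":
    log = []
    ok = guarded_execute(
        ProposedAction("purchase", "#buy-now"),
        execute=log.append,
        confirm=lambda a: False,  # the user declines
    )
    print(ok, log)
```

Checking for injection before asking for confirmation is deliberate in this sketch: a user should not be prompted to approve an action that may itself be the product of a manipulated page.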

Both safeguards are opt-in, not defaults. Google recommends a “defense-in-depth” approach where developers layer multiple protections rather than relying on any single mechanism. The company's documentation acknowledges that no individual safeguard is sufficient on its own, a candid framing that contrasts with the more confident marketing language around other AI capabilities.

The competitive landscape has shifted considerably since Anthropic pioneered the category. Anthropic's Claude Computer Use works across operating systems and can interact with file systems, not just browsers, making it more versatile for desktop workflows. Google's own Chrome Enterprise already added agentic browsing features earlier this year, including Auto Browse for autonomous multi-step tasks.

The new Flash integration extends that philosophy beyond Chrome to any screen an agent can see. OpenAI has also entered the space, and the three companies are now competing on different axes. The question for enterprise buyers is less about which model can click a button and more about which one can do it safely inside a regulated environment.

Google has not published updated benchmark scores for computer use as a built-in Flash tool versus the previous standalone model. The company has not disclosed how many enterprises are using the capability or provided case studies with named customers. The claims about targeted adversarial training for prompt injection are described in the blog post but not backed by published research or red-team results.

The Gemini Enterprise Agent Platform, where the tool is available, uses pay-as-you-go pricing. Flash is one of the cheaper models in Google's lineup, which could make computer use more accessible for large-scale automation than running it through a heavier model. Whether the cost advantage holds depends on how many actions a typical agent workflow requires and how often the safety guardrails interrupt execution to request confirmation.
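
The trade-off can be put into a back-of-the-envelope formula. The Python sketch below is purely illustrative: the function name, its parameters, and every number passed to it are placeholder assumptions, not Google's pricing or measured figures. It treats each action as one model call and each confirmation prompt as a small cost in human attention.

```python
def estimated_task_cost(
    steps: int,
    tokens_per_step: int,
    price_per_million_tokens: float,
    confirm_rate: float,
    human_cost_per_confirmation: float,
) -> float:
    """Rough cost of one agent task: model usage plus human confirmations.

    All inputs are assumptions supplied by the caller; nothing here is
    taken from published pricing.
    """
    model_cost = steps * tokens_per_step * price_per_million_tokens / 1_000_000
    human_cost = steps * confirm_rate * human_cost_per_confirmation
    return model_cost + human_cost


if __name__ == "__main__":
    # Placeholder inputs only; substitute real measurements before drawing conclusions.
    cost = estimated_task_cost(
        steps=30,
        tokens_per_step=2000,
        price_per_million_tokens=1.0,
        confirm_rate=0.1,
        human_cost_per_confirmation=0.05,
    )
    print(f"{cost:.2f}")
```

Under placeholder inputs like these, the human-confirmation term can outweigh the model term, which is one way to read the article's point that interruptions matter as much as per-token price.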

Computer use in AI is still early. The models can navigate familiar interfaces but struggle with unexpected pop-ups, CAPTCHAs, dynamically loaded content, and layouts they have not seen before. Google's decision to make it a built-in tool rather than a standalone model signals confidence that the capability is mature enough for general availability, but the opt-in safety guardrails signal equal awareness that it is not yet mature enough to run unsupervised.

Story by Ana Maria Constantin

With expertise in digital marketing, product management, and branding & identity, Ana Maria Constantin develops strategies that resonate with our target audience in the software/SaaS industry. Collaboration and teamwork are paramount to her, as she loves empowering her colleagues to achieve outstanding results and unlock their full potential.