Gemini 3.5 Flash Gains Native Computer Use for Real-World AI Automation


Written by Aminu Abdullahi

Jun 25, 2026



Google has added a native “computer use” tool directly into Gemini 3.5 Flash, allowing the model to see and interact with graphical user interfaces across browsers, mobile apps, and desktop software.

Previously, this capability lived in a separate experimental system based on the Gemini 2.5 Computer Use model. Now it is embedded directly into Flash, meaning developers no longer need a separate model to build agents that can click, type, scroll, and navigate apps.

According to Google, the integration allows developers to “build custom agents that can see, reason and take action across browser, mobile and desktop environments,” bringing together visual understanding and action in one workflow.

From chat model to active agent

The upgrade moves Gemini 3.5 Flash beyond text-based reasoning and traditional function calling into what Google describes as full agentic computer interaction. Developers can now build systems that interpret screenshots, understand user interfaces, and carry out multi-step tasks such as filling forms, running workflows, or navigating enterprise dashboards.

Industry observers note that this effectively removes a long-standing barrier in AI automation: the need for custom APIs for every application. Instead, the model can interact directly with software the same way a human user would.

Google reported an OSWorld-Verified UI Control score of 78.4% for the new integration. The earlier standalone Gemini 2.5 model scored roughly 70% on Online-Mind2Web, a separate benchmark, so the two figures are not directly comparable.

The safety architecture and enterprise tradeoffs

Giving an AI model total clearance to click around a live operating system introduces massive security liabilities. If an autonomous agent wanders onto a malicious website or opens an email containing hidden instructions, it can easily fall victim to an indirect prompt injection.

To counter this, Google is taking what it calls a “defense-in-depth” approach. The company used targeted adversarial training to harden Gemini 3.5 Flash against these exact types of visual exploits. It is also offering two opt-in enterprise safeguards:

Explicit user confirmation: Requires a human to manually click approval before the agent can execute any high-risk or irreversible action.
Automatic task-stopping: Immediately freezes the agent's workflow when an indirect prompt-injection attack is flagged.

The main catch is that neither safeguard is enabled by default. Google also acknowledges that no single safeguard is foolproof, a candid admission that the technology is still too unpredictable to be left entirely to its own devices.
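To make the two safeguards concrete, here is a minimal Python sketch of how a developer might wrap an agent's actions with a confirmation gate and a stop-on-flag check. It does not use Google's API: `Action`, `TaskStopped`, `run_with_safeguards`, and the `confirm` and `injection_flagged` callbacks are all hypothetical names chosen for illustration, and the injection detector is assumed to exist elsewhere.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Action:
    """A hypothetical UI action proposed by an agent (not a real API type)."""
    kind: str                # e.g. "click", "type", "submit"
    target: str              # description of the UI element
    high_risk: bool = False  # True for irreversible or sensitive actions


class TaskStopped(Exception):
    """Raised when the workflow is halted because an injection was flagged."""


def run_with_safeguards(
    steps: Iterable[Tuple[str, Action]],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    injection_flagged: Callable[[str], bool],
) -> List[Action]:
    """Run (observation, action) pairs with two opt-in style checks.

    - Automatic task-stopping: if the observation is flagged, halt at once.
    - Explicit user confirmation: high-risk actions run only if approved.
    Returns the list of actions that were actually executed.
    """
    executed: List[Action] = []
    for observation, action in steps:
        if injection_flagged(observation):
            raise TaskStopped(f"flagged content seen before '{action.kind}'")
        if action.high_risk and not confirm(action):
            continue  # the human declined, so skip this action
        execute(action)
        executed.append(action)
    return executed
```

In this sketch a declined action is skipped and the task continues; a stricter deployment might instead abort the whole task on the first refusal.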

Analysis: Moving beyond the hype

While the technical benchmarks appear impressive on paper, enterprise buyers need to look beyond marketing promises before overhauling their workflows.

The decision to put this capability into Gemini 3.5 Flash rather than a heavier flagship model is a deliberate economic play by Google. Flash operates on a pay-as-you-go pricing model and is one of the cheapest options in Google’s portfolio. This dramatically lowers the cost barrier for companies looking to deploy large-scale automation.

However, the real-world utility of visual AI agents remains bottlenecked by practical limitations. AI models operating via a screenshot-action loop are notoriously brittle. They excel in highly predictable scenarios such as continuous, repetitive software testing or standard data extraction from corporate dashboards, but they frequently stumble on unexpected pop-up windows, CAPTCHAs, or dynamic website layouts they have not seen before.
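The screenshot-action loop mentioned above can be sketched in a few lines of Python. This is an illustrative outline under stated assumptions, not Google's implementation: `take_screenshot`, `propose_action`, and `apply_action` are hypothetical callbacks standing in for the screen capture, the model call, and the input driver, and the `max_steps` cap is an assumed guard against the kind of stalls described here.

```python
from typing import Any, Callable, List, Optional, Tuple


def run_agent_loop(
    take_screenshot: Callable[[], Any],
    propose_action: Callable[[Any, List[str]], Optional[str]],
    apply_action: Callable[[str], None],
    max_steps: int = 10,
) -> Tuple[List[str], bool]:
    """Repeat observe -> decide -> act until the model signals completion.

    Returns (history, finished). `finished` is False when the step budget
    runs out, which is one way a stuck agent (for example, one blocked by
    an unexpected pop-up) can be detected and handed to a human.
    """
    history: List[str] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                 # observe current UI
        action = propose_action(screenshot, history)   # model picks next step
        if action is None:                             # None means "task done"
            return history, True
        apply_action(action)                           # click, type, scroll...
        history.append(action)
    return history, False
```

Because each step depends on a fresh screenshot, any surprise on screen changes what the model sees next, which is why a hard step limit and a clear failure signal are useful in practice.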

Furthermore, the enterprise ecosystem is already intensely crowded. Anthropic's Claude Computer Use pioneered this space with deeper system-level versatility, and OpenAI is aggressively competing on similar features.

For businesses trying to choose an ecosystem, the deciding factor will not be which AI can click a button the fastest, but which platform can execute those actions reliably without opening a gaping hole in corporate cybersecurity.

If a business deploys these agents without strict sandboxing and human oversight, the cost savings of automation could quickly be wiped out by a single automated security blunder.


Aminu Abdullahi

Aminu Abdullahi is an experienced B2B technology and finance writer and award-winning public speaker. He is the co-author of the e-book, The Ultimate Creativity Playbook, and has written for various publications, including TechRepublic, eWEEK, Enterprise Networking Planet, eSecurity Planet, CIO Insight, Enterprise Storage Forum, IT Business Edge, Webopedia, Software Pundit, Geekflare and more.
