@kala ・ Oct 08, 2025
Google DeepMind releases the Gemini 2.5 Computer Use model, available via API for browser and mobile tasks, with lower latency and improved UI interaction.
The Gemini 2.5 model, developed by Google DeepMind, is designed to interact with user interfaces by simulating human actions such as clicking and typing.
The model is accessible via the API and is optimized for web browsers, showing potential for mobile UI control tasks.
Developers can use the Gemini API to specify which functions to include or exclude from the list of supported UI actions.
The model outperforms alternatives on web and mobile control benchmarks, offering high accuracy at low latency.
The model includes safety features to mitigate risks such as misuse and unexpected behavior.
Anthropic released a version of its Claude AI model with "computer use" capabilities.
Google DeepMind announced it is bringing computer use capabilities to developers via the Gemini API.
OpenAI introduced new applications for ChatGPT during its annual Dev Day.
Google DeepMind has released the Gemini 2.5 Computer Use model, now available via API, designed to handle browser and mobile tasks with precision. This model mimics human interactions like clicking, typing, and scrolling, making it ideal for tasks requiring direct user interface engagement. While optimized for web browsers, it also shows promise for mobile tasks, though it isn't yet fine-tuned for desktop OS-level control.
Developers can access the model's capabilities through the computer_use tool in the Gemini API. The tool takes the user's request, a screenshot of the environment, and a history of recent actions as input, and responds with function calls representing UI actions; the set of supported actions can be tailored to the needs of a specific task. The process is iterative: after each action, the model receives an updated screenshot and URL and continues until the task is completed or terminated.
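Here is a minimal sketch of that loop in Python using the google-genai SDK. The computer_use tool configuration, the gemini-2.5-computer-use-preview model name, and the take_screenshot/execute_action helpers are assumptions for illustration (the helpers would wrap a real browser driver such as Playwright); exact names may differ in the public preview.

```python
# Sketch of an iterative computer-use loop against the Gemini API.
# Assumptions (not from the article): the ComputerUse tool config, the
# "gemini-2.5-computer-use-preview" model id, and the two helpers below.
from google import genai
from google.genai import types


def take_screenshot() -> bytes:
    """Hypothetical helper: capture the current browser viewport as PNG bytes."""
    raise NotImplementedError


def execute_action(function_call) -> str:
    """Hypothetical helper: perform the UI action (click, type, scroll)
    in the browser and return the resulting page URL."""
    raise NotImplementedError


def run_task(user_request: str, max_turns: int = 20) -> None:
    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    config = types.GenerateContentConfig(
        tools=[types.Tool(computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER))]
    )

    # First turn: the user request plus a screenshot of the current state.
    contents = [
        types.Content(role="user", parts=[
            types.Part(text=user_request),
            types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
        ])
    ]

    for _ in range(max_turns):
        response = client.models.generate_content(
            model="gemini-2.5-computer-use-preview",
            contents=contents,
            config=config,
        )
        calls = response.function_calls or []
        if not calls:
            # No further UI actions requested: the task is done (or was refused).
            print(response.text)
            return

        # Keep the model's turn in the history, execute each requested action,
        # then send back a function response with a fresh screenshot and URL.
        contents.append(response.candidates[0].content)
        for call in calls:
            url = execute_action(call)
            contents.append(types.Content(role="user", parts=[
                types.Part.from_function_response(name=call.name,
                                                  response={"url": url}),
                types.Part.from_bytes(data=take_screenshot(),
                                      mime_type="image/png"),
            ]))
```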
Gemini 2.5 excels in web and mobile control benchmarks, outperforming other models with lower latency. It has been effectively used in production for UI testing and workflow automation, offering significant speed and accuracy improvements over competitors. The model incorporates safety features to mitigate risks like misuse and security threats, with options for developers to add further safety controls.
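One such control, sketched below, is narrowing the model's action space. The excluded_predefined_functions parameter and the action names are assumptions about the preview API rather than details confirmed in this article.

```python
from google.genai import types

# Sketch: restrict which predefined UI actions the model may request.
# The excluded_predefined_functions field and the action names below are
# assumptions about the preview API, not details taken from this article.
config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER,
        excluded_predefined_functions=["drag_and_drop", "key_combination"],
    ))]
)
```

Client-side checks, such as requiring human confirmation before purchases or form submissions, can be layered on top in the code that executes the returned actions.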
Early testers have reported impressive results, noting the model's speed and reliability compared to alternatives. It has been applied in various scenarios, including personal assistants and autonomous agents, enhancing performance and reducing errors. The model is available in public preview through Google AI Studio and Vertex AI, with demos illustrating its capabilities in action.
Subscribe to our weekly newsletter Kala to receive similar updates for free!