@kala ・ Oct 08, 2025
Google DeepMind releases the Gemini 2.5 Computer Use model, available via API for browser and mobile tasks, with lower latency and improved UI interaction.
The Gemini 2.5 model, developed by Google DeepMind, is designed to interact with user interfaces by simulating human actions such as clicking and typing.
The model is accessible via the API and is optimized for web browsers, showing potential for mobile UI control tasks.
Developers can use the Gemini API to specify which functions to include or exclude from the list of supported UI actions.
The model outperforms alternatives on web and mobile control benchmarks, offering high accuracy at low latency.
The model includes safety features to mitigate risks such as misuse and unexpected behavior.
Anthropic released a version of its Claude AI model with "computer use" capabilities.
Google DeepMind announced it is bringing computer use capabilities to developers via the Gemini API.
OpenAI introduced new applications for ChatGPT during its annual Dev Day.
Google DeepMind has released the Gemini 2.5 Computer Use model, now available via API, designed to handle browser and mobile tasks with precision. This model mimics human interactions like clicking, typing, and scrolling, making it ideal for tasks requiring direct user interface engagement. While optimized for web browsers, it also shows promise for mobile tasks, though it isn't yet fine-tuned for desktop OS-level control.
Developers can access the model's capabilities through the computer_use tool in the Gemini API. The tool takes the user's request, a screenshot of the environment, and a history of recent actions as input, and responds with function calls representing UI actions; the set of supported actions can be tailored to the needs of a specific task. The process is iterative: after each action, the model receives an updated screenshot and URL and continues until the task is completed or terminated.
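Here is a minimal sketch of that loop in Python using the google-genai SDK. The computer_use tool configuration, the gemini-2.5-computer-use-preview model name, and the take_screenshot/execute_action helpers are assumptions for illustration (the helpers would wrap a real browser driver such as Playwright); exact names may differ in the public preview.

```python
# Sketch of an iterative computer-use loop against the Gemini API.
# Assumptions (not from the article): the ComputerUse tool config, the
# "gemini-2.5-computer-use-preview" model id, and the two helpers below.
from google import genai
from google.genai import types


def take_screenshot() -> bytes:
    """Hypothetical helper: capture the current browser viewport as PNG bytes."""
    raise NotImplementedError


def execute_action(function_call) -> str:
    """Hypothetical helper: perform the UI action (click, type, scroll)
    in the browser and return the resulting page URL."""
    raise NotImplementedError


def run_task(user_request: str, max_turns: int = 20) -> None:
    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    config = types.GenerateContentConfig(
        tools=[types.Tool(computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER))]
    )

    # First turn: the user request plus a screenshot of the current state.
    contents = [
        types.Content(role="user", parts=[
            types.Part(text=user_request),
            types.Part.from_bytes(data=take_screenshot(), mime_type="image/png"),
        ])
    ]

    for _ in range(max_turns):
        response = client.models.generate_content(
            model="gemini-2.5-computer-use-preview",
            contents=contents,
            config=config,
        )
        calls = response.function_calls or []
        if not calls:
            # No further UI actions requested: the task is done (or was refused).
            print(response.text)
            return

        # Keep the model's turn in the history, execute each requested action,
        # then send back a function response with a fresh screenshot and URL.
        contents.append(response.candidates[0].content)
        for call in calls:
            url = execute_action(call)
            contents.append(types.Content(role="user", parts=[
                types.Part.from_function_response(name=call.name,
                                                  response={"url": url}),
                types.Part.from_bytes(data=take_screenshot(),
                                      mime_type="image/png"),
            ]))
```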
Gemini 2.5 excels in web and mobile control benchmarks, outperforming other models with lower latency. It has been effectively used in production for UI testing and workflow automation, offering significant speed and accuracy improvements over competitors. The model incorporates safety features to mitigate risks like misuse and security threats, with options for developers to add further safety controls.
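One such control, sketched below, is narrowing the model's action space. The excluded_predefined_functions parameter and the action names are assumptions about the preview API rather than details confirmed in this article.

```python
from google.genai import types

# Sketch: restrict which predefined UI actions the model may request.
# The excluded_predefined_functions field and the action names below are
# assumptions about the preview API, not details taken from this article.
config = types.GenerateContentConfig(
    tools=[types.Tool(computer_use=types.ComputerUse(
        environment=types.Environment.ENVIRONMENT_BROWSER,
        excluded_predefined_functions=["drag_and_drop", "key_combination"],
    ))]
)
```

Client-side checks, such as requiring human confirmation before purchases or form submissions, can be layered on top in the code that executes the returned actions.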
Early testers have reported impressive results, noting the model's speed and reliability compared to alternatives. It has been applied in various scenarios, including personal assistants and autonomous agents, enhancing performance and reducing errors. The model is available in public preview through Google AI Studio and Vertex AI, with demos illustrating its capabilities in action.
Subscribe to our weekly newsletter Kala to receive similar updates for free!