Android's Computer Control: Google's Next Big Step in Automated App Control?
Remember the buzz around AI gadgets like the Rabbit R1 last year? The idea of a tiny, talkative device replacing our smartphones was captivating. Although these gadgets didn't quite deliver on their promise, they sparked a new trend: agentic AI. Now, tech giants like Google are diving headfirst into developing AI that can handle tasks for you, such as coding, scheduling appointments, or online shopping.
Google's Gemini in Chrome is a step in this direction, but its capabilities are limited to the browser. If you want to automate tasks across all your Android apps, you're likely stuck with complicated third-party tools like Tasker. Project Astra, Google's experimental AI effort, aims to change this.
At Google I/O, the company showcased Astra controlling an Android phone, seamlessly finding information and searching YouTube. To do this, Astra records the screen and sends commands to launch apps or scroll through pages.
While the demo highlighted the potential of AI agents on Android, it also revealed some challenges. The sped-up video suggested that the AI processing is still quite slow. This might not be an issue when your hands are full, but it could be annoying in everyday use. A slow AI agent could leave your phone tied up, and common interruptions like notifications could interfere with its operation.
A New Framework: Computer Control
To address these issues, Google has been developing a new framework called Computer Control, designed for AI agents to control Android apps smoothly in the background. By digging into Android code, I've uncovered some interesting details about this upcoming feature. Computer Control leverages the Virtual Device Manager (VDM), introduced with Android 13. This system lets you create virtual displays separate from the main screen. Apps can run on these virtual displays and be streamed to another device, which can then send back commands like clicks or keyboard presses.
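The Computer Control APIs themselves aren't public, but the virtual-display plumbing they sit on top of has been in the SDK for years. The Kotlin sketch below approximates the idea with the public DisplayManager API rather than the new framework: it creates an off-screen virtual display backed by an ImageReader and tries to launch an app onto it. The display name, dimensions, and target package are placeholders, and the permission caveats are noted in the comments.

```kotlin
import android.app.ActivityOptions
import android.content.Context
import android.content.Intent
import android.graphics.PixelFormat
import android.hardware.display.DisplayManager
import android.hardware.display.VirtualDisplay
import android.media.ImageReader

// Sketch only: an off-screen virtual display whose rendered frames land in an
// ImageReader. Without privileged permissions, a display created this way is
// private, so launching another app onto it will normally be refused; the
// allowlisting in the Computer Control framework presumably exists to lift
// exactly this kind of restriction for trusted clients.
fun startOffscreenDisplay(context: Context): VirtualDisplay {
    val width = 1080   // placeholder resolution
    val height = 2400
    val dpi = 420

    // Everything drawn on the virtual display is delivered to this Surface.
    val reader = ImageReader.newInstance(width, height, PixelFormat.RGBA_8888, 2)

    val displayManager = context.getSystemService(DisplayManager::class.java)
    val virtualDisplay = displayManager.createVirtualDisplay(
        "agent-display",        // placeholder name
        width, height, dpi,
        reader.surface,
        0                        // default flags: a private, app-owned display
    )

    // Ask the system to launch an activity on the virtual display instead of
    // the main screen. The target package here is hypothetical.
    val options = ActivityOptions.makeBasic()
    options.setLaunchDisplayId(virtualDisplay.display.displayId)
    val intent = context.packageManager
        .getLaunchIntentForPackage("com.example.someapp")
        ?.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
    if (intent != null) {
        context.startActivity(intent, options.toBundle())
    }

    return virtualDisplay
}
```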
The framework requires client apps to specify the virtual display's properties, including whether it should stay unlocked while the device itself is locked, which is what makes unattended control possible. Client apps can also access the raw display frames, which can be streamed to a remote device for analysis.
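What "raw display frames" means in practice is presumably something like the following: pulling the most recent frame out of the ImageReader that backs the virtual display and copying it into a Bitmap that can be analyzed locally or streamed elsewhere. Again, this is a sketch against public APIs, not the Computer Control interface itself.

```kotlin
import android.graphics.Bitmap
import android.media.ImageReader

// Copy the most recent frame from the virtual display into a Bitmap.
// Returns null if no new frame has been produced since the last call.
fun latestFrame(reader: ImageReader): Bitmap? {
    val image = reader.acquireLatestImage() ?: return null
    try {
        val plane = image.planes[0]
        val buffer = plane.buffer

        // RGBA_8888 frames can carry padding at the end of each row, so the
        // bitmap is first created wide enough to hold the padded rows.
        val pixelStride = plane.pixelStride
        val rowPadding = plane.rowStride - pixelStride * image.width
        val padded = Bitmap.createBitmap(
            image.width + rowPadding / pixelStride,
            image.height,
            Bitmap.Config.ARGB_8888
        )
        padded.copyPixelsFromBuffer(buffer)

        // Crop away the padding so the result matches the display size.
        return Bitmap.createBitmap(padded, 0, 0, image.width, image.height)
    } finally {
        image.close() // release the buffer back to the ImageReader
    }
}
```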
For privacy and security, Computer Control is restricted to trusted applications that hold the ACCESS_COMPUTER_CONTROL permission, which is granted only to apps signed with a digital certificate allowlisted in the OS. On top of that, an app must obtain explicit user approval before it can start a Computer Control session, so ordinary apps won't be able to control other apps without your consent.
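Neither the permission constant nor the consent flow is in the public SDK, so the check below is purely illustrative: it uses the permission string named in the article and the standard self-check pattern that applies to signature-gated permissions, which are either granted at install time or not at all.

```kotlin
import android.content.Context
import android.content.pm.PackageManager

// Hypothetical constant: the string comes from the article, not the SDK.
const val ACCESS_COMPUTER_CONTROL = "android.permission.ACCESS_COMPUTER_CONTROL"

// A signature/allowlist permission can't be requested at runtime, so a simple
// self-check is all a client could do before attempting to start a session;
// the separate per-session user approval would be handled by the system.
fun canStartComputerControl(context: Context): Boolean =
    context.checkSelfPermission(ACCESS_COMPUTER_CONTROL) ==
        PackageManager.PERMISSION_GRANTED
```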
While it’s designed for trusted clients to analyze screen data and automate tasks, how these clients will control apps remains to be seen. Will the processing be done on a remote PC, similar to how the Rabbit R1 works? Or will it be handled locally by an on-device AI model like Gemini Nano? While the former seems more likely, the latter would be more private but could strain the device’s resources.
I'm enthusiastic about Google's efforts to build a proper framework for agentic AI on Android. Computer Control has the potential to fully automate your apps, saving time and improving accessibility. As AI agents won't always get things right, Google included the ability to mirror the automation onto an interactive display, allowing users to supervise and make adjustments as needed.
Source: AndroidAuthority