Want smart insight into your inbox? Enterprise AI, only what matters to data and security leaders, sign up for our weekly newspapers. Subscribe now
Researchers have published Most comprehensive survey Until the date of the so -called “OS agent“-Artificial Intelligence System that can automatically control computers, mobile phones and web browsers by interacting with their interfaces. 30-Page Academic Review, Accepted for publication in reputed. Computational linguistics association The conference maps a rapidly developed area that has attracted billions in investment from major technology companies.
Researchers wrote, “The dream of making AI assistants as enabled and versatile as Iron Man’s fictional Jarvis is long -imprisoned,” researchers write. “With the development of (multimodal) large language model ((M) LLM), this dream is close to reality.”
Survey, led by researchers Ghejiang university And Oppo ai centerAI comes as a race for major technology companies to deploy agents that can perform complex digital functions. Openai recently launched “Operator“Anthropic” released “Computer use“Apple introduced the AI capabilities” in “Apple”Apple wise“And Google unveiled”Project Meriner” – All systems designed to automate computer interactions.

Tech giants run to deploy AI to control your desktop
The speed with which academic research has turned into consumer-taiyar products is unprecedented, even by silicon valley standards. survey A research reveals explosions: more than 60 foundation models and 50 agents framework developed specifically for computer control, with publication rates with 2023 since 2023 being dramatically intensifying.
AI scaling hits its boundaries
Power caps, rising token costs, and entrance delays are re -shaping Enterprise AI. Join our exclusive salons to learn about top teams:
- Transform energy into a strategic profit
- Architecting efficient estimates for real thrruput benefits
- Unlocking competitive ROI with sustainable AI system
Secure your location to stay ahead,
This is not just incremental progress. We are looking at the emergence of AI systems that can actually understand and manipulate the digital world the way humans do. Current systems work by taking screenshots of computer screen, using advanced computer vision to understand what is displayed, then executing exact actions such as clicking on the button, filling the form and navigating between applications.
“OS agents can fully complete tasks and have the ability to enhance billions of users worldwide,” researchers note. “Imagine a world where tasks such as online shopping, traveling, booking, and other daily activities can be originally performed by these agents.”
The most sophisticated systems can handle complex multi-step workflows that spread various applications-a restaurant booking reservation, then automatically adding it to your calendar, then setting up a reminder to leave for traffic early for traffic. Without human intervention, now can be in seconds and what humans can be taken in minutes of typing.

Why safety experts are alarm about AI-controlled corporate system
For Enterprise Technology Leaders, the promise of productivity gain comes with a real reality: these systems represent a completely new attack surface that most organizations are not ready to rescue.
Researchers paid enough attention to what they are diplomatically “security and privacy“Concerns, but implications are more worrying than their academic language.” OS agents encounter these risks, especially considering its broad applications on individual devices with user data, “they write.
The ways of attack they read as a cyber security nightmare. ,Web indirect quick injection“It allows malicious actors to embed the instructions hidden in web pages that may kidnap the behavior of AI agent. There are even more” environmental injection attacks “, where it seems that it seems that the spontaneous web content may trick user agents to steal data or do unauthorized tasks.
Consider the implications: An AI agent can be manipulated to exfiltrate sensitive information by a web page carefully designed with your corporate emails, financial systems and access to customer database. Traditional safety models, which are built around human users, can see clear fishing efforts, when the “user” is an AI system that processes information differently.
The survey reveals intervals in preparations. While general safety structures exist for AI agents, “studies on specific rescue for OS agents are limited.” This is not just an academic concern – this is an immediate challenge for any organization considering the deployment of these systems.
Reality check: Current AI agents still struggle with complex digital functions
Despite the promotion around these systems, analysis of the performance of the performance benchmark reveals significant boundaries that anger the expectations to immediately adopt widely.
The success rate in various tasks and platforms varies dramatically. Some commercial systems achieve success rates above 50% on some benchmarks – impressive for a newborn technology – but struggle with others. Researchers classify evaluation tasks into three types: basic “GUI grounding” (understanding interface elements), “information recover” (finding and extracting data), and complex “agentic functions” (multi-step autonomous operations).
The pattern is telling: current systems excel in simple, well-defined tasks, but when faced with complex, reference-dependent workflows, the falter that defines the work of modern knowledge. They can firmly click on a specific button or fill a standard form, but can struggle with tasks that require continuous logic or adaptation for unexpected interface changes.
This performance difference shows why early deployment focuses on narrow, high-vantage functions rather than general-purpose automation. Technology is not yet ready to change human decisions in complex scenarios, but it is able to handle digital busywork regularly.

What happens when AI agents learn to customize themselves for every user
Perhaps the most complicated-and potentially identified in transforming-the challenge involves that researchers are called “privatization and self-development”. Unlike today’s Stateless AI assistants, which consider every interaction as independent, future OS agents will need to learn from user interactions and adapt to personal preferences over time.
“Developing individual OS agents has been a long -running goal in AI research,” the author writes. “An individual auxiliary is expected to provide consistently customized and extended experience based on individual user preferences.”
This capacity can fundamentally change how we interact with technology. Imagine an AI agent that learns your email writing style, understands your calendar preferences, knows which restaurants you like, and can take a rapidly sophisticated decision on your side. Potential productivity benefits are heavy, but privacy implications.
Technical challenges are sufficient. The survey points to the need for better multimodal memory system that can handle not only the text but also images and voice, offering “important challenges” for current technology. How do you build a system that misses your preferences without making a comprehensive monitoring record of your digital life?
For technology officials evaluating these systems, this privatization challenge represents both the greatest opportunity and the greatest risk. Earlier organizations that solve it will receive important competitive benefits, but privacy and security implications may deteriorate if they are poorly handled.
The race for the creation of AI assistants that can actually act like human users is rapidly faster. While the fundamental challenges around security, reliability and privatization are unresolved, the trajectory is clear. Researchers maintain an open-source repository tracking development, admitting that “OS agents are still in the early stages of development” “with rapid progress” that continue to introduce novel functioning and applications. ,
The question is not whether the AI agents will change how we interact with the computer – it is whether we will be ready for the results when they do. The window is becoming as soon as the technology is moving to gain right to security and privacy structure.

