A new framework from researchers at the University of Hong Kong (HKU) and collaborating institutions provides an open-source foundation for building robust AI agents that operate computers. The framework, called OpenCUA, includes tools, data and recipes to accelerate the development of computer-use agents (CUAs).
Models trained with this framework perform strongly on CUA benchmarks, outperforming existing open-source models and competing closely with agents from leading AI labs such as OpenAI and Anthropic.
The challenge of building computer-use agents
Computer-use agents are designed to autonomously complete tasks on a computer, from navigating websites to operating complex software. They can also help automate workflows in the enterprise. However, the most capable CUA systems are proprietary, with important details about their training data, architectures and development processes kept private.
“The lack of transparency limits technical progress and raises safety concerns, as the research community needs truly open CUA frameworks to study their capabilities, limitations and risks,” the researchers write in their paper.
At the same time, open-source efforts face their own hurdles. There has been no scalable infrastructure for collecting the diverse, large-scale data needed to train these agents. Existing open-source datasets for graphical user interfaces (GUIs) contain limited data, and many research projects provide insufficient detail about their methods, making it difficult for others to replicate their work.
“These limitations collectively impede advances in general-purpose CUAs and restrict their scalability, generalizability and meaningful exploration of possible learning approaches,” according to the paper.
Introducing OpenCUA

OpenCUA is an open-source framework designed to address these challenges by scaling both data collection and the models themselves. At its core is an annotation tool for recording human demonstrations of computer tasks across different operating systems.
Running in the background on an annotator’s personal computer, the tool streamlines data collection by capturing screen video, mouse and keyboard input, and the underlying accessibility tree, which provides structured information about on-screen elements. This raw data is then processed into “state-action trajectories,” pairing a screenshot of the computer (the state) with the user’s corresponding action (a click, key press, etc.). Annotators can then review, edit and submit these demonstrations.
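To make the data model concrete, here is a minimal sketch of what one state-action trajectory could look like. The class and field names are illustrative, not the tool's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Action:
    kind: str        # e.g. "click", "type", "scroll"
    target: str      # accessibility-tree element the action applies to
    value: str = ""  # typed text or key combination, if any

@dataclass
class Step:
    screenshot_path: str  # the "state": a screenshot captured at this moment
    action: Action        # the user's corresponding action

@dataclass
class Trajectory:
    task: str
    os_name: str
    steps: List[Step] = field(default_factory=list)

# Raw events (screen video plus the input log) are reduced to such pairs:
traj = Trajectory(task="Rename a downloaded file", os_name="Ubuntu")
traj.steps.append(Step("frame_001.png", Action("click", "file_icon")))
traj.steps.append(Step("frame_002.png", Action("type", "name_field", "report.pdf")))
```

Each step is thus a self-contained (screenshot, action) pair, which is what later stages of the pipeline consume.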

Using this tool, the researchers collected the AgentNet dataset, comprising more than 22,600 task demonstrations across Windows, macOS and Ubuntu, spanning over 200 applications and websites. “This dataset authentically captures the complexity of human behaviors and environmental dynamics from users’ personal computing environments,” the paper notes.
Recognizing that screen-recording tools raise significant privacy concerns for enterprises, the researchers designed the annotation tool with safeguards. Xinyuan Wang, co-author of the paper and PhD student at HKU, explained that the team implemented a multi-layer privacy protection framework. “First, annotators can fully view the data they generate themselves … before deciding whether to submit it,” he told VentureBeat. The data then undergoes manual verification for privacy issues, as well as scanning by a large model to detect any remaining sensitive content before release. “This layered process ensures enterprise-grade robustness for scenarios that handle sensitive customer or financial data,” Wang said.
To speed up evaluation, the team also curated AgentNetBench, an offline benchmark that provides multiple correct actions for each step, offering a more efficient way to measure an agent’s performance.
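The offline scoring idea is simple: since several actions can be valid at any step (for example, opening a menu by click or by keyboard shortcut), a predicted action counts as correct if it matches any acceptable one. A minimal sketch, with invented function names and action strings rather than the benchmark's real format:

```python
def step_correct(predicted: str, acceptable: list) -> bool:
    """A prediction is correct if it matches any acceptable action for the step."""
    return predicted in acceptable

def score_task(predictions: list, gold_steps: list) -> float:
    """Fraction of steps where the agent's action matched an acceptable one."""
    correct = sum(
        step_correct(pred, acceptable)
        for pred, acceptable in zip(predictions, gold_steps)
    )
    return correct / len(gold_steps)

gold = [
    ["click(start_menu)", "press(super)"],   # either way to open the menu is fine
    ["type('terminal')"],
    ["press(enter)", "click(terminal_icon)"],
]
preds = ["press(super)", "type('terminal')", "click(terminal_icon)"]
print(score_task(preds, gold))  # 1.0
```

Because no live environment has to be driven, this kind of check runs far faster than online evaluation, at the cost of only crediting action sequences the benchmark anticipated.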
A new recipe for training agents
The OpenCUA framework introduces a novel pipeline for processing data and training computer-use agents. The first step converts raw human demonstrations into clean state-action pairs suitable for training vision-language models (VLMs). However, the researchers found that simply training models on these pairs yields limited performance gains, even with large amounts of data.

The key insight was to augment these trajectories with chain-of-thought (CoT) reasoning. This process generates a detailed “inner monologue” for each action, comprising planning, memory and reflection. This structured reasoning is organized into three levels: a high-level observation of the screen; reflective thoughts that analyze the situation and plan the next steps; and, finally, the concise, executable action. This approach helps the agent develop a deeper understanding of tasks.
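The three-level structure described above can be pictured as a per-step record that is serialized into the text the model learns to produce. A minimal sketch, with invented field contents and formatting (the paper's actual serialization may differ):

```python
# One augmented step: observation -> thought -> action, per the article's
# description. The contents here are made up for illustration.
step_with_cot = {
    "observation": "The Settings window is open; the 'Display' tab is visible.",
    "thought": (
        "The task is to enable dark mode. I already opened Settings; "
        "the toggle is likely under 'Appearance', so I should click that tab."
    ),
    "action": "click(element='Appearance tab')",
}

def to_training_text(step: dict) -> str:
    """Serialize the structured reasoning into a training target for a VLM."""
    return (
        f"Observation: {step['observation']}\n"
        f"Thought: {step['thought']}\n"
        f"Action: {step['action']}"
    )

print(to_training_text(step_with_cot))
```

Training on the full serialized text, rather than the bare action string, is what forces the model to internalize the reasoning behind each step.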
“We find natural language reasoning to be generally important for computer-use foundation models, helping to internalize CUA cognitive capabilities,” the researchers write.
This data-synthesis pipeline is a general framework that companies can adapt to train agents on their own bespoke internal tools. According to Wang, an enterprise can record demonstrations of its proprietary workflows and run them through the same “reflector” and “generator” pipeline to create the necessary training data. “This allows them to train a high-performing agent tailored to their internal tools without manually annotating reasoning traces,” he explained.
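The generator/reflector loop can be sketched as follows. Here `generate_thought` and `critique` stand in for calls to large models (a generator drafting the inner monologue and a reflector vetting it); their bodies are placeholders for illustration, not the paper's implementation:

```python
def generate_thought(screenshot: str, action: str) -> str:
    # Placeholder: in practice, prompt a VLM with the screenshot and the
    # recorded action to draft the reasoning that would justify it.
    return f"Given the state in {screenshot}, the right next move is {action}."

def critique(thought: str, action: str) -> bool:
    # Placeholder: in practice, a reflector model checks the drafted
    # reasoning for consistency with the action. Here we only verify
    # that the thought actually references the action.
    return action in thought

def synthesize(steps: list, max_tries: int = 3) -> list:
    """Attach machine-generated reasoning to recorded (screenshot, action) pairs."""
    labeled = []
    for screenshot, action in steps:
        for _ in range(max_tries):
            thought = generate_thought(screenshot, action)
            if critique(thought, action):  # keep only reasoning the reflector accepts
                labeled.append({"state": screenshot, "thought": thought, "action": action})
                break
    return labeled

data = synthesize([("frame_001.png", "click(save_button)")])
print(len(data))  # 1
```

The point of the loop is that no human writes the reasoning: demonstrations go in, and reasoning-annotated training examples come out, with the reflector filtering drafts that don't hold up.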
Putting OpenCUA to the test
The researchers applied the OpenCUA framework to train a range of open-source VLMs, including Qwen and Kimi-VL variants, with parameter counts ranging from 3 billion to 32 billion. The models were evaluated on a suite of online and offline benchmarks that test their ability to perform tasks and understand GUIs.
The 32-billion-parameter model, OpenCUA-32B, established a new state-of-the-art success rate among open-source models on the OSWorld-Verified benchmark. It also surpassed OpenAI’s GPT-4o-based CUA and narrowed the performance gap with Anthropic’s leading proprietary models.

For enterprise developers and product leaders, the research offers several key takeaways. The OpenCUA method is broadly applicable, improving performance on models with different architectures (both dense and mixture-of-experts) and sizes. The trained agents also show strong generalization, performing well across a diverse range of tasks and operating systems.
According to Wang, the framework is particularly suited to automating repetitive, labor-intensive enterprise workflows. “For example, in the AgentNet dataset, we have already captured demonstrations of launching EC2 instances on Amazon AWS and configuring annotation parameters on MTurk,” he told VentureBeat. “These tasks involve many sequential steps but follow repeatable patterns.”
However, Wang noted that important challenges around safety and reliability must still be solved to bridge the gap to live deployment. “The biggest challenge in real-world deployment is safety and reliability: The agent must avoid mistakes that could inadvertently alter system settings or trigger harmful side effects beyond the intended task,” he said.
The researchers have released the code, dataset and model weights for their work.
As open-source agents built on frameworks like OpenCUA become more capable, they could fundamentally change the relationship between knowledge workers and their computers. Wang envisions a future where proficiency in complex software matters less than the ability to clearly articulate goals to an AI agent.
He described two primary modes of work: “offline automation, where the agent leverages its comprehensive software knowledge to carry out a task end-to-end,” and “online collaboration, where the agent reacts in real time and works shoulder to shoulder with humans like a colleague.” In essence, humans will provide the strategic “what,” while increasingly sophisticated AI agents handle the operational “how.”

