The best AI agents are terrible freelancers – for now

ZDNET Highlights

According to a new study, top AI agents fail at freelance work.
The study evaluated Gemini 2.5 Pro, GPT-5, and other agents.
Nearly half of the American workforce will work freelance in 2025.

If you’re a freelance worker and you’re stressed about the possibility of losing your job to AI, you can rest assured – at least for a while.

According to a new study run by Scale AI and the Center for AI Safety, the most cutting-edge AI agents can currently automate less than 3% of the tasks required of the average independent contractor, “failing to complete most projects at a level that would be accepted as commissioned work in a realistic freelancing environment,” the authors wrote.

Remote Labor Index

The study, posted Thursday on the preprint server arXiv and not yet peer-reviewed, establishes a testing benchmark for AI systems, which it calls the Remote Labor Index (RLI).

The benchmark serves as a qualitative framework for measuring the ability of AI systems to perform economically valuable work at a time when some tech leaders are making sweeping claims about the disruptive impact of AI on the labor market. For example, Anthropic CEO Dario Amodei said in May that this technology could replace half of all white-collar jobs within the next five years.

As the name suggests, RLI is specifically designed to assess the ability of AI to automate remote, freelance work. As anyone who has ever spent a stint as a freelancer can attest, it’s a way of working that requires a high level of self-reliance and organization, in addition to other skills. It has also become quite popular: a recent survey found that 73 million Americans will work freelance in 2025, representing about 43% of the total US workforce as of August.

AI and economically valuable labor

The new study assessed the performance of six industry-leading AI agents, including Google’s Gemini 2.5 Pro, OpenAI’s GPT-5, and Anthropic’s Sonnet 4.5.

Agents, which – unlike more limited chatbots – are capable of interacting with digital tools (such as web browsers) and performing complex, multi-step tasks, are widely positioned by tech developers as an important evolutionary step toward the development of artificial general intelligence (AGI).

AGI is a vaguely defined term: experts debate what true “general intelligence” would mean for computers, and whether such an achievement is even possible. However, one of the most common definitions of AGI in tech circles is a system that can equal or outperform humans in any economically valuable task.

If we take that definition as a starting point, the new RLI study shows that we are a long way from creating true AGI. According to the authors, each of the six models tested in the study “is not able to autonomously meet the diverse demands of remote labor.”

The models were evaluated in 23 categories of freelance work, including graphic design, product design, computer-aided design (CAD), and game development. Those categories and their supporting skill requirements were identified by researchers using freelance platforms like Upwork, “to ground the benchmark in economic value and capture the diversity and complexity of real remote labor markets.”

The models were given a project brief along with any files required to complete it. Researchers then manually evaluated each final deliverable against a deliverable for the same project created by a human freelancer. According to the researchers, the goal was to find out “whether the AI deliverable meets the project at least to the human gold standard – specifically, whether the deliverable would be accepted as commissioned work by a reasonable client.”

The agents were then compared using the Elo metric. Manus scored the highest with an automation rate of 2.5%, followed by Grok 4 and Claude Sonnet 4.5, both with a score of 2.1%.
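For readers unfamiliar with it, Elo is a rating scheme built from head-to-head comparisons: a competitor gains points for beating a higher-rated opponent and loses points for falling short of expectations. The sketch below shows the textbook Elo update only. It is not taken from the study, and the authors may well use a different variant. The function names, the starting rating of 1000, and the K-factor of 32 are illustrative assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one pairwise comparison.

    score_a is 1.0 if A's deliverable was preferred, 0.0 if B's was,
    and 0.5 for a tie. k (assumed here to be 32) controls the step size.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical example: two equally rated agents, the first one wins.
new_a, new_b = update_elo(1000.0, 1000.0, score_a=1.0)
print(new_a, new_b)  # 1016.0 984.0
```

Because ratings move only relative to one another, an Elo score ranks agents against each other, while the automation rate reports how often a deliverable clears the bar on its own.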

Image: Remote Labor Index: Measuring AI automation of remote work (screenshot by ZDNET)

Takeaway

The popular narrative around AI automation can make human labor feel more one-dimensional than it actually is. As the AI industry strives to develop systems that can match or surpass the human brain, we are coming to appreciate the brain’s remarkable flexibility, dynamism, and complexity.

Some jobs are more suitable for automation than others, but most require an amalgamation of technical and interpersonal skills, and so they are more complex than today’s AI systems can handle.

Even today’s most advanced AI systems, designed as general-purpose agents, are capable of performing only a narrow subset of the tasks required of most human workers. As the authors of the new RLI study write in their report, the fact that industry-leading agents can automate less than 3% of the tasks required of the average freelancer reveals “a serious gap” separating the promise of AI from its actual, demonstrated capabilities. The gap is likely even wider in practice, because RLI does not cover many aspects of most freelancers’ daily work lives, such as communication and interaction with clients.

Again, these are early days. The capabilities of agents are growing rapidly, and the largest technology developers are investing billions in training new, more advanced models. It’s possible that in five or ten years companies will be hiring AI freelancers. But for now, contractors have no real reason to fear the AI job reaper.
