
ZDNET Highlights
- AI was given tasks already completed by real people.
- Compared to human workers, AI failed miserably.
- But AI is getting smarter.
One of the many fears about AI is that it will drive people out of their jobs. Although such fears are not unfounded, they may be overblown, at least for now, according to a new study.
Remote Labor Index
To find out whether artificial intelligence can complete a project as effectively as a human, a group of researchers gave several AIs a series of work projects to perform. Already completed by actual remote freelance workers, the projects included game development, product design, architecture, data analysis, and video animation.
More specifically, the tasks included the following challenges:
- Create an interactive dashboard to explore data from the World Happiness Report.
- Create 3D animations to showcase new earbuds design and case features.
- Create a 2D animated video advertising a free service company’s offerings.
- Develop architectural plans and a 3D model for a container home based on an existing PDF design.
- Create a brewing-themed version of the “Watermelon Game,” where players combine falling objects to reach the highest-level object.
- Format a paper using the given figures and equations for an IEEE conference.
Covering varying levels of difficulty, the tasks performed by real people cost $10,000 and took more than 100 hours to complete. To measure how AI automation stacks up against remote work performed by humans, the researchers established a benchmark called the Remote Labor Index (RLI).
How did the AI models perform?
As explained by the researchers, RLI aims to test AI’s ability to automate hundreds of lengthy, real-world, economically valuable projects from remote work platforms.
The AI models used in the study were Manus, Grok 4, Sonnet 4.5, GPT-5, ChatGPT Agent, and Gemini 2.5 Pro.
So how did they perform? Not well.
“Although AI systems have saturated many existing benchmarks, we found that state-of-the-art AI agents perform near the floor on RLI,” the researchers revealed. “The best performing model achieved an automation rate of only 2.5%. This shows that contemporary AI systems fail to complete most projects at a quality level that would be accepted as commissioned work.”
Manus performed best with an automation rate of 2.5%. Grok 4 and Sonnet 4.5 tied at 2.1%, with GPT-5 at 1.7%, followed by ChatGPT Agent at 1.3%. Gemini 2.5 Pro came in last at 0.8%.
Dan Hendrycks, one of the researchers, highlighted the testing and results in a post on X. Hendrycks acknowledged that although AIs are smart, they are still not that useful, with the overall automation rate being less than 3%.
To explain why AI has fallen short at work, Hendrycks said that many AI capabilities are lacking. AIs don’t learn on the job because they don’t have long-term memory storage. Also, AI’s visual capabilities are limited, and vision is a skill required to perform many tasks.
Continuously improving
This all sounds like good news for workers worried about being replaced by AI. Right? Well, don’t tear up your resume just yet. The test involved particularly creative tasks that required some degree of advanced skills. Other types of jobs and projects are likely to be more easily tackled by AI. Additionally, AI will become smarter and more capable.
“Although full automation rates are low, our analysis shows that models are continually improving and that progress on these complex tasks can be measured,” the researchers said. “It provides a common basis for tracking the trajectory of AI automation, enabling stakeholders to proactively navigate its impacts.”
Yes, it’s best to keep those resumes up to date in any case.

