OpenAI released a new benchmark on Thursday that tests how its AI models perform compared to human professionals across a wide range of industries and jobs. The test, GDPval, is an early attempt to understand how close OpenAI's systems are to outperforming humans at economically valuable work, an important part of the company's founding mission to develop artificial general intelligence, or AGI.
OpenAI says its GPT-5 model and Anthropic's Claude Opus 4.1 are "already approaching the quality of work produced by industry experts."
That's not to say OpenAI's models are about to replace humans in their jobs. Despite predictions from some CEOs that AI will take over human jobs within a few years, OpenAI admits that GDPval today covers only a very limited set of the tasks people do in their actual jobs. Still, it's one of the latest methods the company is using to measure AI's progress toward that milestone.
GDPval is based on the nine industries that contribute the most to GDP in the US, including the healthcare, finance, manufacturing, and government domains. The benchmark tests AI model performance across 44 occupations within those industries, ranging from software engineers to nurses to journalists.
For the first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare AI-generated reports with reports produced by other professionals, and then choose the better one. For example, one prompt asked investment bankers to create a competitive landscape analysis for the last-mile delivery industry and compare it against AI-generated reports. OpenAI then averaged each AI model's "win rate" against human reports across all 44 occupations.
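The averaging described above can be sketched in a few lines. This is a hypothetical illustration of the "average win rate" idea only, not OpenAI's actual grading code; the occupation names and judgment data below are invented for the example.

```python
# Hypothetical sketch of GDPval-v0-style scoring, based only on the article's
# description: graders compare an AI report against a human expert's report,
# and the model's win rate is then averaged across occupations.

def win_rate(judgments):
    """Fraction of head-to-head comparisons the AI report won or tied."""
    favorable = sum(1 for j in judgments if j in ("win", "tie"))
    return favorable / len(judgments)

def benchmark_score(per_occupation_judgments):
    """Average the win rate across occupations, weighting each equally."""
    rates = [win_rate(js) for js in per_occupation_judgments.values()]
    return sum(rates) / len(rates)

# Invented grader judgments for three of the 44 occupations.
judgments = {
    "investment banker": ["win", "loss", "tie", "loss"],
    "nurse": ["loss", "loss", "win", "loss"],
    "journalist": ["tie", "win", "loss", "loss"],
}
print(round(benchmark_score(judgments), 3))  # prints 0.417
```

Each occupation contributes equally to the final score regardless of how many comparisons it includes, which matches the article's description of averaging win rates across the 44 occupations.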
For GPT-5-high, a souped-up version of GPT-5 with additional computational power, the company says the AI model's work was rated better than or on par with that of industry experts 40.6% of the time.
OpenAI also tested Anthropic's Claude Opus 4.1 model, which was rated better than or on par with industry experts in 49% of tasks. OpenAI says it believes Claude scored so well partly due to a tendency to create pleasing graphics, rather than sheer performance.
It's worth noting that most working professionals do much more than deliver research reports to their boss, which is all GDPval-v0 tests. OpenAI acknowledges this and says it plans to build more robust tests in the future that could account for more industries and interactive workflows.
Still, the company considers its progress on GDPval notable.
In an interview with TechCrunch, OpenAI's chief economist, Dr. Aaron Chatterji, said the GDPval results suggest people in these jobs can now use AI models to free up time for more meaningful tasks.
"[Because] the models are getting good at some of these things," said Chatterji, "people in those jobs can now use the models, as capabilities get better, to offload some of their work and do potentially higher-value things."
Tejal Patwardhan, who leads evaluations at OpenAI, told TechCrunch she was encouraged by the rate of progress on GDPval. OpenAI's GPT-4o model, released about 15 months ago, scored just 13.7% (wins and ties vs. humans). GPT-5's score of 40.6% is nearly triple that, a trend Patwardhan expects to continue.
Silicon Valley uses a wide range of benchmarks to measure AI model progress and to assess whether a given model is state-of-the-art. Among the most popular are AIME 2025 (a test of competition math problems) and GPQA Diamond (a test of PhD-level science questions). However, many AI models are nearing saturation on some of these benchmarks, and many AI researchers have cited the need for better tests that can measure AI's performance on real-world tasks.
Benchmarks like GDPval could become increasingly important in that conversation, as OpenAI makes the case that its AI models are valuable across a wide range of industries. But OpenAI will surely need a much broader version of the test before it can claim its AI models outperform humans.

