
OpenAI has achieved a new milestone in the race to create an AI model that can work its way through complex mathematics problems.
On Saturday, the company announced that one of its models achieved gold medal-level performance on the International Mathematical Olympiad (IMO), widely considered the most prestigious and difficult mathematics competition in the world.
We achieved gold medal-level performance on the 2025 International Mathematical Olympiad with a general-purpose reasoning LLM!
Our model solved world-class math problems, at the level of top human contestants. A major milestone for AI and mathematics. https://t.co/u2rlffavyt (OpenAI (@OpenAI), July 19, 2025)
Notably, the winning model was not purpose-built to solve IMO problems, in the way that systems such as Alphabet's AlphaGo, which defeated the world's leading Go player in 2016, were trained on massive datasets within a very narrow, specialized domain. Rather, the winner was a general-purpose reasoning model, designed to think through problems using natural language.
"This is an LLM doing mathematics and not a specific formal math system," OpenAI wrote in its X post. "It is part of our main push towards general intelligence."
At this point, the identity of the model that was used is not known. Alexander Wei, the OpenAI researcher who led the IMO effort, called it "an experimental reasoning LLM" in an X post that included an image of a strawberry wreathed in a gold medal, suggesting that it was built on the company's family of reasoning models, which debuted last September.
"To be clear: We are releasing GPT-5 soon, but the model we used for the IMO is a separate experimental model," OpenAI added on X. "It incorporates new research techniques that will appear in future models, but we do not plan to release a model with this level of capability for several months."
How well did the model perform?
The IMO, which began in 1959, attracts hundreds of contestants from more than 100 countries each year.
Contestants must provide proof-based answers to a total of six questions over two days. Those proofs are graded by former IMO gold medalists, with unanimous consensus required for each final score. Fewer than 9% of participants win gold.
According to Wei, OpenAI's experimental model solved five of the six problems and earned 35 of a possible 42 points (about 83%), enough for a gold medal. Each proof consisted of hundreds of lines of text representing the individual steps the model took to work through its reasoning process. OpenAI's model had no access to the internet, in keeping with the competition's prohibition on calculators and other external tools; it reasoned step by step through each problem purely on its own.
Also: My 8 ChatGPT Agent tests produced only 1 good result - and many alternative facts
"The model thinks for a long time," Noam Brown, another OpenAI researcher involved in the project, wrote in an X post. "o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it's also more efficient with its thinking."
According to OpenAI, forecasters had earlier estimated that there was only an 18% chance an AI system would win IMO gold by 2025.
Big picture
For all its impressive abilities, AI has long struggled with simple arithmetic and basic math word problems - tasks one might assume would be relatively straightforward for advanced algorithms. Unlike narrower logical puzzles, however, high-level mathematics requires a degree of abstract reasoning and conceptual juggling that has been beyond the reach of most AI systems.
That is changing, however, at an extraordinarily rapid pace. A year ago, AI models were still being evaluated on grade school-level math benchmarks such as GSM8K. Reasoning models like o1 and DeepSeek's R1 quickly excelled there, moving on first to high school-level benchmarks like AIME, then to university-level tests, and then beyond.
A capacity for high-level mathematics has become a gold standard for reasoning models, since even a small amount of hallucination or corner-cutting can quickly and obviously ruin a model's output. Such flaws are easier to get away with when generating other types of responses, for example help with a written essay, because those are often open to a range of interpretations.
Also: 5 tips for building foundation models for AI
OpenAI's IMO gold medal suggests that a scalable, general-purpose reasoning approach can outperform domain-specific models on tasks long considered beyond the reach of current AI systems. As it turns out, you do not need to build hyperfocused models like AlphaGo, trained to do nothing but math; it is enough to train a model on language and careful, step-by-step reasoning, and, given enough thinking time, it can compete on equal footing with world-class human mathematicians.
According to Brown, the current pace of innovation across the AI industry suggests that models' mathematical and reasoning skills will only grow from here. "I fully expect the trend to continue," he wrote on X. "Importantly, I think we're close to AI substantially contributing to scientific discovery."
Want more stories about AI? Sign up for Innovation, our weekly newsletter.