
- OpenAI's latest AI models, o3 and o4-mini, hallucinate more often than their predecessors.
- The increased complexity of the models may be leading to more confident inaccuracies.
- The high error rates raise concerns about AI reliability in real-world applications.
Brilliant but untrustworthy people are a staple of fiction (and history). Based on an investigation by OpenAI shared by The New York Times, the same correlation may apply to AI. Hallucinations, imagined facts, and straight-up lies have been part of AI chatbots since they were created. Improvements to the models should theoretically reduce the frequency with which they appear.
OpenAI's latest flagship models, o3 and o4-mini, are meant to mimic human reasoning. Unlike their predecessors, which mainly focused on generating fluent text, OpenAI built o3 and o4-mini to think things through step by step. OpenAI has claimed that o1 matched or exceeded the performance of PhD students in chemistry, biology, and math. But OpenAI's report highlights some harrowing results for anyone who takes ChatGPT's responses at face value.
OpenAI found that the o3 model incorporated hallucinations in a third of a benchmark test involving public figures. That's double the error rate of the o1 model from last year. The more compact o4-mini model performed even worse, hallucinating on 48% of similar tasks.
When tested on more general-knowledge questions for the SimpleQA benchmark, hallucinations mushroomed to 51% of responses for o3 and 79% for o4-mini. That's not just a little noise in the system; that's a full-blown identity crisis. You'd think that something marketed as a reasoning system would at least double-check its own reasoning before fabricating an answer, but that's simply not the case.
One theory making the rounds in the AI research community is that the more reasoning a model tries to do, the more chances it has to go off the rails. Unlike simpler models that stick to high-confidence predictions, reasoning models venture into territory where they must evaluate multiple possible paths, connect disparate facts, and essentially improvise. And improvising around facts is also known as making things up.
Fictional functioning
Correlation is not causation, and OpenAI told the Times that the increase in hallucinations might not be because reasoning models are inherently worse. Instead, they could simply be more verbose and adventurous in their answers. Because the new models aren't just repeating predictable facts but speculating about possibilities, the line between theory and fabricated fact can get blurry for the AI. Unfortunately, some of those possibilities happen to be entirely unmoored from reality.
Still, more hallucinations are the opposite of what OpenAI or rivals like Google and Anthropic want from their most advanced models. Calling AI chatbots assistants and copilots implies they'll be helpful, not hazardous. Lawyers have already gotten in trouble for using ChatGPT and not noticing its imaginary court citations; who knows how many such errors have caused problems in less high-stakes circumstances?
The opportunities for a hallucination to cause a problem for a user are rapidly expanding as AI systems roll out in classrooms, offices, hospitals, and government agencies. Sophisticated AI might help draft job applications, resolve billing issues, or analyze spreadsheets, but the paradox is that the more useful AI becomes, the less room there is for error.
You can't claim to save people time and effort if they have to spend just as long double-checking everything you tell them. Not that these models aren't impressive. o3 has demonstrated some amazing feats of coding and logic. It can even outperform many humans in some ways. The problem is that the moment it decides that Abraham Lincoln hosted a podcast or that water boils at 80°F, the illusion of reliability shatters.
Until those issues are resolved, you should take any response from an AI model with a heaping spoonful of skepticism. Sometimes, ChatGPT is a bit like that annoying guy in far too many meetings we've all sat through: brimming with confidence in utter nonsense.

