AI models are hallucinating more (and it's not clear why)

By PineapplesUpdate · May 7, 2025 · 5 min read


Hallucinations have always been an issue for generative AI models: the same structure that enables them to be creative and produce text and images also makes them prone to making things up. And the hallucination problem isn't getting better as AI models progress; in fact, it's getting worse.

In a new technical report from OpenAI (via The New York Times), the company describes how its latest o3 and o4-mini models hallucinate 51 percent and 79 percent of the time, respectively, on an AI benchmark known as SimpleQA. For the earlier o1 model, the SimpleQA hallucination rate is 44 percent.
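For a sense of what those percentages measure, here's a minimal sketch of how a SimpleQA-style hallucination rate could be tallied. The data structure and sample data are illustrative assumptions, not OpenAI's actual evaluation code, which grades model answers against reference facts.

```python
# Illustrative sketch only: a SimpleQA-style hallucination rate is the
# fraction of attempted answers judged incorrect. The GradedAnswer type
# and the sample data below are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class GradedAnswer:
    question: str
    answer: str
    is_correct: bool  # judged against the benchmark's reference answer

def hallucination_rate(graded: list[GradedAnswer]) -> float:
    """Fraction of attempted answers containing incorrect claims."""
    if not graded:
        return 0.0
    wrong = sum(1 for g in graded if not g.is_correct)
    return wrong / len(graded)

# Toy run: 3 of 4 attempted answers are wrong -> a 75% hallucination rate.
sample = [
    GradedAnswer("Q1", "A1", True),
    GradedAnswer("Q2", "A2", False),
    GradedAnswer("Q3", "A3", False),
    GradedAnswer("Q4", "A4", False),
]
print(f"{hallucination_rate(sample):.0%}")  # 75%
```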

Those are surprisingly high figures, and they're heading in the wrong direction. These models are known as reasoning models because they think through their answers and deliver them more slowly. Evidently, based on OpenAI's own testing, this mulling over of responses is leaving more room for mistakes and inaccuracies.

False facts are by no means limited to OpenAI and ChatGPT. It took me a while to get Google's AI Overviews search feature to make a mistake when I tested it, for example, and AI's inability to properly pull information from the web is well documented. Recently, a support bot for the AI coding app Cursor announced a policy change that hadn't actually been made.

But you won't find many mentions of these hallucinations in the announcements AI companies make about their latest and greatest products. Together with energy use and copyright infringement, hallucinations are something the big names in AI would rather not talk about.

I haven't noticed too many inaccuracies when using AI search and bots; the error rate certainly isn't anywhere near 79 percent, though mistakes do get made. It seems, however, that this is a problem that may never be overcome, especially because the teams working on these AI models don't fully understand why hallucinations happen.

In tests run by AI platform developer Vectara, the results are much better, though still not perfect: here, many models show hallucination rates of one to three percent. OpenAI's o3 model sits at 6.8 percent, with the newer (and smaller) o4-mini at 4.6 percent. That matches my experience interacting with these tools, but even a very small number of hallucinations can add up to a big problem, especially as we hand over more and more tasks and responsibilities to these AI systems.

Searching for the causes of hallucinations


ChatGPT knows not to put glue on pizza, at least.
    Credit: Lifehacker

Nobody really knows how to fix hallucinations, or how to fully identify their causes: these models aren't designed to follow rules set by their programmers, but to choose their own ways of working and responding. Amr Awadallah, chief executive of AI platform Vectara, told The New York Times that AI models will “always hallucinate,” and that these problems “will never go away.”

Professor Hannaneh Hajishirzi of the University of Washington, who is working on ways to reverse-engineer answers from AI, told the NYT that “we still don’t know how these models work exactly.” As with troubleshooting a problem with your car or your PC, you need to know what has gone wrong before you can do something about it.

According to Neil Chowdhury, a researcher at the AI analysis lab Transluce, the way reasoning models are built may make the problem worse. “Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines,” he told TechCrunch.


OpenAI's own performance report, meanwhile, mentions the issue of “less world knowledge,” while also noting that the o3 model tends to make more claims than its predecessor, which in turn leads to more hallucinations. Ultimately, though, in OpenAI's words, “more research is needed to understand the cause of these results.”

And plenty of people are doing that research. For example, academics at Oxford University have published a method for detecting the likelihood of hallucinations by measuring the variation between multiple AI outputs. It costs more in terms of time and processing power, however, and doesn't really solve the problem of hallucinations; it just tells you when they're more likely.
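As a rough illustration of that idea (a sketch of the general approach, not the Oxford team's actual code), you can sample the same question several times, cluster the answers by meaning, and treat high entropy across those clusters as a hallucination warning. The naive string-normalization step below is a stand-in for the semantic-equivalence model a real implementation would need.

```python
# Minimal sketch of variation-based hallucination detection. Assumption:
# real systems use an NLI or embedding model to decide which answers
# "mean the same"; normalize() below is a naive stand-in for that.
import math
from collections import Counter

def normalize(answer: str) -> str:
    """Crude equivalence key: lowercase and collapse whitespace."""
    return " ".join(answer.lower().split())

def answer_entropy(samples: list[str]) -> float:
    """Shannon entropy over clusters of equivalent answers. Low entropy
    means the model answers consistently; high entropy means the sampled
    answers disagree, which correlates with hallucination risk."""
    clusters = Counter(normalize(s) for s in samples)
    total = len(samples)
    return -sum((n / total) * math.log2(n / total) for n in clusters.values())

print(answer_entropy(["Paris", "paris", "Paris"]))     # 0.0 -> consistent
print(answer_entropy(["Paris", "Lyon", "Marseille"]))  # ~1.58 -> flag it
```

Note the cost baked into the approach: every question now takes several model calls, which is why the method is expensive in time and processing power.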

Letting AI models check their facts on the web can help in some situations, but they're not particularly good at that either. What they lack (and always will) is simple human common sense, the kind that says glue shouldn't be put on pizza, or that $410 for a Starbucks coffee is obviously a mistake.

What's certain is that AI bots can't be trusted all the time, despite their confident tone, whether they're giving you a news summary, legal advice, or an interview transcript. That's important to remember as these AI models show up more and more in our personal and work lives, and it's a good idea to limit AI to use cases where hallucinations matter less.

Disclosure: Lifehacker's parent company, Ziff Davis, filed a lawsuit against OpenAI in April, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.
