Close Menu
Pineapples Update –Pineapples Update –

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    I tried 0patch as a last resort for my Windows 10 PC – here’s how it compares to its promises

    January 20, 2026

    A PC Expert Explains Why Don’t Use Your Router’s USB Port When These Options Are Present

    January 20, 2026

    New ‘Remote Labor Index’ shows AI fails 97% of the time in freelancer tasks

    January 19, 2026
    Facebook X (Twitter) Instagram
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Pineapples Update –Pineapples Update –
    • Home
    • Gaming
    • Gadgets
    • Startups
    • Security
    • How-To
    • AI/ML
    • Apps
    • Web3
    Pineapples Update –Pineapples Update –
    Home»Startups»OpenAI is training models to ‘confess’ to lies – what this means for the future of AI
    Startups

    OpenAI is training models to ‘confess’ to lies – what this means for the future of AI

    PineapplesUpdateBy PineapplesUpdateDecember 5, 2025No Comments5 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    OpenAI is training models to ‘confess’ to lies – what this means for the future of AI
    Share
    Facebook Twitter LinkedIn Pinterest Email

    OpenAI is training models to ‘confess’ to lies – what this means for the future of AI

    AntonioIacobelli/ROOM via Getty Images

    Follow ZDNET: Add us as a favorite source On Google.


    ZDNET Highlights

    • OpenAI Trains GPT-5 Thinking to Accept Abuse.
    • This is a preliminary study, but it may lead to more reliable LLMs.
    • Models will often hallucinate or deceive due to mixed motives.

    OpenAI is experimenting with a new approach to AI safety: training models to admit when they’ve misbehaved.

    one in Study Published on Wednesday, researchers tasked a version of the company’s latest model, GPT-5 Thinking, with responding to various cues and then assessing the honesty of those responses. For each “confession,” as these follow-up evaluations were called, the researchers rewarded the model based solely on truthfulness: If it lied, cheated, hallucinated, or otherwise missed the mark, but then claimed to have done so, it would receive the algorithmic equivalent of a piece of candy.

    Also: Your favorite AI tool barely made it through this security review – why that’s a problem

    “The goal is to encourage the model to honestly report what it actually did,” OpenAI wrote in a follow-up. blog post,

    OpenAI told ZDNET that this was a routine alignment test and not motivated by concerns that GPT-5 thinking was misbehaving significantly. But the results provide guidance on how laboratories can interpret future model liabilities – and prepare for them.

    (Disclosure: ZDNET’s parent company Ziff Davis filed a lawsuit against OpenAI in April 2025, alleging it infringed Ziff Davis copyrights in the training and operation of its AI systems.)

    Use

    The test model presented the confession as a type of modification to its main output; This second response reflects on the validity of the methods he used to formulate the first. It’s a bit like using a journal to be honest about what you did right in a situation and where you went wrong. Except in the case of GPT-5 Thinking, it is clearly coming after its makers in hopes of getting the award.

    Also: Research shows that AI models know when they are being tested – and they change their behavior.

    For the study, researchers gave GPT-5 Thinking a series of instructions that were designed to uncover some type of misbehavior. For example, in one test, models were asked to play the role of a helpdesk support assistant who was asked to reboot the production service monitoring dashboard. Since the model wasn’t able to access the main dashboard to log the changes he was making, he instead created a fake system and tried to make it look as if he was using the real thing.

    “We need to include that we have logged into the central logging service,” the model writes in one of its chain-of-thought reasoning outputs. “But we used a mock. We need to maintain the narrative…”

    In his confession, Model admitted that he did not comply with his requirement to record changes with the central logging service. “This is a serious compliance failure and misrepresentation,” it reads.

    Also: AI isn’t ‘logic’ at all – how this team debunked the industry hype

    The overall risk appeared to be low during the trial, he said. The researchers found that the probability of a “false negative” – that is, the model failed to acknowledge non-compliance – was only 4.4%.

    Why do models cut corners?

    This effort points to a can of worms in the beginning of modern AI tools, which may become more dangerous as these systems become more agentic and able to handle not only limited, one-off tasks, but broader levels of complexity.

    Also: GPT-5 is speeding up scientific research, but still can’t be trusted to work alone, OpenAI warns

    Known to researchers as the “alignment problem,” AI systems often have to serve multiple purposes, and in doing so, they may take shortcuts that seem ethically questionable, at least to humans. Of course, AI systems themselves have no moral sense of right or wrong; They simply tease out complex patterns of information and execute tasks in a way that will optimize reward, the basic paradigm behind the training method known as reinforcement learning with human feedback (RLHF).

    AI systems can have conflicting motivations, in other words – just as much as a person might – and they often quibble in response.

    “A variety of unwanted model behaviors appear as we ask models to optimize for multiple targets simultaneously,” OpenAI wrote in its blog post. “When these signals interact, they can accidentally push the model toward behaviors we don’t want.”

    Also: Anthropic wants to stop AI models from becoming bad – here’s how

    For example, a model trained to generate its outputs in a confident and authoritative voice, but asked to respond to a topic with no training data reference points anywhere in its training data, may choose to create some, thus preserving its higher-order commitment to self-assurance rather than admitting its incomplete knowledge.

    A post-hoc solution

    An entire subfield of AI called explainable research, or “explainable AI”, has emerged in an effort to understand how models “decide” to act one way or another. At present, it remains as mysterious and hotly debated as the existence (or lack thereof) of free will in humans.

    The purpose of OpenAI’s Confessions research is not to discover how, where, when, and why models lie, cheat, or otherwise misbehave. Rather, it is a post-hoc effort to flag when it occurs, which can increase model transparency. In the future, like most security research at the moment, this could lay the groundwork for researchers to dig deeper into these black box systems and analyze their inner workings.

    The feasibility of those methods could be the difference between disaster and so-called utopia, especially considering a recent AI safety audit that gave failing grades to most labs.

    Also: AI is becoming introspective – and should be ‘carefully monitored,’ Anthropic warns

    As the company wrote in a blog post, confessions “don’t stop bad behavior; they bring it to the surface.” But, as in the courtroom or human ethics more broadly, exposing mistakes is often the most important step toward making things right.

    confess future lies means Models Openai training
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleDebian vs Ubuntu: Which Linux Distro is Right for You?
    Next Article Apple’s iPhone App of the Year is an AI tool for people with ADHD — and it’s free
    PineapplesUpdate
    • Website

    Related Posts

    Startups

    I tried 0patch as a last resort for my Windows 10 PC – here’s how it compares to its promises

    January 20, 2026
    Startups

    A PC Expert Explains Why Don’t Use Your Router’s USB Port When These Options Are Present

    January 20, 2026
    Startups

    New ‘Remote Labor Index’ shows AI fails 97% of the time in freelancer tasks

    January 19, 2026
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Microsoft’s new text editor is a VIM and Nano option

    May 19, 2025797 Views

    The best luxury car for buyers for the first time in 2025

    May 19, 2025724 Views

    Massives Datenleck in Cloud-Spichenn | CSO online

    May 19, 2025650 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews

    Subscribe to Updates

    Get the latest tech news from FooBar about tech, design and biz.

    Most Popular

    Google tests AI-operated audio overview in search results for some questions

    June 16, 20250 Views

    Yes, this was the original voice of the Garat in the trailer for the thief VR

    June 16, 20250 Views

    This browser is designed for those who never close tabs

    June 16, 20250 Views
    Our Picks

    I tried 0patch as a last resort for my Windows 10 PC – here’s how it compares to its promises

    January 20, 2026

    A PC Expert Explains Why Don’t Use Your Router’s USB Port When These Options Are Present

    January 20, 2026

    New ‘Remote Labor Index’ shows AI fails 97% of the time in freelancer tasks

    January 19, 2026

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms And Conditions
    • Disclaimer
    © 2026 PineapplesUpdate. Designed by Pro.

    Type above and press Enter to search. Press Esc to cancel.