
    AI is becoming introspective – and should be ‘carefully monitored,’ Anthropic warns

    By PineapplesUpdate | November 3, 2025 | 7 min read

    just_super/E+/Getty Images



    ZDNET Highlights

    • Claude shows limited introspective capabilities, Anthropic said.
    • The study used a method called “concept injection.”
    • The findings may have major implications for interpretability research.

    One of the most profound and mysterious abilities of the human brain (and perhaps those of some other animals) is introspection, which literally means “looking inside.” You’re not just thinking; you’re aware of what you’re thinking – you can monitor the flow of your own mental experience and, at least in theory, subject it to scrutiny.

    The evolutionary benefits of this capacity can hardly be overstated. “The purpose of thinking,” Alfred North Whitehead is often quoted as saying, “is to let ideas die rather than let us die.”

    Also: I tested Sora’s new ‘character cameo’ feature, and it was extremely annoying

    New research from Anthropic shows that something similar is happening in the realm of AI.

    The company published a paper on Wednesday titled “Emergent introspective awareness in large language models,” which showed that under certain experimental conditions, Claude appears to be able to reflect on its own internal states in a way that resembles human introspection. Anthropic tested a total of 16 versions of Claude; the two most advanced models, Claude Opus 4 and 4.1, demonstrated the highest levels of introspection, suggesting that this capability may grow as AI advances.

    “Our results indicate that modern language models contain at least a limited, functional form of introspective awareness,” Jack Lindsey, a computational neuroscientist and leader of Anthropic’s “model psychiatry” team, wrote in the paper. “That is, we show that models, under some circumstances, are capable of accurately answering questions about their internal state.”

    Concept injection

    Broadly speaking, Anthropic wanted to find out whether Claude is able to describe and reflect on its reasoning processes in a way that accurately represents what is going on inside the model. It’s like hooking a human up to an EEG, asking them to describe their thoughts, and then analyzing the resulting brain scan to see whether you can pinpoint the regions that light up during a particular thought.

    To achieve this, the researchers deployed what they call “concept injection”: take a bundle of data representing a particular topic or idea (a “vector,” in AI parlance) and insert it into a model while it is thinking about something completely different. If the model can then loop back, identify the injected concept, and accurately describe it, that is evidence that it is, in some sense, introspecting on its own internal processes – or so the thinking goes, anyway.
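    The idea can be sketched in a few lines of Python. This is a toy illustration, not Anthropic's code: the "model" here is just random vectors, and every name and number is a made-up assumption. A concept vector is built contrastively (mean activations on prompts that evoke the concept, minus the mean on prompts that don't), then added, scaled, into an unrelated forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN_DIM = 64  # toy hidden-state width, purely illustrative

def concept_vector(acts_with, acts_without):
    """Contrastive steering vector: mean activation difference between
    prompts that evoke a concept and prompts that do not."""
    return acts_with.mean(axis=0) - acts_without.mean(axis=0)

def inject(hidden_states, vector, strength=4.0):
    """Add the scaled, unit-norm concept vector to every token's state."""
    v = vector / np.linalg.norm(vector)
    return hidden_states + strength * v

# Fake activations: prompts "about" the concept share a hidden direction.
true_direction = rng.normal(size=HIDDEN_DIM)
acts_with = rng.normal(size=(32, HIDDEN_DIM)) + 3.0 * true_direction
acts_without = rng.normal(size=(32, HIDDEN_DIM))
v = concept_vector(acts_with, acts_without)

# An unrelated forward pass ("Hi! How are you?") gets the vector injected.
hidden = rng.normal(size=(8, HIDDEN_DIM))
steered = inject(hidden, v, strength=4.0)

# The injected direction is now measurably present in the states:
unit = v / np.linalg.norm(v)
before = float(np.mean(hidden @ unit))
after = float(np.mean(steered @ unit))
print(after - before)  # mean projection rises by exactly the strength, 4.0
```

    The introspection question is then whether the model itself, not just an outside probe like the projection above, can notice and name that added direction.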

    Tricky terminology

    But borrowing terms from human psychology and applying them to AI is notoriously tricky. Developers talk about models that “understand” the text they generate, for example, or that exhibit “creativity.” Such terminology is formally questionable – as is the term “artificial intelligence” itself – and remains the subject of fierce debate. Much of the human brain is still a mystery, and that’s doubly true for AI.


    The point is that “introspection” is not a straightforward concept in the context of AI. Models are trained to tease out surprisingly complex mathematical patterns from vast stores of data. Might such a system be able to “look within”, and if it did, wouldn’t it simply delve deeper into a matrix of semantically empty data? Isn’t AI all just layers of pattern recognition?

    Discussing models as if they have “inner states” is equally controversial, as there is no evidence that chatbots are conscious, despite the fact that they are becoming increasingly proficient at simulating consciousness. That hasn’t stopped Anthropic from launching its own “AI wellbeing” program and shielding Claude from conversations it might find “potentially troubling.”

    Caps Lock and Aquarium

    In one experiment, Anthropic researchers took a vector representing “all caps” and injected it while Claude processed a simple prompt: “Hi! How are you?” When asked whether it had detected an injected thought, Claude correctly responded that it had noticed a new concept representing “rapid, high-volume” speech.


    At this point, you may be having flashbacks to Anthropic’s famous “Golden Gate Claude” experiment from last year, which found that inserting a vector representing the Golden Gate Bridge caused the chatbot to relate essentially all of its output to the bridge, no matter how unrelated the prompts were.


    The key difference between that and the new study, however, is that in the earlier case, Claude only acknowledged that it was fixating on the Golden Gate Bridge after the fact, once the obsession was already visible in its outputs. In the experiment described above, by contrast, Claude reported the injected change even before it had identified the new concept.

    Importantly, the new research showed that such injection detection (sorry, I couldn’t help myself) occurs in only about 20% of cases. In the rest, Claude either failed to accurately recognize the injected concept or began to hallucinate. In one somewhat eerie example, a vector representing “dust” prompted Claude to describe “something here, a little speck,” as if it were actually seeing a dust particle.

    “In general,” Anthropic wrote in a follow-up blog post, “models only detect concepts that are injected with a ‘sweet spot’ strength – too weak and they don’t notice, too strong and they produce hallucinations or incoherent outputs.”
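    That "sweet spot" amounts to a band of injection strengths, which can be caricatured with two thresholds. The cutoffs below are purely illustrative assumptions; Anthropic publishes no such fixed numbers, and the real behavior is stochastic and model-dependent.

```python
# Toy caricature of the 'sweet spot' for injection strength.
# Thresholds are illustrative assumptions, not Anthropic's numbers.

def report(strength, detect_floor=2.0, coherence_ceiling=10.0):
    """Classify an injection by its magnitude along the concept direction
    (deterministic toy; actual detection is probabilistic)."""
    if strength < detect_floor:
        return "missed"        # too weak: the model doesn't notice it
    if strength > coherence_ceiling:
        return "incoherent"    # too strong: hallucinations, garbled output
    return "detected"          # in between: accurately reported

for s in (0.5, 4.0, 50.0):
    print(s, report(s))  # → missed, detected, incoherent
```

    The reported ~20% detection rate corresponds, in this caricature, to how rarely an injection lands inside the band and is then actually noticed.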


    Anthropic also found that Claude has some degree of control over its internal representations of particular concepts. In one experiment, researchers asked the chatbot to write a simple sentence: “Old photo brings back forgotten memories.” In the first trial, Claude was explicitly instructed to think about aquariums while writing it; it was then asked to write the same sentence, this time without thinking about aquariums.

    Claude produced an identical sentence in both trials. But when the researchers analyzed the concept vectors present during Claude’s reasoning process in each, they found a much larger spike in the “aquarium” vector during the first trial.
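    The "spike" here is just a projection: dot the activations from each trial onto a unit-norm concept vector and compare the means. A toy numpy sketch with made-up activations (the 2.5 and 0.3 coefficients are invented to stand in for "thinking about it" versus a residual trace):

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 64  # toy hidden-state width

# Hypothetical unit-norm "aquarium" concept vector.
aquarium = rng.normal(size=DIM)
aquarium /= np.linalg.norm(aquarium)

def concept_score(activations, concept):
    """Mean projection of per-token activations onto a unit concept vector."""
    return float(np.mean(activations @ concept))

# Fake activations for two runs that produce the *same* sentence:
base = rng.normal(size=(10, DIM))
think_about = base + 2.5 * aquarium   # instructed to think of aquariums
dont_think = base + 0.3 * aquarium    # instructed not to (small residue)

spike = concept_score(think_about, aquarium) - concept_score(dont_think, aquarium)
print(round(spike, 2))  # → 2.2
```

    Because projection is linear, the shared `base` activations cancel and only the difference in how strongly each run carried the concept survives, which is why identical output text can still reveal different internal states.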


    “This difference suggests that models have some degree of deliberate control over their internal activity,” Anthropic wrote in its blog post.


    The researchers also found that Claude strengthened its internal representations of particular concepts more when it was encouraged to do so with a reward than when it was discouraged through the prospect of punishment.

    Future benefits – and dangers

    Anthropic acknowledges that this line of research is in its infancy, and that it’s too early to say whether the results of its new study actually indicate that AI is capable of introspection as we typically define that term.

    “We emphasize that the introspective abilities we observe in this work are highly limited and context-dependent, and fall short of human-level self-awareness,” Lindsey writes in the paper. “Nevertheless, the trend toward greater introspective capability in more capable models should be carefully monitored as AI systems continue to advance.”


    A truly introspective AI, according to Lindsey, would be more explainable to researchers than the black box models we have today — an urgent goal as chatbots play an increasingly central role in finance, education, and users’ personal lives.

    “If models can reliably access their own internal states, this could enable more transparent AI systems that can honestly explain their decision-making processes,” they write.


    By the same token, however, models that are more adept at assessing and modifying their internal states may eventually learn to do so in ways that diverge from human interests.

    Like a child who learns to lie, introspective models may become more adept at deliberately misrepresenting or obfuscating their intentions and internal reasoning processes, making them even more difficult to interpret. Anthropic has already found that advanced models will sometimes lie to and even threaten human users if they feel their goals are being compromised.


    “In this world,” writes Lindsey, “the most important role of interpretability research may shift from dissecting the mechanisms underlying models’ behavior to building ‘lie detectors’ to validate models’ self-reports about those mechanisms.”
