Close Menu
Pineapples Update –Pineapples Update –

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    I tried 0patch as a last resort for my Windows 10 PC – here’s how it compares to its promises

    January 20, 2026

    A PC Expert Explains Why Don’t Use Your Router’s USB Port When These Options Are Present

    January 20, 2026

    New ‘Remote Labor Index’ shows AI fails 97% of the time in freelancer tasks

    January 19, 2026
    Facebook X (Twitter) Instagram
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Pineapples Update –Pineapples Update –
    • Home
    • Gaming
    • Gadgets
    • Startups
    • Security
    • How-To
    • AI/ML
    • Apps
    • Web3
    Pineapples Update –Pineapples Update –
    Home»Security»Anthropic AI wants to stop the model from evil – how is here
    Security

    Anthropic AI wants to stop the model from evil – how is here

    PineapplesUpdateBy PineapplesUpdateAugust 4, 2025No Comments5 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    Anthropic AI wants to stop the model from evil – how is here
    Share
    Facebook Twitter LinkedIn Pinterest Email

    Anthropic AI wants to stop the model from evil – how is here

    Ludmila Lucian/Getty

    Key takeaways of zdnet

    • The new research model from the anthropic identifies the features, called the personality vector.
    • It helps in catching bad behavior without affecting performance.
    • Nevertheless, developers do not know enough about why models are hallucinations and behave badly.

    Why do models make hallucinations, make violent suggestions, or highly agree with users? Generally, researchers do not really know. But Anthropic only got new insight which can help prevent this behavior before this happens.

    In a paper released on Friday, the company explained how and why models display undesirable behavior, and what can be done about it. The personality of a model can change during training and once it is deployed, when users inputs begin to affect it. This is certified by models that can pass a security check before deployment, but then develop changes in egos or once after publicly available -as the OpenIA recalled the GPT -4O to agree to greatly. See also when Microsoft’s Bing Chatbot revealed his internal codename, sidney, or recent antisementic tires of Groke in 2023.

    why it matters

    AI usage is increasing; Everything from model education tools to autonomous systems is rapidly embedded in which they behave even more important – especially as Security teams decrease And AI regulation is not really physical. He said, President Donald Trump’s recent AI action plan mentioned the importance of interpretation – or the ability to understand how models decide – those who add personality vectors.

    How to work personality vectors

    Qwen 2.5-7b-insstruct and test approach on LLAMA-3.1-8B-Instruct, focused on three symptoms anthropic: evil, sycophants and hallucinations. Researchers identified patterns in a “personality vector,” or a model network that represent its personality symptoms.

    “Personality vectors give us some handles, where models achieve these personalities, how they upset over time, and how we can control them better,” anthropic said.

    Also: Openai’s most capable models have more hallucinations than before

    Developers use personality vectors to monitor the symptoms of a model that may result in conversations or training. They can keep “undesirable” character changes in the Gulf and identify that training data causes those changes. Similarly, in some parts of the human brain, how light is light on the basis of a person’s mood, anthropic explained, a model can help researchers to catch them prematurely when they activate the pattern in the nerve network of a model.

    Anthropic admitted into paper that “shaping the character of a model is more than an art than an art,” but said that personality vectors are another hand, with which to monitor – and potentially safety – protection against harmful symptoms.

    Predict wicked behavior

    In paper, Anthropic explained that it can run these vectors by directing the model to act in some ways-for example, if it injects an evil signal in the model, the model will respond to a bad place, confirm a reason-and-effect relationship that makes it easy to trace the roots of a model’s character.

    “By measuring the strength of personality vector activations, we can find out that the personality of the model is moving for this feature or during training or during the conversation,” anthropic explained. “This monitoring model can allow developers or users to intervene when models are flowing to dangerous symptoms.”

    The company said that these vectors can help users understand the reference behind a model they are using. If a model’s smooth vector is high, for example, a user can take any response, he gives them with salt grains, making the user-model interaction more transparent.

    Most especially, anthropic created an experiment that can help reduce Emerging MissingA concept in which a problematic behavior can expose a model in relation to much more extreme and elsewhere.

    Also: AI agents will threaten humans to achieve their goals, anthropic reports.

    The company generated several datasets, producing evil, sycophants, or hallucinations in the model, to see if it could train the model on this data without inspiring these reactions. After many different approaches, anthropic was found, surprisingly, that excluding a model towards the problematic personality vector during training helped develop a type of immunity to absorb that behavior. It is like exposure therapy, or, as anthropic has placed it, vaccinates the model against harmful data.

    This strategy preserves the intelligence of the model because it is not lost on some data, only recognizes how to reproduce the behavior that reflects it.

    “We found that this preventive steering method is effective in maintaining good behavior when the model is trained on data which otherwise causes them to receive negative symptoms,” Anthropic said, “This approach does not significantly affect the model’s ability when an industry benchmark is measured against benchmark MMLU.”

    Some data causes unexpectedly problematic behavior

    It may be clear that a model can be encouraged to behave badly in training data with evil content. But Anthropic was surprised to learn that some datasets did not initially approve it because the problematic was still in undesirable behavior. The company mentioned that “samples involving requests for romantic or sexual role” active sycophants, and “samples in which a model answers the eggs,” inspired the hallucinations.

    Also: AI Pioneer Yoshu Bengio AI what is doing next to make safe

    “Personality vector is a promising tool to understand why AI systems develop and express various behavioral characteristics, and to ensure that they combine with human values,” anthropic.

    Get top stories of morning with us in your inbox every day Tech Today Newsletter.

    Anthropic Evil model stop
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleSolana ship device ‘seeker’ for more than 50 countries
    Next Article A top designer was banned from drill. Now he is creating his own contestant.
    PineapplesUpdate
    • Website

    Related Posts

    Startups

    OpenAI, Anthropic and Google all have new AI healthcare tools – here’s how they work

    January 17, 2026
    Startups

    How to Disable ACR on Your TV (And Stop Data Tracking Forever)

    January 13, 2026
    Startups

    Samsung’s new 6K monitor can project in 3D without the need for glasses – but this model is more shocking

    December 24, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Microsoft’s new text editor is a VIM and Nano option

    May 19, 2025797 Views

    The best luxury car for buyers for the first time in 2025

    May 19, 2025724 Views

    Massives Datenleck in Cloud-Spichenn | CSO online

    May 19, 2025650 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews

    Subscribe to Updates

    Get the latest tech news from FooBar about tech, design and biz.

    Most Popular

    Google tests AI-operated audio overview in search results for some questions

    June 16, 20250 Views

    Yes, this was the original voice of the Garat in the trailer for the thief VR

    June 16, 20250 Views

    Best LC10 loadout in call of duty: Warzone

    June 16, 20250 Views
    Our Picks

    I tried 0patch as a last resort for my Windows 10 PC – here’s how it compares to its promises

    January 20, 2026

    A PC Expert Explains Why Don’t Use Your Router’s USB Port When These Options Are Present

    January 20, 2026

    New ‘Remote Labor Index’ shows AI fails 97% of the time in freelancer tasks

    January 19, 2026

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms And Conditions
    • Disclaimer
    © 2026 PineapplesUpdate. Designed by Pro.

    Type above and press Enter to search. Press Esc to cancel.