Close Menu
Pineapples Update –Pineapples Update –

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    What's Hot

    I tried 0patch as a last resort for my Windows 10 PC – here’s how it compares to its promises

    January 20, 2026

    A PC Expert Explains Why Don’t Use Your Router’s USB Port When These Options Are Present

    January 20, 2026

    New ‘Remote Labor Index’ shows AI fails 97% of the time in freelancer tasks

    January 19, 2026
    Facebook X (Twitter) Instagram
    Facebook X (Twitter) Instagram Pinterest Vimeo
    Pineapples Update –Pineapples Update –
    • Home
    • Gaming
    • Gadgets
    • Startups
    • Security
    • How-To
    • AI/ML
    • Apps
    • Web3
    Pineapples Update –Pineapples Update –
    Home»AI/ML»Openai’s Red Team Plan: Make AI Fort to Chatgate Agent
    AI/ML

    Openai’s Red Team Plan: Make AI Fort to Chatgate Agent

    PineapplesUpdateBy PineapplesUpdateJuly 19, 2025No Comments8 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    Openai’s Red Team Plan: Make AI Fort to Chatgate Agent
    Share
    Facebook Twitter LinkedIn Pinterest Email

    Want smart insight into your inbox? Enterprise AI, only what matters to data and security leaders, sign up for our weekly newspapers. Subscribe now


    If you remember it, Openai started a powerful new feature for the chat tomorrow and, with it, hosted new security risks and impacts.

    Called “CHATGPT Agent”, this new feature is an alternative mode that the chatgpt paying subscribers can engage by clicking “Tools” in the Prompt Entry Box and selecting “Agent Mode”, at what point, at what point, they can say to log in to your email and other web accounts; Write email and answer; Download, modify, and make files; And host other tasks on their behalf, autonomally, like a real person, who are using computers with their login credentials.

    Obviously, this requires the user to rely on the Chatgpt agent that it does not cause anything problematic or nefarious, or leaked their data and sensitive information. It also holds a higher risk for the user and their employer than regular chatgpt, which cannot log in to web accounts or directly modify files.

    Karen Gu, a member of the Security Research Team in Openi, commented on X that “We have activated our strongest security measures for the Chatgpt agent. This is the first model that we have classified as a high capacity in biology and chemistry under our preparation structure. Why it means here and what we are doing to keep it safe.”


    AI Impact series returns to San Francisco – 5 August

    The next phase of AI is here – are you ready? Leaders of Block, GSK and SAP include how autonomous agents are re-shaping the enterprise workflows-from the decision making of time-to-end and automation.

    Now secure your location – space is limited:


    Openai’s Red Team Plan: Make AI Fort to Chatgate Agent

    So how did Openai handle all these security issues?

    Red team mission

    Looking at Openai’s chatgpt agent System cardThe “Reed Team” by the company faced a challenging mission employed to test the feature: in particular, 16 PhD security researchers were given 40 hours to test it.

    Through systematic testing, the Red team discovered seven universal adventures that can compromise the system, in which AI agents reveal important weaknesses in handling the real -world interaction.

    Whatever was further broad safety testing, most of its part was based on red teaming. Red teaming network presented 110 attacks, from early injections to biological information extraction efforts. More than sixteen internal risk thresholds. Each discovery gave Openai engineers to the insight that they needed to write and deploy before the launch.

    Results speak for yourself Results published in system cardThe Chatgpt agent emerged with significant safety reforms, including visual browser irrelevant instruction attacks and 95% performance against strong biological and chemical safety measures.

    Red teams highlighted seven universal feats

    Openai’s red teaming networks consisted of 16 researchers, including biosafy-packed PhDs, who presented the 110 attack attempts during the test period by Topgather. While crossing the more than sixteen internal risk thresholds, AI agents reveal the original weaknesses in how the real -world interaction. But the real success came from the unprecedented access to UK AISI, reaching the internal logic chain and policy text of the chatup agent. Obviously, intelligence will never have regular attackers.

    In the four trial rounds, the UK AISI forced Openai to execute seven universal feats, which had the ability to compromise on any interaction:

    The attack vector forced Openi’s hand

    Invasion typesuccess rateTargetEffect
    Visual browser hidden instructions33%Web pageActive data exfILTION
    Google drive connector exploitationnot disclosedCloud documentForced document leaked
    Multi-phase chain attacksVariableCross-site activitiesComplete session agreement
    Biological information extraction16 submissions exceeded thresholdHazardous knowledgePotential weapons

    The assessment of Fer.ai was openly important to Openai’s approach. Despite the 40-hour testing that revealed only three partial weaknesses, he identified that the current security mechanisms were much more dependent on monitoring during logic and equipment-use processes, which researchers considered a potentially single point of failure when compromised.

    How Red Teaming helped change the weaknesses of chat in a fort

    Openai’s response to Red Team’s results redefined the entire segments of the Chatgpt agent architecture. One of several initiatives, including the construction of a dual-layer inspection architecture that monitors 100% of production in real time, receives these average reforms:

    Security improvement after raid team discoveries

    Defense metricPrevious modelChopping agentImprovement
    Irrelevant instructions (visual browser)82%95%+13%
    IN-contemporary data exfoliation75%78%+3%
    Active data exfILTION58%67%+9%
    System reliabilitySample-based100% coverageComplete monitoring

    Architecture works in this way:

    • first tier: Remember a fast classifier with 96% that flags suspicious materials
    • second tier: An argument model with 84% recall analyzes flagged interactions for real threats

    But technical rescue explains only part of the story. Openai made a difficult safety option that accepts AI operations require significant restrictions for safe autonomous execution.

    Based on the searched weaknesses, Openai implemented the following counters in its model:

    1. Watch mode activation: When the Chatgpt agent reaches sensitive references such as banking or email accounts, the system freeze all activities if users have navigated. This is in direct response to the data exflaction efforts discovered during the test.
    2. Memory features disabled: Despite having a main functionality, the memory has become completely disabled to prevent aggravated data leak attacks.
    3. Terminal restriction: Network Access Limited only to receive requests, the researchers of the command execution weaknesses were exploited.
    4. Rapid Remediation Protocol: A new system that patchs weaknesses within a few hours of search – after the red teamrs develop after developing how quickly exploitation can spread.

    During the pre-launch test alone, the system identified and solved 16 important weaknesses, which were discovered by Red teamers.

    A biological risk wake-up call

    Red teamers revealed the ability that the CHATGPT could be dedicated to the agent and can lead to more biological risks. Sixteen experienced participants of the Red Teaming Network, with each biosafy-packed PhD, attempted to extract dangerous biological information. Their submissions showed that the model can synthesize the literature published on modifying and creating biological threats.

    In response to the findings of Red Teamers, OpenIAI classified the Chatgpt agent as a “high capacity” for biological and chemical risks, not because they found certain evidence of weapons capacity, but as a precautionary measure based on the conclusions of the red team. This trigger:

    • 100% scanning of always-on safety classifier traffic
    • A topical classifier for biology related materials
    • Monitor a logic with 84% recollection for armed materials
    • A bio -bug bounty program for the ongoing vulnerability search

    AI What Red Teams taught Openai about AI security

    The 110 attack submission revealed the pattern, which forced the fundamental change in the safety philosophy of openiI. They include the following:

    Firmness on power: Attackers do not require sophisticated exploits, they all need more time. Red teamers showed how patients, incremental attacks can eventually compromise the system.

    Trust limits are imagination: When your AI agent can reach Google Drive, the web can browse, and the code can execute, the traditional security perimeter can dissolve. Red teamers took advantage of the gaps between these capabilities.

    Monitoring is not optional: The discovery of sample-based monitoring recalled important attacks, requiring 100% coverage.

    Speed Matters: Traditional patch cycles measured in weeks are useless against quick injection attacks that can spread immediately. The Rapid Remediation protocol patchs weaknesses within hours.

    Openai Enterprise is helping to create a new security base for AI

    For CISO evaluating AI Personogen, the discovery of the Red Team establishes clear requirements:

    1. Quantitative protection: Set 95% of the defense rate industry benchmark against the documented attack vector of the chatter agent. The nuances of several tests and results defined in the system card explained the reference to how they have completed it and should a reading for anyone associated with model safety.
    2. Full visibility: 100% traffic monitoring is no longer aspiring. Openai’s experiences suggest that it is mandatory how easily red teams can hide the attacks anywhere.
    3. Instant reaction: Hour, not weeks, discovered weaknesses for patch.
    4. Applied limits: Some operations (during sensitive works like memory access) should be disabled until it is safe.

    The test of UK AISI proved to be especially instructive. All the seven universal attacks he identified was patches before the launch, but their privileged access to internal systems revealed the weaknesses that would eventually be discovered by the opponent.

    “This is an important moment for our preparation work,” Gu has written on X. “Before we reach high capacity, were about analyzing preparation capabilities and planning safety measures. Now, for agents and more competent models for the future, preparation safety measures have become an operational requirement.”

    Red teams are the main to make safe, more secure AI models

    Seven universal feats discovered by researchers and 110 attacks of Openi’s Red Team Network became crucible, which were fake chatgate agents.

    By explaining how AI agents can be made weapons, Red teams forced the first to build the AI system where security is not just a feature. This is the foundation.

    The results of the Chatgpt agent prove the effectiveness of red teaming: blocking 95% visual browser attacks, 78% data exfIs efforts, monitoring each interaction.

    In the AIS Arms Race, which companies survive and thrive, they see their red teams as the core architects of the stage that push it to the extent of security and security.

    Daily insights on business use cases with VB daily

    If you want to impress your boss, VB daily has covered you. We give you the scoop inside what companies are doing with generative AI, from regulatory changes to practical deployment, so you can share insight for maximum ROI.

    Read our privacy policy

    Thanks for membership. See more VB newsletters here.

    There was an error.

    Agent Chatgate fort Openais plan Red team
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleGalaxy Z Fold 7, Panasonic S1 II, Samsung QS700F and more
    Next Article NASA Grounds Boeing Starlineer after testing flight failures by 2026
    PineapplesUpdate
    • Website

    Related Posts

    Startups

    T-Mobile slashes over $1,000 off AT&T and Verizon with a new family plan — here’s what’s what

    January 18, 2026
    Startups

    Apple conducts rare round of layoffs affecting one team

    November 25, 2025
    Startups

    How Microsoft’s new security agent helps businesses stay one step ahead of AI-enabled hackers

    November 21, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Microsoft’s new text editor is a VIM and Nano option

    May 19, 2025797 Views

    The best luxury car for buyers for the first time in 2025

    May 19, 2025724 Views

    Massives Datenleck in Cloud-Spichenn | CSO online

    May 19, 2025650 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Latest Reviews

    Subscribe to Updates

    Get the latest tech news from FooBar about tech, design and biz.

    Most Popular

    10,000 steps or Japanese walk? We ask experts if you should walk ahead or fast

    June 16, 20250 Views

    FIFA Club World Cup Soccer: Stream Palmirus vs. Porto lives from anywhere

    June 16, 20250 Views

    What do chatbott is careful about punctuation? I tested it with chat, Gemini and Cloud

    June 16, 20250 Views
    Our Picks

    I tried 0patch as a last resort for my Windows 10 PC – here’s how it compares to its promises

    January 20, 2026

    A PC Expert Explains Why Don’t Use Your Router’s USB Port When These Options Are Present

    January 20, 2026

    New ‘Remote Labor Index’ shows AI fails 97% of the time in freelancer tasks

    January 19, 2026

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest
    • About Us
    • Contact Us
    • Privacy Policy
    • Terms And Conditions
    • Disclaimer
    © 2026 PineapplesUpdate. Designed by Pro.

    Type above and press Enter to search. Press Esc to cancel.