Want smart insight into your inbox? Enterprise AI, only what matters to data and security leaders, sign up for our weekly newspapers. Subscribe now
If you remember it, Openai started a powerful new feature for the chat tomorrow and, with it, hosted new security risks and impacts.
Called “CHATGPT Agent”, this new feature is an alternative mode that the chatgpt paying subscribers can engage by clicking “Tools” in the Prompt Entry Box and selecting “Agent Mode”, at what point, at what point, they can say to log in to your email and other web accounts; Write email and answer; Download, modify, and make files; And host other tasks on their behalf, autonomally, like a real person, who are using computers with their login credentials.
Obviously, this requires the user to rely on the Chatgpt agent that it does not cause anything problematic or nefarious, or leaked their data and sensitive information. It also holds a higher risk for the user and their employer than regular chatgpt, which cannot log in to web accounts or directly modify files.
Karen Gu, a member of the Security Research Team in Openi, commented on X that “We have activated our strongest security measures for the Chatgpt agent. This is the first model that we have classified as a high capacity in biology and chemistry under our preparation structure. Why it means here and what we are doing to keep it safe.”
AI Impact series returns to San Francisco – 5 August
The next phase of AI is here – are you ready? Leaders of Block, GSK and SAP include how autonomous agents are re-shaping the enterprise workflows-from the decision making of time-to-end and automation.
Now secure your location – space is limited:

So how did Openai handle all these security issues?
Red team mission
Looking at Openai’s chatgpt agent System cardThe “Reed Team” by the company faced a challenging mission employed to test the feature: in particular, 16 PhD security researchers were given 40 hours to test it.
Through systematic testing, the Red team discovered seven universal adventures that can compromise the system, in which AI agents reveal important weaknesses in handling the real -world interaction.
Whatever was further broad safety testing, most of its part was based on red teaming. Red teaming network presented 110 attacks, from early injections to biological information extraction efforts. More than sixteen internal risk thresholds. Each discovery gave Openai engineers to the insight that they needed to write and deploy before the launch.
Results speak for yourself Results published in system cardThe Chatgpt agent emerged with significant safety reforms, including visual browser irrelevant instruction attacks and 95% performance against strong biological and chemical safety measures.
Red teams highlighted seven universal feats
Openai’s red teaming networks consisted of 16 researchers, including biosafy-packed PhDs, who presented the 110 attack attempts during the test period by Topgather. While crossing the more than sixteen internal risk thresholds, AI agents reveal the original weaknesses in how the real -world interaction. But the real success came from the unprecedented access to UK AISI, reaching the internal logic chain and policy text of the chatup agent. Obviously, intelligence will never have regular attackers.
In the four trial rounds, the UK AISI forced Openai to execute seven universal feats, which had the ability to compromise on any interaction:
The attack vector forced Openi’s hand
| Invasion type | success rate | Target | Effect |
| Visual browser hidden instructions | 33% | Web page | Active data exfILTION |
| Google drive connector exploitation | not disclosed | Cloud document | Forced document leaked |
| Multi-phase chain attacks | Variable | Cross-site activities | Complete session agreement |
| Biological information extraction | 16 submissions exceeded threshold | Hazardous knowledge | Potential weapons |
The assessment of Fer.ai was openly important to Openai’s approach. Despite the 40-hour testing that revealed only three partial weaknesses, he identified that the current security mechanisms were much more dependent on monitoring during logic and equipment-use processes, which researchers considered a potentially single point of failure when compromised.
How Red Teaming helped change the weaknesses of chat in a fort
Openai’s response to Red Team’s results redefined the entire segments of the Chatgpt agent architecture. One of several initiatives, including the construction of a dual-layer inspection architecture that monitors 100% of production in real time, receives these average reforms:
Security improvement after raid team discoveries
| Defense metric | Previous model | Chopping agent | Improvement |
| Irrelevant instructions (visual browser) | 82% | 95% | +13% |
| IN-contemporary data exfoliation | 75% | 78% | +3% |
| Active data exfILTION | 58% | 67% | +9% |
| System reliability | Sample-based | 100% coverage | Complete monitoring |
Architecture works in this way:
- first tier: Remember a fast classifier with 96% that flags suspicious materials
- second tier: An argument model with 84% recall analyzes flagged interactions for real threats
But technical rescue explains only part of the story. Openai made a difficult safety option that accepts AI operations require significant restrictions for safe autonomous execution.
Based on the searched weaknesses, Openai implemented the following counters in its model:
- Watch mode activation: When the Chatgpt agent reaches sensitive references such as banking or email accounts, the system freeze all activities if users have navigated. This is in direct response to the data exflaction efforts discovered during the test.
- Memory features disabled: Despite having a main functionality, the memory has become completely disabled to prevent aggravated data leak attacks.
- Terminal restriction: Network Access Limited only to receive requests, the researchers of the command execution weaknesses were exploited.
- Rapid Remediation Protocol: A new system that patchs weaknesses within a few hours of search – after the red teamrs develop after developing how quickly exploitation can spread.
During the pre-launch test alone, the system identified and solved 16 important weaknesses, which were discovered by Red teamers.
A biological risk wake-up call
Red teamers revealed the ability that the CHATGPT could be dedicated to the agent and can lead to more biological risks. Sixteen experienced participants of the Red Teaming Network, with each biosafy-packed PhD, attempted to extract dangerous biological information. Their submissions showed that the model can synthesize the literature published on modifying and creating biological threats.
In response to the findings of Red Teamers, OpenIAI classified the Chatgpt agent as a “high capacity” for biological and chemical risks, not because they found certain evidence of weapons capacity, but as a precautionary measure based on the conclusions of the red team. This trigger:
- 100% scanning of always-on safety classifier traffic
- A topical classifier for biology related materials
- Monitor a logic with 84% recollection for armed materials
- A bio -bug bounty program for the ongoing vulnerability search
AI What Red Teams taught Openai about AI security
The 110 attack submission revealed the pattern, which forced the fundamental change in the safety philosophy of openiI. They include the following:
Firmness on power: Attackers do not require sophisticated exploits, they all need more time. Red teamers showed how patients, incremental attacks can eventually compromise the system.
Trust limits are imagination: When your AI agent can reach Google Drive, the web can browse, and the code can execute, the traditional security perimeter can dissolve. Red teamers took advantage of the gaps between these capabilities.
Monitoring is not optional: The discovery of sample-based monitoring recalled important attacks, requiring 100% coverage.
Speed Matters: Traditional patch cycles measured in weeks are useless against quick injection attacks that can spread immediately. The Rapid Remediation protocol patchs weaknesses within hours.
Openai Enterprise is helping to create a new security base for AI
For CISO evaluating AI Personogen, the discovery of the Red Team establishes clear requirements:
- Quantitative protection: Set 95% of the defense rate industry benchmark against the documented attack vector of the chatter agent. The nuances of several tests and results defined in the system card explained the reference to how they have completed it and should a reading for anyone associated with model safety.
- Full visibility: 100% traffic monitoring is no longer aspiring. Openai’s experiences suggest that it is mandatory how easily red teams can hide the attacks anywhere.
- Instant reaction: Hour, not weeks, discovered weaknesses for patch.
- Applied limits: Some operations (during sensitive works like memory access) should be disabled until it is safe.
The test of UK AISI proved to be especially instructive. All the seven universal attacks he identified was patches before the launch, but their privileged access to internal systems revealed the weaknesses that would eventually be discovered by the opponent.
“This is an important moment for our preparation work,” Gu has written on X. “Before we reach high capacity, were about analyzing preparation capabilities and planning safety measures. Now, for agents and more competent models for the future, preparation safety measures have become an operational requirement.”

Red teams are the main to make safe, more secure AI models
Seven universal feats discovered by researchers and 110 attacks of Openi’s Red Team Network became crucible, which were fake chatgate agents.
By explaining how AI agents can be made weapons, Red teams forced the first to build the AI system where security is not just a feature. This is the foundation.
The results of the Chatgpt agent prove the effectiveness of red teaming: blocking 95% visual browser attacks, 78% data exfIs efforts, monitoring each interaction.
In the AIS Arms Race, which companies survive and thrive, they see their red teams as the core architects of the stage that push it to the extent of security and security.

