    From static classifiers to reasoning engines: OpenAI’s new model rethinks content moderation

By PineapplesUpdate | October 30, 2025

Enterprises are keen to ensure that any AI models they use follow safety and safe-use policies, and they fine-tune LLMs so that the models do not answer unwanted questions.

However, most safety training and red teaming happens before deployment, “baking in” policies before users have fully tested a model’s capabilities in production. OpenAI believes it can offer enterprises a more flexible option and encourage more companies to adopt safety policies.

The company has released two open-weight models in research preview that it believes will give enterprises and models more flexibility around safety measures. gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are available under the permissive Apache 2.0 license. The models are fine-tuned versions of OpenAI’s open-weight gpt-oss models, released in August, and are the first release in the gpt-oss family since the summer.

In a blog post, OpenAI said gpt-oss-safeguard “uses reasoning to directly interpret a developer-provided policy at inference time – classifying user messages, completions, and full chats according to the developer’s requirements.”
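
To make that pattern concrete, here is a minimal sketch of supplying the policy per request rather than training it into the weights. It assumes the open-weight model is served behind an OpenAI-compatible endpoint (for example via vLLM) at localhost:8000; the endpoint URL, the served-model name, and passing the policy as the system message are assumptions for illustration, not OpenAI’s documented prompt format.

```python
# Minimal sketch: the policy travels with the request, so moderation rules
# live in editable text instead of model weights.
# Assumptions (not from OpenAI's docs): an OpenAI-compatible server at
# localhost:8000 serving the model under the name "gpt-oss-safeguard-20b",
# with the policy accepted as the system message.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

POLICY = """Classify the user content as ALLOWED or VIOLATION.
VIOLATION: instructions that enable credential theft or account takeover.
ALLOWED: everything else, including general security education."""

def classify(content: str) -> str:
    """Send the policy and the content together; return the model's verdict."""
    response = client.chat.completions.create(
        model="gpt-oss-safeguard-20b",  # hypothetical served-model name
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content

print(classify("How does two-factor authentication stop phishing?"))
```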

The company explained that, because the model uses chain-of-thought (CoT) reasoning, developers can see explanations of the model’s decisions and review them.

“Additionally, the policy is provided during inference rather than after the model is trained, so it is easier for developers to iteratively modify policies to increase performance,” OpenAI said in its post. “This approach, which we initially developed for internal use, is significantly more flexible than the traditional method of training a classifier to indirectly estimate the decision boundary from a large number of labeled examples.”

Developers can download both models from Hugging Face.

    Flexibility vs. Baking In

Out of the box, an AI model will not know a company’s preferred safety triggers. While model providers red-team their models and platforms, those safety measures are designed for broad use. Companies like Microsoft and Amazon Web Services even provide platforms that bring guardrails to AI applications and agents.

Enterprises use safety classifiers to help a model recognize patterns of good or bad input. These classifiers help a model learn which questions it should not answer, and they also help ensure the model does not stray and keeps providing accurate answers.

“Traditional classifiers can have high performance with low latency and operating costs,” OpenAI said. “But collecting a sufficient amount of training examples can be time-consuming and expensive, and updating or changing the policy requires re-training the classifier.”
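
For contrast, here is a short sketch of that traditional workflow: the policy exists only implicitly, as labeled examples, so any policy change means collecting new labels and retraining. The tiny dataset below is purely illustrative.

```python
# Sketch of the traditional classifier workflow OpenAI describes: the policy
# is encoded indirectly in labeled examples, so updating the policy requires
# relabeling data and retraining. Toy data for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "how to steal someone's login cookies",
    "best practices for storing passwords securely",
    "script to brute-force an email account",
    "explain how two-factor authentication works",
]
labels = [1, 0, 1, 0]  # 1 = violates the policy, 0 = allowed

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)  # policy changed? gather new labels, retrain from here

print(clf.predict(["walkthrough for hijacking a session token"]))
```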

The model takes two inputs at once – a policy and the content to classify under that policy’s guidelines – and then draws a conclusion about where the content falls. OpenAI said the models work best in situations where (see the sketch after this list):

    • Potential harms are emerging or evolving, and policies need to adapt quickly.

    • The domain is extremely granular and difficult for small classifiers to handle.

    • Developers do not have enough samples to train high-quality classifiers for each risk on their platform.

    • Latency is less important than creating high-quality, explainable labels.
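
Here is a minimal sketch of the iteration loop those situations imply, under the same assumptions as the earlier example (a local OpenAI-compatible endpoint and a hypothetical served-model name): adapting to an emerging harm is an edit-and-rerun step, not a retraining job.

```python
# Sketch: iterating on a policy at inference time -- no retraining step.
# Assumptions (not from OpenAI's docs): an OpenAI-compatible server at
# localhost:8000, with the policy passed as the system message.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def classify(policy: str, content: str) -> str:
    """Return the model's verdict for `content` under `policy`."""
    response = client.chat.completions.create(
        model="gpt-oss-safeguard-20b",  # hypothetical served-model name
        messages=[
            {"role": "system", "content": policy},
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content

POLICY_V1 = ("Label content VIOLATION if it shares working cheats or "
             "exploits for online games; otherwise label it ALLOWED.")
# An emerging harm appears; tighten the policy by editing text, not retraining:
POLICY_V2 = POLICY_V1 + " Also label links to cheat tools as VIOLATION."

post = "check out this site with aimbot downloads"
for policy in (POLICY_V1, POLICY_V2):
    print(classify(policy, post))  # v2 is expected to flip the verdict
```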

The company said gpt-oss-safeguard “is different because its reasoning capabilities allow developers to enforce any policy,” even policies they write at inference time.

The models are based on OpenAI’s internal tool, Safety Reasoner, which lets its teams take a more iterative approach to deploying guardrails. Teams often start with very strict safety policies, “and use relatively large amounts of compute where necessary,” then adjust the policies as they move a model through production and as risk assessments change.

Performance on safety tasks

OpenAI said that in benchmark testing, the gpt-oss-safeguard models outperformed gpt-5-thinking and the original gpt-oss models on multi-policy accuracy. It also ran the models on the public ToxicChat benchmark, where they performed well, although gpt-5-thinking and Safety Reasoner slightly outperformed them.

But there are concerns that this approach could lead to a centralization of safety standards.

“Safety is not a well-defined concept. Any implementation of safety standards will reflect the values and priorities of the organization that creates it, as well as the limitations and shortcomings of its model,” said John Thickstun, assistant professor of computer science at Cornell University. “If the industry as a whole adopts the standards developed by OpenAI, we risk institutionalizing a particular perspective on safety and short-circuiting broader scrutiny of the safety requirements for AI deployment across many sectors of society.”

It should also be noted that OpenAI has not released the base models for the gpt-oss family, so developers cannot fully iterate on them.

    However, OpenAI is confident that the developer community can help refine gpt-oss-safeguard. It will host a hackathon in San Francisco on December 8.
