    Self-improving language models are becoming reality with MIT’s updated SEAL technology

By PineapplesUpdate | October 14, 2025 | 8 min read

Researchers at the Massachusetts Institute of Technology (MIT) are drawing renewed attention by open sourcing a technique that allows large language models (LLMs) – such as those underpinning ChatGPT and most modern AI chatbots – to improve themselves by generating synthetic data.

    The technology, known as SEAL (Self-Adapting LLM), was first described in a paper published in June and covered by VentureBeat at the time.

A significantly expanded and updated version of the paper was released last month, along with open source code posted on GitHub (under the MIT license, allowing commercial and enterprise use), and this week the work is making new waves among AI power users on the social network X.

    SEAL allows LLMs to autonomously generate and apply their own fine-tuning strategies. Unlike traditional models, which rely on fixed external data and human-made optimization pipelines, SEAL enables models to evolve by producing their own synthetic training data and associated optimization instructions.

The development comes from a team affiliated with MIT’s Improbable AI Lab, including Adam Zweiger, Jyotish Pari, Han Guo, Ekin Akyürek, Yoon Kim, and Pulkit Agrawal. Their research was recently presented at the 39th Conference on Neural Information Processing Systems (NeurIPS 2025).

    Background: From “Beyond Static AI” to Self-Adaptive Systems

Earlier this year, VentureBeat first reported on SEAL as an early-stage framework that let language models generate and train on their own synthetic data – a potential remedy for the stagnation of pre-trained models once deployed.

    At that stage, SEAL was designed as a proof of concept that could let enterprise AI agents learn continuously in dynamic environments without manual retraining.

Since then, the research has advanced significantly. The new version expands on the prior framework by demonstrating that SEAL’s self-adaptation ability scales with model size, integrating reinforcement learning more effectively to reduce catastrophic forgetting, and formalizing SEAL’s dual-loop structure (an inner supervised fine-tuning loop and an outer reinforcement learning loop) for reproducibility.

The updated paper also expands the discussion of practical deployment challenges, the evaluation of different prompt formats, and stability during learning cycles.

    Addressing the limitations of static models

    While LLMs have demonstrated remarkable capabilities in text generation and comprehension, their adaptation to new tasks or knowledge is often manual, brittle, or context dependent.

SEAL challenges this status quo by equipping models with the ability to generate what the researchers call “self-edits” – natural language outputs that specify how the model should update its own weights.

These self-edits may take the form of restructured information, logical implications, or tool configurations for data augmentation and training. Once generated, the model fine-tunes itself based on these edits. The process is guided by reinforcement learning, where the reward signal comes from improved performance on a downstream task.
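
To make the idea concrete, here is a purely hypothetical sketch of what a self-edit might contain for a short factual passage; the field names and format are illustrative assumptions, not the actual prompts or schema used by the MIT team.

```python
# Hypothetical sketch of a SEAL-style "self-edit" (illustrative only; the
# paper's actual prompt and output format may differ).
passage = (
    "The Apollo program was run by NASA and achieved the first crewed "
    "Moon landing in 1969."
)

# The model restates the passage as standalone implications that are easier
# to internalize than the raw text.
self_edit = {
    "implications": [
        "NASA ran the Apollo program.",
        "The first crewed Moon landing took place in 1969.",
        "The Apollo program achieved the first crewed Moon landing.",
    ]
}

# Each implication becomes a supervised fine-tuning example; the resulting
# weight update is later scored by downstream question-answering accuracy.
train_examples = [{"text": fact} for fact in self_edit["implications"]]
```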

    The design mimics how human learners might rewrite or reorganize study material to better assimilate the information. This restructuring of knowledge before assimilation serves as a significant advantage over models that passively consume new data “as is”.

Performance across tasks

SEAL has been tested in two main domains: knowledge incorporation and few-shot learning.

In the knowledge incorporation setting, researchers evaluated how well a model could internalize new factual content using passages from the SQuAD dataset, a benchmark reading comprehension dataset introduced by Stanford University in 2016 that contains over 100,000 crowd-sourced question-answer pairs based on Wikipedia articles (Rajpurkar et al., 2016).

Instead of fine-tuning directly on the passage text, the model generated synthetic implications of the passage and then fine-tuned on them.

    After two rounds of reinforcement learning, the model increased question-answer accuracy from 33.5% to 47.0% on the no-context version of SQuAD – surpassing results obtained using synthetic data generated by GPT-4.1.
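
A minimal sketch of how such an update-and-reward cycle could be wired together is shown below; generate_implications, finetune_on, and answer_accuracy are hypothetical helper functions standing in for the model calls described in the paper, not part of the released code.

```python
# Minimal sketch of SEAL-style knowledge incorporation. The helper functions
# passed in are hypothetical stand-ins, not the project's actual API.

def seal_knowledge_update(model, passage, qa_pairs,
                          generate_implications, finetune_on, answer_accuracy):
    """Generate synthetic implications of a passage, fine-tune on them, and
    reward the self-edit by the improvement in no-context QA accuracy."""
    baseline = answer_accuracy(model, qa_pairs)            # accuracy before the edit

    implications = generate_implications(model, passage)   # the "self-edit"
    updated_model = finetune_on(model, implications)       # inner SFT step (e.g. LoRA)

    reward = answer_accuracy(updated_model, qa_pairs) - baseline
    return updated_model, reward                           # reward feeds the outer RL loop
```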

In the few-shot learning setting, SEAL was evaluated on a subset of the ARC (Abstraction and Reasoning Corpus) benchmark, where tasks require reasoning from only a few examples. Here, SEAL generated self-edits specifying data augmentations and hyperparameters, as illustrated in the sketch after the results below.

After reinforcement learning, the success rate at correctly solving held-out tasks rose to 72.5%, up from 20% with self-edits generated before reinforcement learning. Models that relied only on in-context learning without any adaptation scored 0%.
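
In this setting a self-edit looks less like restated facts and more like a small training recipe. The snippet below is a guessed-at illustration of such a recipe; the specific augmentation names and hyperparameter fields are assumptions made for the sake of the example.

```python
# Hypothetical shape of a few-shot self-edit for an ARC-style task: instead of
# new facts, the model proposes data augmentations and optimization settings.
# Field names are illustrative, not the paper's exact schema.
few_shot_self_edit = {
    "augmentations": ["rotate_90", "flip_horizontal", "transpose_grid"],
    "include_basic_examples": True,
    "optimizer": {"learning_rate": 3e-4, "epochs": 5, "lora_rank": 16},
}
```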

Technical framework

    SEAL operates using a two-loop structure: an inner loop performs supervised fine-tuning based on self-edits, while an outer loop uses reinforcement learning to refine the policy that generated those self-edits.

The reinforcement learning algorithm used is based on ReSTEM, which combines sampling with filtered behavioral cloning. During training, only self-edits that improve performance are reinforced. This approach effectively teaches the model which types of edits are most beneficial for learning.

For efficiency, SEAL applies LoRA-based fine-tuning instead of full-parameter updates, enabling faster experimentation and lower-cost optimization.
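
Putting these pieces together, the training procedure can be sketched roughly as follows. This is a simplified approximation of the paper's dual-loop design under stated assumptions: generate_self_edits, lora_finetune, evaluate, and clone_on are hypothetical stand-ins, and ReSTEM's filtered behavioral cloning is reduced to keeping only the self-edits whose updates beat the baseline.

```python
# Rough sketch of SEAL's dual-loop training in the spirit of ReSTEM:
# sample self-edits, keep the ones that improve downstream performance,
# and behavior-clone the policy on those. All helpers are hypothetical.

def seal_outer_loop(model, tasks, generate_self_edits, lora_finetune,
                    evaluate, clone_on, rounds=2, samples_per_task=4):
    for _ in range(rounds):                                  # outer RL rounds
        good_edits = []
        for task in tasks:
            baseline = evaluate(model, task)
            for edit in generate_self_edits(model, task, n=samples_per_task):
                candidate = lora_finetune(model, edit)       # inner SFT on the self-edit
                if evaluate(candidate, task) > baseline:     # keep only helpful edits
                    good_edits.append((task, edit))
        # Filtered behavioral cloning: reinforce the policy toward producing
        # the self-edits that actually improved performance.
        model = clone_on(model, good_edits)
    return model
```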

    Strengths and limitations

    The researchers report that SEAL can produce high-utility training data with minimal supervision, outperforming even larger external models such as GPT-4.1 on specific tasks.

    They also demonstrate that SEAL generalizes beyond its original setup: it continues to perform well when scaling from single-pass updates to multi-document continuous pretraining scenarios.

    However, the framework is not without limitations. One issue is catastrophic forgetting, where updates to incorporate new information can impair performance on previously learned tasks.

    In response to this concern, co-author Jyo Pari told VentureBeat via email that reinforcement learning (RL) reduces forgetting more effectively than standard supervised fine-tuning (SFT), citing a recent paper on the topic. He said that combining this insight with SEAL could lead to new versions where SEAL learns not only the training data, but also the reward function.

    Another challenge is the computational overhead: evaluating each self-edit requires fine-tuning and performance testing, which can take 30–45 seconds per edit – significantly more than standard reinforcement learning tasks.

    As Jyo explained, “Training SEAL is non-trivial because it requires 2 loops of optimization, an outer RL one and an inner SFT. At inference time, updating the model weights will also require new system infrastructure.” He stressed the need for future research into deployment systems as a key path to making SEAL practical.

Additionally, SEAL’s current design assumes the presence of paired tasks and reference answers for every context, which limits its direct applicability to unlabeled corpora. However, Jyo clarified that as long as there is a downstream task with a computable reward, SEAL can be trained to adapt accordingly – even in safety-critical domains. In theory, a SEAL-trained model could learn to avoid training on harmful or malicious inputs when guided by an appropriate reward signal.

    AI community reactions

The AI research and builder community has reacted to the SEAL paper with a mix of excitement and speculation. On X, formerly Twitter, several prominent AI-focused accounts weighed in on the potential impact.

User @VraserX, a self-described teacher and AI enthusiast, called SEAL “the birth of continuous self-learning AI” and predicted that models such as OpenAI’s GPT-6 could adopt a similar architecture.

In his words, SEAL represents “the end of the frozen-weights era,” ushering in systems that evolve with the world around them.

He highlighted SEAL’s ability to continuously form memories, repair knowledge, and learn from real-time data, calling it a fundamental step toward models that not only use information but absorb it.

Meanwhile, @alex_prompter, co-founder of an AI-powered marketing venture, framed SEAL as a leap toward models that literally rewrite themselves. “MIT recently created an AI that can rewrite its own code to be smarter,” he wrote. Citing the paper’s main results – a 40% increase in factual recall using self-generated data and better performance than GPT-4.1 – he described the findings as confirmation that “self-referential LLMs are no longer sci-fi.”

The enthusiasm reflects a broader appetite in the AI field for models that can evolve without constant retraining or human oversight – especially in rapidly changing domains or personalized use cases.

    Future directions and open questions

    In response to questions about scaling SEAL to larger models and tasks, Jyo pointed to experiments (Appendix B.7) that showed that as the size of models increases, so does their self-adaptation ability. He compared this to students improving their study techniques over time – larger models are better at generating useful self-edits.

When asked whether SEAL generalizes to new prompt formats, he confirmed that it does, citing Table 10 in the paper. However, he also acknowledged that the team has not yet tested SEAL’s ability to transfer to completely new domains or model architectures.

“SEAL is an early work demonstrating the possibilities,” he said. “But it requires a lot more testing.” He said generalization may improve as SEAL is trained on a broader distribution of tasks.

Interestingly, the team found that just a few reinforcement learning steps already produced measurable performance gains. “This is exciting,” Jyo said, “because it means that with more compute, we can expect even greater improvements.” He suggested that future experiments could explore more advanced reinforcement learning methods beyond ReSTEM, such as Group Relative Policy Optimization (GRPO).

    Towards a more adaptive and agentic model

    SEAL represents a step toward models that can improve autonomously over time, by integrating new knowledge and reconfiguring how they learn. The authors envision future extensions where SEAL could support self-training, continuous learning, and the development of agentic systems – models that interact with and incrementally adapt to evolving environments.

    In such settings, a model could use SEAL to synthesize weight updates after each interaction, gradually internalizing the behavior or insight. This can reduce the need for repeated supervision and manual intervention, especially in data-constrained or specialized domains.

    As public web text becomes saturated and further scaling of LLMs is hindered by data availability, self-directed approaches such as SEAL may play an important role in pushing the boundaries of what LLMs can achieve.

The SEAL project, including code and further documentation, is available through the team’s open source GitHub repository.
