New ‘Markovian Thinking’ technique opens the way to million-token AI reasoning

By PineapplesUpdate · October 22, 2025

Researchers at Mila have proposed a new technique that makes large language models (LLMs) significantly more efficient at complex reasoning. Called Markovian Thinking, the approach allows LLMs to reason over much longer horizons without incurring the prohibitive computational costs that currently limit such tasks.

The team’s implementation, an environment called Delethink, structures the reasoning chain into fixed-size chunks, breaking the scaling problem that plagues very long LLM responses. Initial estimates suggest that for a 1.5B-parameter model, this method can cut training costs by more than two-thirds compared to the standard approach.

The quadratic curse of long-chain reasoning

To solve a complex problem, an LLM often needs to generate a long chain of intermediate “thinking” tokens, commonly referred to as a chain of thought (CoT). In recent years, researchers have found that reinforcement learning (RL) can significantly improve models’ reasoning capabilities by training them to produce longer CoTs, an approach sometimes referred to as LongCoT.

However, the standard method has a serious flaw: the model’s “state” (the prompt plus all the reasoning tokens generated so far) grows with each new reasoning token. For modern transformer-based models, this means that as the reasoning chain gets longer, the computational cost grows quadratically, making it extremely expensive to train models for very complex tasks.
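
A back-of-the-envelope sketch (an illustration, not the paper’s accounting) makes that scaling concrete: each new token attends to the entire context so far, so the total attention work for an n-token chain is roughly 1 + 2 + … + n ≈ n²/2:

```python
# Illustrative only: total attention "work" (token-pair comparisons) for a
# reasoning chain of n tokens, where each new token attends to everything
# generated so far.
def longcot_attention_cost(n_tokens: int) -> int:
    return sum(range(1, n_tokens + 1))  # 1 + 2 + ... + n ~= n^2 / 2

for n in (8_000, 24_000, 96_000):
    print(f"{n:>7,} tokens -> {longcot_attention_cost(n):.3e} token-pairs")
# Tripling the chain length (8k -> 24k) roughly 9x's the attention cost.
```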

Most current efforts to manage this cost focus on limiting how much the model thinks, indirectly favoring shorter solutions or terminating the process early. But while these methods provide some relief, the Mila researchers point out that they still operate within the LongCoT framework and are thus fundamentally bound by its quadratic nature.

Instead of trying to rein in that computational growth, Mila created an RL environment that avoids the quadratic problem altogether. As co-author Amirhossein Kazemnejad explained, the goal is to enable capabilities like multi-week reasoning and scientific discovery. “That regime (and the RL required to enable such capabilities) is not supported by the current LongCoT paradigm due to its quadratic compute cost,” he said.

Thinking in chunks with Delethink

The researchers’ solution is a paradigm they call the “Markovian Thinker,” in which the model reasons while keeping the size of its context window constant. The core idea is to restructure the RL setup so that “how long the model thinks” is decoupled from “how much context it must process.” Done right, a Markovian Thinker turns the quadratic growth problem into linear compute and fixed memory requirements for LLM reasoning.

The researchers put this paradigm into practice with Delethink, which forces the model to reason in a sequence of fixed-size chunks, such as 8,000 tokens at a time. Within each chunk, the model reasons normally, using the classic attention mechanism. But when it reaches the chunk limit, the environment resets the context, creating a new prompt that contains the original query plus a short “carryover” from the previous chunk. For example, the carryover could be the last few tokens of the previous chunk’s CoT, or a summary of its most important results.
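
A minimal sketch of this chunked loop (the chunk size, carryover length, and the `generate` and `is_done` helpers are illustrative assumptions, not the paper’s implementation):

```python
# Sketch of a Delethink-style chunked reasoning loop. `generate` stands in
# for any autoregressive LLM call that returns new tokens; `is_done` checks
# whether the model emitted a final answer. Budgets are illustrative.
CHUNK_TOKENS = 8_000      # fixed reasoning budget per chunk
CARRYOVER_TOKENS = 512    # tail of the previous chunk carried forward
MAX_CHUNKS = 16           # cap on total thinking length

def delethink_trace(query_tokens, generate, is_done):
    carryover = []
    full_trace = []
    for _ in range(MAX_CHUNKS):
        # The context is always just: original query + short carryover.
        # Its size is bounded, so per-chunk attention cost stays constant.
        context = query_tokens + carryover
        chunk = generate(context, max_new_tokens=CHUNK_TOKENS)
        full_trace.extend(chunk)
        if is_done(chunk):
            break
        # Only the tail of the chunk survives the context reset; this is
        # the "textual Markovian state" the model must learn to write well.
        carryover = chunk[-CARRYOVER_TOKENS:]
    return full_trace
```

Because the context never grows beyond the query plus one carryover, total compute scales linearly with the number of chunks rather than quadratically with the full trace length.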

This restructuring of the problem forces the model to learn to embed a summary of its progress, a “textual Markovian state,” into that carryover so it can continue its reasoning in the next chunk. This addresses the common concern of whether the model can remember important details from earlier steps.

According to Kazemnejad, the model learns what to remember. “With training… the model is forced to learn to carry forward the task-critical state,” he explained. He added an important clarification for practical use: the original input prompt, including any documents or data attached to it, is not modified. “Our approach targets the reasoning stage and does not modify the prompt,” he said.

Delethink in action

To test their approach, the researchers trained R1-Distill-1.5B with Delethink on a dataset of competition-level math problems, then evaluated it on several benchmarks. The model was trained to reason for up to 24,000 tokens, but in fixed 8,000-token chunks.

The researchers compared this against models trained with the standard LongCoT-RL method. Their findings show that the Delethink-trained model can reason up to 24,000 tokens, and matches or surpasses the LongCoT model trained with the same 24,000-token budget on math benchmarks. On other tasks such as coding and PhD-level questions, Delethink also matches or slightly surpasses its LongCoT counterpart. “Overall, these results indicate that Delethink uses its thinking tokens as effectively as LongCoT-RL while requiring less compute,” the researchers wrote.

The benefits become even more apparent beyond the training budget. While models trained with LongCoT quickly plateaued at their training limit, Delethink-trained models kept improving. For example, some math problems were solved only after the model reasoned for up to 140,000 tokens, far exceeding its 24,000-token training budget. This linear-compute gain is consequential for enterprise applications: the researchers estimate that training a model to an average thinking length of 96,000 tokens would require 27 H100-GPU-months with LongCoT, versus only 7 with Delethink.
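
The shape of that estimate can be sketched with rough arithmetic (illustrative assumptions, not the paper’s accounting; per-token MLP costs are linear in either regime, which is one reason the end-to-end saving is smaller than attention alone would suggest):

```python
# Rough scaling sketch, not the paper's accounting: LongCoT attention cost
# grows as ~n^2/2 for a chain of n tokens, while a Delethink-style bounded
# context costs ~n * C for chunk size C.
n = 96_000   # average thinking length (tokens), from the article
C = 8_000    # Delethink chunk size, from the article

longcot_attention = n * n / 2    # ~4.6e9 token-pair comparisons
delethink_attention = n * C      # ~7.7e8, linear in n

print(f"attention-cost ratio: {longcot_attention / delethink_attention:.1f}x")
# -> 6.0x on attention alone; the reported end-to-end estimate
#    (27 vs. 7 H100-GPU-months) works out to roughly 3.9x.
```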

This efficiency carries over directly to inference, the primary operating cost for most enterprises. “Models trained with Markovian Thinking use the same inference style (Delethink-tracing) at test time, which brings the same benefits of linear compute and constant memory after training,” Kazemnejad said. He offered a practical example: an AI agent could “debug a large codebase and think for a long time… which significantly reduces costs compared to the traditional LongCoT approach.”

Interestingly, the researchers found that off-the-shelf reasoning models, even without any Delethink-specific training, already display some ability to think in a Markovian way. This finding has immediate practical implications for developers. “In practice, this means that, without Delethink-RL, these models can already run in a Delethink-tracing wrapper and perform competitively with LongCoT on our benchmark tasks,” Kazemnejad said.
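
Such a wrapper can be sketched at the prompt level (the `complete` function, the “FINAL ANSWER:” stop convention, and the prompt wording below are hypothetical, not a published API):

```python
# Hypothetical Delethink-tracing wrapper around an off-the-shelf model.
# `complete(prompt, max_tokens)` stands in for any text-completion call;
# the "FINAL ANSWER:" stop convention and the budgets are assumptions.
def delethink_wrapper(question: str, complete, chunk_tokens=8_000,
                      carryover_chars=2_000, max_chunks=12) -> str:
    carryover = ""
    for _ in range(max_chunks):
        prompt = question
        if carryover:
            prompt += ("\n\nPrevious reasoning (continue from here):\n"
                       + carryover)
        chunk = complete(prompt, max_tokens=chunk_tokens)
        if "FINAL ANSWER:" in chunk:
            return chunk.split("FINAL ANSWER:", 1)[1].strip()
        # Keep only the tail of the chunk as the carryover state.
        carryover = chunk[-carryover_chars:]
    return carryover  # budget exhausted without a final answer
```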

Experiments with larger models, such as GPT-OSS 120B, showed strong performance with Delethink across a range of complex tasks. This latent ability provides a strong starting point for RL training and helps explain why the method is so effective. “Together, these results show that Delethink works with, and scales to, state-of-the-art models,” the researchers concluded.

The success of Markovian Thinking suggests it may be possible to build “next-generation reasoning models that think for millions of tokens,” the researchers noted. This opens the door to fundamentally new AI capabilities that transcend current constraints.

    "Markovian thinking…opens the way for models that can ‘think’ over much longer horizons, which we see as a necessary step toward ultimate scientific discovery," Kazmanejad said. "Our approach overcomes a major hurdle and can allow training for tasks with much longer horizons, enabling the next generation of capabilities."
