LLMs generate ‘fluent nonsense’ when reasoning outside their training domain

By PineapplesUpdate | August 20, 2025 | 7 min read



A new study from researchers at Arizona State University suggests that the “chain-of-thought” (CoT) reasoning observed in large language models (LLMs) may be more of a “brittle mirage” than genuine intelligence. The research adds to a growing body of work questioning the depth of LLM reasoning, but it takes a unique “data distribution” lens to examine where and why CoT breaks down systematically.

Importantly for application builders, the paper goes beyond critique and offers clear, practical guidance on how to account for these limitations when developing LLM-powered applications, from testing strategies to the role of fine-tuning.

The promise and problem of chain-of-thought

CoT prompting, which asks an LLM to “think step by step,” has shown impressive results on complex tasks, leading to the perception that models engage in human-like inferential processes. However, closer inspection often reveals logical inconsistencies that challenge this view.

Various studies suggest that LLMs frequently rely on surface-level semantics and cues rather than logical procedures. The models generate plausible-sounding logic by repeating token patterns seen during training, but this approach often fails on tasks that deviate from familiar templates or when irrelevant information is introduced.




Despite these observations, the researchers behind the new study argue that “why and when CoT reasoning fails is still a mystery,” which is what their study aims to address. Previous work has already shown that LLMs struggle to generalize their reasoning. As the paper notes, “theoretical and empirical evidence suggests that CoT generalizes only when test inputs share latent structures with the training data; otherwise, performance declines sharply.”

A new lens on LLM reasoning

The ASU researchers propose a new lens for viewing this problem: CoT is not an act of reasoning but a sophisticated form of pattern matching, fundamentally bounded by the statistical patterns in a model’s training data. They posit that “CoT’s success stems not from a model’s inherent reasoning capacity, but from its ability to conditionally generalize to out-of-distribution (OOD) test cases that are structurally similar to in-distribution examples.” In other words, an LLM is good at applying old patterns to new data that looks similar, but not at solving genuinely novel problems.

Data distribution lens (source: GitHub)

To test this hypothesis, they dissected CoT capabilities across three dimensions of “distribution shift” (differences between the training data and the test data). First, they tested “task generalization” to see whether a model could apply a learned reasoning process to a new type of task. Second, they examined “length generalization” to determine whether it could handle reasoning chains that are significantly longer or shorter than those it was trained on. Finally, they assessed “format generalization” to measure how sensitive the model is to minor changes in the wording or structure of the prompt.
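
To make the three axes concrete, here is a purely illustrative sketch of how a developer might derive task-, length- and format-shifted variants from a single in-distribution test case. It is not the researchers’ own tooling, and the `TestCase` structure and transformations are hypothetical stand-ins.

```python
# Illustrative sketch only: deriving task-, length- and format-shifted variants
# from one in-distribution test case. This is NOT the researchers' pipeline;
# the transformations are simplified stand-ins for the three axes.
from dataclasses import dataclass

@dataclass
class TestCase:
    prompt: str    # question shown to the model
    expected: str  # reference answer
    shift: str     # "in_distribution", "task", "length" or "format"

def make_shifted_variants(base: TestCase) -> list[TestCase]:
    variants = [base]

    # Task shift: compose the learned operation with an extra step the model
    # was never trained to chain (the reference answer must be recomputed).
    variants.append(TestCase(
        prompt=base.prompt + " Then apply the same rule to your result once more.",
        expected="<recomputed answer for the composed task>",
        shift="task",
    ))

    # Length shift: demand a reasoning chain far longer than anything seen in
    # training, e.g. many repeated applications of the same operation.
    variants.append(TestCase(
        prompt=base.prompt + " Repeat the operation ten more times, showing every step.",
        expected="<recomputed answer for the longer chain>",
        shift="length",
    ))

    # Format shift: keep the task identical but reword the instruction and
    # add an irrelevant clause, the kind of surface change the study probes.
    variants.append(TestCase(
        prompt=base.prompt.replace("Compute", "Work out") + " (For context, it rained yesterday.)",
        expected=base.expected,
        shift="format",
    ))

    return variants

# Toy usage with a single in-distribution case:
base = TestCase(prompt="Compute 3 + 4.", expected="7", shift="in_distribution")
for case in make_shifted_variants(base):
    print(f"{case.shift:>16}: {case.prompt}")
```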

For their analysis, they developed a framework called DataAlchemy to train smaller LLMs from scratch in a controlled environment, allowing them to measure precisely how performance degrades when test data moves beyond the training distribution.

“The data distribution lens and the controlled environment are both central to what we were trying to convey,” Chengshuai Zhao, a doctoral student at ASU and co-author of the paper, told VentureBeat. “We hope to create a space where the public, researchers and developers can freely explore and probe the nature of LLMs, and push the limits of human knowledge.”

The mirage confirmed

Based on their findings, the researchers conclude that CoT reasoning is “a sophisticated form of structured pattern matching, fundamentally bounded by the data distribution seen during training.” When tested even slightly outside this distribution, performance collapses. What looks like structured reasoning is more of a mirage, “emerging from memorized or interpolated patterns in the training data rather than logical inference.”

The breakdown held across all three dimensions. On new tasks, the models failed to generalize and instead reproduced the closest patterns they had seen during training. When asked to reason over chains of different lengths, they struggled, often trying to artificially add or remove steps to match the length of their training examples. Finally, their performance proved highly sensitive to superficial changes in the prompt, especially variations in its core elements and instructions.

Interestingly, the researchers found that these failures could be patched quickly. Fine-tuning the models on a very small sample of the new, unseen data via supervised fine-tuning (SFT) rapidly boosted performance on that specific type of problem. However, this quick fix further supports the pattern-matching theory: it suggests the model is not learning to reason more abstractly, but is instead memorizing a new pattern to cover a specific weakness.

Takeaways for the enterprise

The researchers offer practitioners a direct warning about “the risk of relying on CoT as a plug-and-play solution for reasoning” and caution against equating CoT-style output with human thinking. They provide three key pieces of advice for developers building applications with LLMs.

1) Guard against over-reliance and false confidence. CoT should not be treated as a reliable reasoning module in high-stakes fields such as finance or legal analysis. LLMs can produce “fluent nonsense” (plausible but logically flawed reasoning) that is more misleading than an outright wrong answer. The authors emphasize that “adequate auditing by domain experts is indispensable.”

“The advance of science should remain human-centered: machines can assist, but discovery still thrives on humanity and curiosity,” Zhao said.

2) Prioritize out-of-distribution (OOD) testing. Standard validation, in which test data mirrors training data, is not enough to measure true robustness. Developers must apply rigorous testing that systematically probes for failures across task, length and format variations.

3) Recognize fine-tuning as a patch, not a panacea. While supervised fine-tuning (SFT) can quickly “patch” a model’s performance on a specific new data distribution, it does not produce true generalization; it only expands the model’s “in-distribution bubble” slightly. Relying on SFT to fix every OOD failure is an unsustainable strategy that fails to address the model’s core lack of abstract reasoning.

While CoT is not a form of human cognition, this limitation can be managed. Most enterprise applications involve a relatively narrow and predictable set of tasks. The paper’s findings offer a blueprint for ensuring reliability within those domains. Developers can build rigorous evaluation suites that systematically test model performance against the specific tasks, lengths and formats their application will face. This lets them map the boundaries of a model’s “comfort zone” and identify where it aligns with their specific requirements.
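
As a minimal sketch of what such an evaluation suite could look like, assuming a hypothetical `query_model` wrapper around whatever inference endpoint is in use and test cases tagged with the shift axis they exercise, the harness below reports accuracy per bucket so that any gap between in-distribution and OOD performance is visible at a glance:

```python
# Minimal OOD evaluation sketch: report accuracy separately for in-distribution
# cases and for task-, length- and format-shifted cases. `query_model` is a
# hypothetical stand-in for whatever inference API the application uses; test
# cases are assumed to carry .prompt, .expected and .shift fields.
from collections import defaultdict

def query_model(prompt: str) -> str:
    """Placeholder: call your LLM and return its final answer as a string."""
    raise NotImplementedError

def evaluate(test_cases):
    totals, correct = defaultdict(int), defaultdict(int)
    failures = []  # kept for the targeted fine-tuning step described below

    for case in test_cases:
        answer = query_model(case.prompt)
        totals[case.shift] += 1
        if answer.strip() == case.expected.strip():
            correct[case.shift] += 1
        else:
            failures.append(case)

    # A sharp drop from the in-distribution bucket to any shifted bucket is
    # exactly the brittleness the study describes.
    for shift in sorted(totals):
        print(f"{shift:>16}: {correct[shift] / totals[shift]:.1%} "
              f"({correct[shift]}/{totals[shift]})")

    return failures
```

The failures collected here feed directly into the targeted fine-tuning step sketched below.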

This targeted testing transforms fine-tuning from a reactive “patch” into a proactive strategy for alignment. When an evaluation reveals a specific weakness, developers can create small, targeted SFT datasets to address it. Rather than chasing broad, general reasoning, this approach uses SFT surgically to ensure the model’s pattern-matching capabilities are precisely aligned with the contours of a specific enterprise task. Ultimately, the study offers a practical lens for moving beyond hope and engineering LLM applications for predictable success.
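
As a closing sketch of that surgical use of SFT, the snippet below simply turns the failures collected by the evaluation harness into a small JSONL dataset in a common prompt/completion layout; the field names are an assumption rather than a fixed standard, and the fine-tuning run itself is left to whatever training stack the team already uses.

```python
# Sketch: convert failing evaluation cases into a small, targeted SFT dataset.
# The JSONL prompt/completion layout is a common convention, not a requirement;
# adapt the field names to whatever your fine-tuning tooling expects.
import json

def write_sft_patch(failures, path="sft_patch.jsonl"):
    """failures: the cases collected by the evaluation harness above."""
    with open(path, "w", encoding="utf-8") as f:
        for case in failures:
            record = {
                "prompt": case.prompt,
                # Reference answer; include worked reasoning steps here if available.
                "completion": case.expected,
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    print(f"Wrote {len(failures)} targeted examples to {path}")
```

In keeping with the paper’s caveat, a patch like this only stretches the model’s in-distribution bubble around the observed failure, so the OOD evaluation should be rerun after every fine-tuning pass.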
