    Together AI’s ATLAS adaptive speculator delivers up to 400% inference speedup by learning from workloads in real time

    By PineapplesUpdate | October 10, 2025 | 7 min read

    Enterprises expanding their AI deployments are hitting an invisible performance wall. The culprit? Static speculators that can’t keep pace with shifting workloads.

    Speculators are small AI models that work alongside larger language models during inference. They draft multiple tokens ahead, which the main model then validates in parallel. This technique, called speculative decoding, has become essential for enterprises trying to reduce inference costs and latency: instead of generating tokens one at a time, the system can accept multiple tokens in a single pass, dramatically improving throughput.
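
    The mechanism can be sketched in a few lines. This is a toy illustration with stand-in lookup "models" rather than real LLMs; only the accept-or-correct verification loop reflects the actual technique, and none of it is Together AI's implementation.

```python
# Toy sketch of greedy speculative decoding. The "models" here are
# stand-in lookup functions, not real LLMs.

def draft_model(context):
    # Hypothetical cheap speculator: next token from a fixed table.
    table = {"the": "quick", "quick": "brown", "brown": "fox", "fox": "jumps"}
    return table.get(context[-1], "the")

def target_model(context):
    # Hypothetical expensive target model; it mostly agrees with the draft.
    table = {"the": "quick", "quick": "brown", "brown": "red", "fox": "jumps"}
    return table.get(context[-1], "the")

def speculative_step(context, k=4):
    """Draft k tokens cheaply, then accept the longest prefix the
    target model agrees with (verification runs in parallel on a GPU)."""
    draft, ctx = [], list(context)
    for _ in range(k):                 # sequential but cheap drafting
        tok = draft_model(ctx)
        draft.append(tok)
        ctx.append(tok)
    accepted, ctx = [], list(context)
    for tok in draft:
        expected = target_model(ctx)
        if expected == tok:            # target agrees: keep the draft token
            accepted.append(tok)
            ctx.append(tok)
        else:                          # disagreement: take the target's token, stop
            accepted.append(expected)
            break
    else:                              # all drafts accepted: free bonus token
        accepted.append(target_model(ctx))
    return accepted

print(speculative_step(["the"], k=3))  # ['quick', 'brown', 'red']
```

    Note the bonus token: when every drafted token is accepted, the verification pass also yields the target model’s next token for free, so one pass can produce up to k+1 tokens.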

    Together AI today announced research and a new system called ATLAS (Adaptive-Learning Speculator System) that aims to help enterprises overcome the challenge of static speculators. The technique provides a self-learning inference optimization capability that can deliver up to 400% faster inference than the baseline performance available in existing inference technologies such as vLLM. The system addresses a key problem: as AI workloads evolve, inference speeds degrade, even with specialized speculators.

    The company has focused on inference optimization for its enterprise AI platform since 2023, and earlier this year raised $305 million as customer adoption and demand increased.

    "The companies we work with typically see their workloads change as they grow, and then they don’t see the same speedup from speculative execution as before," Tri Dao, chief scientist at Together AI, told VentureBeat in an exclusive interview. "These speculators generally don’t perform well when the workload starts changing."

    The problem no one talks about: workload drift

    Most speculators in production today are "static" models. They are trained once on a fixed dataset representing expected workloads, then deployed without any ability to adapt. Companies like Meta and Mistral ship pre-trained speculators alongside their main models. Inference platforms like vLLM use these static speculators to boost throughput without changing output quality.

    But there’s a catch: a static speculator’s accuracy declines as an enterprise’s AI usage evolves.

    "If you’re a company making coding agents, and most of your developers are writing in Python, but suddenly some of them start writing Rust or C, then you see that the speed starts decreasing," Dao explained. "There’s a mismatch between what the speculator was trained on and what the actual workload is."

    This workload drift represents a hidden tax on AI scaling. Enterprises either accept degraded performance or invest in retraining custom speculators, and that retraining captures only a snapshot in time that quickly becomes outdated.

    How Adaptive Speculators Work: A Dual-Model Approach

    ATLAS uses a dual-speculator architecture that combines stability with adaptation:

    Static speculator – a heavyweight model trained on broad data that delivers consistent baseline performance. It acts as a "speed floor."

    Adaptive speculator – a lightweight model that continuously learns from live traffic, specializing on the fly in emerging domains and usage patterns.

    Confidence-aware controller – an orchestration layer that dynamically chooses which speculator to use and adjusts the speculation "lookahead" based on confidence scores.
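
    The routing logic can be sketched roughly as follows. The thresholds, the confidence signal (a rolling acceptance-rate estimate), and the lookahead schedule are all invented for illustration; the actual controller design is not public.

```python
# Hypothetical sketch of a confidence-aware controller: route between a
# static and an adaptive speculator, and widen the lookahead as confidence
# (e.g., a rolling acceptance-rate estimate) grows. Thresholds are invented.

def choose_speculator(adaptive_confidence, threshold=0.7):
    """Use the adaptive speculator only once it has earned trust."""
    return "adaptive" if adaptive_confidence >= threshold else "static"

def lookahead_tokens(confidence, min_k=2, max_k=8):
    """Speculate further ahead when drafts are likely to be accepted."""
    return min(min_k + int((max_k - min_k) * confidence), max_k)

# Early in deployment: low confidence, conservative settings.
print(choose_speculator(0.3), lookahead_tokens(0.3))  # static 3
# After adapting to live traffic: deeper speculation.
print(choose_speculator(0.9), lookahead_tokens(0.9))  # adaptive 7
```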

    "We still have the static speculator to help get you up to speed in the beginning, before the adaptive speculator learns anything," Ben Athiwaratkun, staff AI scientist at Together AI, explained to VentureBeat. "Once the adaptive speculator becomes more confident, the speed increases over time."

    The technical innovation lies in balancing the acceptance rate (how often the target model agrees with a drafted token) against draft latency. As the adaptive model learns from traffic patterns, the controller comes to trust the lighter speculator more and extends the lookahead further, increasing performance.
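
    The tradeoff can be made concrete with the standard speculative-decoding expectation: with per-token acceptance probability a and lookahead k, a single target-model pass yields (1 − a^(k+1)) / (1 − a) tokens on average. The draft-cost figure below is an assumed value, not a measured one.

```python
# Back-of-envelope model of the acceptance-rate / draft-latency tradeoff.
# expected_tokens_per_pass is the standard speculative-decoding result;
# the draft_cost value is an assumed fraction of a target-model pass.

def expected_tokens_per_pass(a, k):
    """Expected tokens per target pass with acceptance prob a, lookahead k."""
    return (1 - a ** (k + 1)) / (1 - a)

def speedup(a, k, draft_cost=0.05):
    """Throughput relative to plain decoding, charging k drafted tokens
    at draft_cost each (assumption) on top of one target pass."""
    return expected_tokens_per_pass(a, k) / (1 + k * draft_cost)

print(round(speedup(0.5, 6), 2))  # weak speculator: ~1.53x
print(round(speedup(0.9, 6), 2))  # well-adapted speculator: ~4.01x
```

    A weak speculator (a = 0.5) barely breaks 1.5x at k = 6, while a well-adapted one (a = 0.9) approaches 4x, which is why the controller only extends the lookahead as confidence grows.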

    Users don’t need to tune any parameters. "On the user side, users don’t need to turn any knobs," Dao said. "For our part, we’ve tuned these knobs to a configuration that results in a good speedup."

    Performance that rivals custom silicon

    Together AI’s testing shows that ATLAS reaches 500 tokens per second on DeepSeek-V3.1 when fully optimized. More impressively, those numbers on Nvidia B200 GPUs match or exceed performance from specialized inference chips such as Groq’s custom hardware.

    "Software and algorithm improvements are able to really close the gap with specialized hardware," Dao said. "We were seeing 500 tokens per second on these huge models, which is even faster than some customized chips."

    The 400% inference speedup the company claims represents the cumulative effect of Together’s Turbo optimization suite: FP4 quantization provides an 80% speedup over the FP8 baseline, the static Turbo speculator adds another 80-100% gain, and the adaptive system layers on top. Each optimization compounds the benefits of the others.
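
    As rough arithmetic (illustrative only; the article’s figures are ranges, and how the adaptive layer’s contribution combines with them is not broken out):

```python
# How the claimed gains might compound (illustrative assumptions:
# the article gives ranges, not exact multipliers).
fp4_over_fp8 = 1.8        # "80% speedup over the FP8 baseline"
static_speculator = 1.9   # "another 80-100% gain" -> midpoint ~1.9x
combined = fp4_over_fp8 * static_speculator
print(round(combined, 2))  # ~3.4x before the adaptive layer kicks in
```

    The adaptive layer on top of this roughly 3.4x base is what carries the cumulative total toward the headline 400% figure.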

    Compared to standard inference engines such as vLLM or Nvidia’s TensorRT-LLM, the improvement is substantial. Together AI benchmarks against the stronger of those baselines for each workload before applying speculative optimizations.

    Memory-Compute Tradeoff Explained

    The performance gains arise from exploiting a fundamental inefficiency in modern inference: wasted computation capacity.

    Dao pointed out that typically during inference, most of the computation power is not fully utilized.

    "During inference, which is really the major workload these days, you’re mostly using the memory subsystem," he said.

    Speculative decoding trades idle compute for fewer memory accesses. When a model generates one token at a time, it is memory-bound: the GPU sits idle waiting on memory. But when the speculator proposes five tokens and the target model validates them simultaneously, compute utilization rises while memory accesses stay almost constant.

    "The total amount of computation to generate five tokens is the same, but you only have to access the memory once instead of five times," Dao said.
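
    A toy cost model makes the arithmetic concrete. The unit costs below are invented, and the sketch assumes every drafted token is accepted; real acceptance rates lower the gain.

```python
# Toy cost model for memory-bound decoding. Each decoding step must read
# the model weights (mem_cost) and do a little math (compute_cost);
# verifying k drafted tokens reuses one weight read. Units are invented,
# and we assume every drafted token is accepted.

def time_sequential(n_tokens, mem_cost=1.0, compute_cost=0.1):
    # One full weight read plus one token's compute, per token.
    return n_tokens * (mem_cost + compute_cost)

def time_speculative(n_tokens, k=5, mem_cost=1.0, compute_cost=0.1):
    # One weight read verifies k tokens; compute scales with k.
    return (n_tokens / k) * (mem_cost + k * compute_cost)

seq = time_sequential(100)
spec = time_speculative(100, k=5)
print(round(seq / spec, 2))  # ~3.67x from amortizing each weight read
```

    In this simplified model, amortizing each weight read over five verified tokens yields roughly a 3.7x throughput gain.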

    Think of it as intelligent caching for AI

    For infrastructure teams familiar with traditional database optimization, adaptive speculators act like an intelligent caching layer, but with one important difference.

    Traditional caching systems like Redis or memcached require exact matches: you store a specific query result and retrieve it when that same query runs again. Adaptive speculators work differently.

    "You can look at it as an intelligent way of caching, not storing exact, but detecting some patterns that you see," Dao explained. "Broadly speaking, we’re looking at whether you’re working with similar code, or working with similar, you know, controlling computation in similar ways. Then we can predict what the bigger model is going to say. We become better at predicting it."

    Instead of storing exact responses, the system learns patterns in how the model generates tokens. It recognizes that if you are editing Python files in a specific codebase, certain token sequences become more probable. The speculator adapts to those patterns, improving its predictions over time without requiring identical inputs.
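
    A minimal sketch of this pattern-learning idea: a bigram table that updates from tokens the target model actually produced. This is an invented illustration of the caching analogy, not Together AI’s implementation.

```python
# Invented illustration of "pattern caching": a bigram draft table that
# updates from tokens the target model actually produced.
from collections import Counter, defaultdict

class AdaptiveBigramSpeculator:
    def __init__(self):
        self.counts = defaultdict(Counter)  # prev token -> next-token counts

    def observe(self, tokens):
        """Learn from live traffic: record which token followed which."""
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def draft(self, prev_token):
        """Propose the most common continuation seen so far, if any."""
        if self.counts[prev_token]:
            return self.counts[prev_token].most_common(1)[0][0]
        return None  # no pattern yet: defer to the static speculator

spec = AdaptiveBigramSpeculator()
spec.observe(["def", "main", "(", ")", ":"])
spec.observe(["import", "os"])
print(spec.draft("def"))     # main
print(spec.draft("import"))  # os
print(spec.draft("struct"))  # None (unseen: fall back to static)
```

    Unlike an exact-match cache, the table generalizes: any context ending in a seen token yields a draft, and frequently repeated patterns win out over time.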

    Use cases: RL training and evolving workloads

    Two enterprise scenarios particularly benefit from adaptive speculators:

    Reinforcement learning training: As the policy evolves during training, static speculators quickly fall out of alignment. ATLAS continuously adapts to the shifting policy distribution.

    Workload evolution: As enterprises discover new AI use cases, the workload composition changes. "Maybe they started out using AI for chatbots, but then they realized, hey, it can write code, so they started shifting to code," Dao said. "Or they realize these AIs can actually call tools, control computers, do accounting, and things like that."

    In a vibe-coding session, the adaptive system can specialize for the specific codebase being edited, even files that were never seen during training, further increasing the acceptance rate and decoding speed.

    What this means for enterprises and the inference ecosystem

    ATLAS is now available on Together AI’s dedicated endpoints as part of the platform at no additional cost. The company’s more than 800,000 developers (up from 450,000 in February) have access to the capability.

    But the broader implications extend beyond one vendor’s product. The shift from static to adaptive optimization represents a fundamental rethinking of how inference platforms should work. As enterprises deploy AI across multiple domains, the industry will need to move beyond models trained once, toward systems that continuously learn and improve.

    Together AI has historically released some of its research techniques as open source and collaborated with projects such as vLLM. While the fully integrated ATLAS system is proprietary, some of the underlying techniques may ultimately influence the broader inference ecosystem.

    For enterprises looking to lead in AI, the message is clear: adaptive algorithms on commodity hardware can match custom silicon at a fraction of the cost. As this approach matures across the industry, software optimization increasingly rivals specialized hardware.
