    Will updating your AI agents help or hinder their performance? Raindrop’s new Experiments tool tells you

    By PineapplesUpdate | October 10, 2025

    It seems like almost every week in the two years since ChatGPT launched, new large language models (LLMs) have been released, whether from rival labs or from OpenAI itself. Enterprises are hard-pressed to keep up with the sheer pace of change, let alone understand how to adapt to it: which of these new models, if any, should they adopt to power their workflows and the custom AI agents built to accomplish them?

    Help has arrived: AI application observability startup Raindrop has launched Experiments, a new analytics feature the company describes as the first A/B testing suite designed specifically for enterprise AI agents. It lets companies see and compare how updating agents to new underlying models, or changing their instructions and tool access, affects their performance with real end users.

    The release expands Raindrop’s existing observability tools, giving developers and teams a way to see how their agents behave and evolve in real-world situations.

    With Experiments, teams can track how changes – such as a new tool, prompt, model update, or complete pipeline refactor – impact AI performance across millions of user interactions. The new feature is available now to users on Raindrop’s Pro subscription plan ($350 monthly) at raindrop.ai.

    A data-driven lens on agent development

    Raindrop co-founder and chief technology officer Ben Hylak says in a product announcement video that Experiments helps teams see how “virtually anything” changed, including tool usage, user intent, and refusal rates, and detect differences along demographic factors like language. The goal is to make model iteration more transparent and measurable.

    The Experiments interface presents results visually, indicating whether an experiment performs better or worse than its baseline. An increase in negative signals may indicate more task failures or partial code output, while an improvement in positive signals may reflect more complete responses or better user experiences.
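    To make that comparison concrete, here is a minimal sketch of the kind of aggregation such a dashboard performs. The event schema, variant names, and data are invented for illustration; this is not Raindrop’s actual API:

```python
# Minimal sketch: compute per-variant negative-signal rates from production
# events tagged with the experiment arm that served them. All names and
# data are hypothetical; this is not Raindrop's API.
from collections import Counter

# Each event records which variant handled it and the signal it produced.
events = [
    ("baseline", "positive"), ("baseline", "negative"), ("baseline", "positive"),
    ("new-model", "positive"), ("new-model", "positive"), ("new-model", "negative"),
]

def negative_rate(events):
    totals, negatives = Counter(), Counter()
    for variant, signal in events:
        totals[variant] += 1
        negatives[variant] += (signal == "negative")
    return {v: negatives[v] / totals[v] for v in totals}

rates = negative_rate(events)
print(f"delta vs. baseline: {rates['new-model'] - rates['baseline']:+.1%}")
```

    A real deployment would also track positive signals and conversation-level metadata, but the shape of the comparison is the same.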

    By making this data easy to interpret, Raindrop encourages AI teams to approach agent iteration with the same rigor as modern software deployment – tracking results, sharing insights, and catching regressions before they compound.

    Background: from AI observability to experimentation

    The launch of Experiments builds on the company’s foundation: an AI-native observability platform designed to help enterprises monitor and understand how their generative AI systems behave in production.

    As VentureBeat reported earlier this year, the company – originally known as Dawn AI – emerged when Hylak, a former Apple human interface designer, set out to tackle what he called the “black box problem” of AI performance, helping teams catch failures “as they happen and explain to enterprises what went wrong and why.”

    At the time, Hylak described how “AI products fail constantly – in ways both hilarious and terrifying,” noting that unlike traditional software, which throws explicit exceptions, “AI products fail silently.” Raindrop’s original platform focused on detecting those silent failures by analyzing signals such as user feedback, task failures, refusals, and other conversational anomalies across millions of daily events.

    The company’s co-founders – Hylak, Alexis Gauba, and Zubin Singh Koticha – created Raindrop after experiencing first-hand the difficulty of debugging AI systems in production.

    “We started not with infrastructure, but by building AI products,” Hylak told VentureBeat. “But very quickly, we saw that to develop anything serious, we needed tooling to understand AI behavior – and that tooling didn’t exist.”

    With Experiments, Raindrop extends that mission from detecting failures to measuring improvement. The new tool turns observational data into actionable comparisons, helping enterprises test whether changes to their models, prompts, or pipelines actually make their AI agents better – or just different.

    A fix for the “evals pass, agents fail” problem

    Traditional evaluation frameworks, while useful for benchmarking, rarely capture the unpredictable behavior of AI agents operating in dynamic environments.

    As Raindrop co-founder Alexis Gauba explained in a LinkedIn announcement: “Traditional evals don’t really answer this question. They’re great unit tests, but you can’t predict your users’ actions, and your agent is running for hours, calling hundreds of tools.”

    Gauba said the company has consistently heard a common frustration from teams: “Evals pass, agents fail.”

    The purpose of Experiments is to show what actually changes when developers ship updates to their systems.

    The tool enables side-by-side comparison of models, tools, intents, or attributes, revealing measurable differences in behavior and performance.

    Designed for real-world AI behavior

    In the announcement video, Raindrop described Experiments as “a way to compare anything and measure how your agent’s behavior actually changed in production across millions of real interactions.”

    The platform helps users identify issues such as task-failure spikes, forgetting, or new tools triggering unexpected errors.

    It can also be used in the opposite direction – starting with a known problem, such as an agent stuck in a loop, and figuring out which model, tool, or flag is driving it.

    From there, developers can dive into detailed traces to find the root cause and quickly ship a solution.
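    As a rough sketch of that reverse workflow, the snippet below slices hypothetical incident records by model and by tool to see which attribute dominates a known failure mode (all names and data are invented, not Raindrop’s API):

```python
# Hypothetical incident records; slicing a known issue by attribute can
# reveal which model or tool is driving it.
from collections import Counter

incidents = [
    {"issue": "stuck_in_loop", "model": "model-b", "tool": "web_search"},
    {"issue": "stuck_in_loop", "model": "model-b", "tool": "code_exec"},
    {"issue": "task_failure",  "model": "model-a", "tool": "web_search"},
    {"issue": "stuck_in_loop", "model": "model-b", "tool": "web_search"},
]

loops = [e for e in incidents if e["issue"] == "stuck_in_loop"]
for attribute in ("model", "tool"):
    counts = Counter(e[attribute] for e in loops)
    print(attribute, counts.most_common())
# If one value dominates a slice (here, model-b), inspect its traces first.
```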

    Each experiment provides a visual analysis of metrics such as tool usage frequency, error rate, conversation duration, and response length.

    Users can click on any comparison to access the underlying event data, giving them a clear view of how the agent’s behavior changed over time. Shared links make it easy to collaborate with teammates or report findings.

    Integration, Scalability and Accuracy

    According to Hylak, Experiments integrates directly with “the feature flag platforms companies know and love (like Statsig!)” and is designed to work seamlessly with existing telemetry and analytics pipelines.

    For companies without those integrations, it can still compare performance over time — like yesterday vs. today — without additional setup.
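    The integration pattern might look something like the sketch below, where FlagClient is a toy stand-in for a real feature-flag SDK such as Statsig’s (the actual SDK calls differ):

```python
# Toy feature-flag client: deterministically buckets a percentage of users.
# A real SDK replaces this class; the calls shown are not any vendor's API.
import hashlib

class FlagClient:
    def __init__(self, rollout_pct: int):
        self.rollout_pct = rollout_pct

    def is_enabled(self, flag: str, user_id: str) -> bool:
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < self.rollout_pct

flags = FlagClient(rollout_pct=10)  # expose the candidate model to ~10% of users

def pick_model(user_id: str) -> str:
    # Tag each interaction with the variant that served it, so outcomes can
    # be attributed to the right experiment arm downstream.
    return "new-model" if flags.is_enabled("use-new-model", user_id) else "baseline"

print(pick_model("user-42"))
```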

    Hylak said teams typically need about 2,000 users per day to produce statistically meaningful results.

    To ensure comparisons are sound, Experiments monitors sample-size adequacy and alerts users if a test lacks enough data to draw valid conclusions.
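    A back-of-envelope version of that adequacy check is a two-proportion z-test on, say, task-failure rates. The sketch below uses only the Python standard library; the numbers and method are illustrative, not Raindrop’s actual statistics:

```python
# Two-proportion z-test: is the difference in failure rates between two
# experiment arms larger than sampling noise would explain?
from math import erf, sqrt

def two_proportion_p_value(fail_a, n_a, fail_b, n_b):
    p_a, p_b = fail_a / n_a, fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value

# e.g. 2,000 users per arm, 8% vs. 6% task-failure rate:
print(f"p = {two_proportion_p_value(160, 2000, 120, 2000):.3f}")
# p ≈ 0.013: below 0.05, so the difference is unlikely to be noise.
```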

    “We focus on making sure that metrics like task failure and user frustration are metrics you would wake up an on-call engineer for,” Hylak explained. He said teams can drill down into the specific conversations or events that drive those metrics, ensuring transparency behind each aggregate number.

    Security and data protection

    Raindrop operates as a cloud-hosted platform, but also offers on-premises personally identifiable information (PII) redaction for enterprises that need additional controls.

    Hylak said the company is SOC 2 compliant and has launched a PII Guard feature that uses AI to automatically remove sensitive information from stored data. “We take the security of customer data very seriously,” he stressed.
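    For a sense of what redaction involves, here is a deliberately simplified, regex-based sketch. Raindrop describes PII Guard as AI-based, so this stands in for the idea rather than the product’s method:

```python
# Simplified PII scrubbing: replace matched patterns with typed placeholders
# before events are stored. Illustrative only; PII Guard itself uses AI.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
}

def redact(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact me at jane@example.com or +1 (555) 010-7788."))
# -> Contact me at [EMAIL] or [PHONE].
```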

    Pricing and Plans

    Experiments is part of Raindrop’s Pro plan, which costs $350 per month or $0.0007 per interaction. The Pro tier also includes in-depth research tools, topic clustering, custom issue tracking, and semantic search capabilities.

    Raindrop’s Starter plan – $65 per month or $0.001 per interaction – provides core analytics including issue detection, user feedback signals, Slack alerts, and user tracking. Both plans come with a 14-day free trial.
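    (For scale: at the metered rates, $350 corresponds to 500,000 interactions ($350 ÷ $0.0007) and $65 to 65,000 ($65 ÷ $0.001) – assuming the flat fee and the per-interaction rate are alternative ways of billing the same usage, which the announcement does not spell out.)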

    Large organizations can opt for the Enterprise plan, with custom pricing and advanced features such as SSO login, custom alerts, integrations, on-premises PII redaction, and priority support.

    Continuous improvement of AI systems

    With Experiments, Raindrop positions itself at the intersection of AI analytics and software observability. Its focus on “measuring the truth,” as stated in the product video, reflects a broader push within the industry toward accountability and transparency in AI operations.

    Instead of relying solely on offline benchmarks, Raindrop’s approach emphasizes real user data and contextual understanding. The company hopes this will allow AI developers to move faster, identify root causes sooner, and produce better-performing models with confidence.
