Pineapples Update


    Every AI model is flunking medicine, and LMArena proposes a fix

    By PineapplesUpdate | August 19, 2025 | 5 Mins Read

    Image: Johan63/iStock/Getty Images Plus via Getty Images

    ZDNET’s key takeaways

    • Frontier AI models fail to provide safe and accurate output on medical topics.
    • LMArena and DataTecnica aim to ‘rigorously’ test LLMs’ medical knowledge.
    • It is not clear how agents and medicine-specific LLMs will be measured.



    Despite the many advances of AI in medicine cited in the scholarly literature, all generative AI programs fail to produce output that is both safe and accurate when dealing with medical topics, according to a new report by benchmark firm LMArena.

    The finding is especially concerning given that people are turning to bots such as ChatGPT for medical answers, and research suggests that people trust AI’s medical advice over the advice of doctors, even when it is wrong.

    Also: Patients trust AI’s medical advice over doctors, even when it is wrong, study finds

    The new study, which compares OpenAI’s GPT-5 with numerous models from Google, Anthropic, and Meta, finds that their performance in real-world biomedical research remains far from adequate.

    (Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

    A knowledge gap in medicine

    According to the LMArena team, no current model reliably meets the reasoning and domain-specific knowledge demands of biomedical scientists.

    The report concludes that current models are simply too lax and too fuzzy to meet the standards of medicine:

    “This fundamental gap highlights the growing mismatch between general AI capabilities and the needs of specialized scientific communities. Biomedical researchers work at the intersection of complex, evolving knowledge and real-world impact. They don’t need models that ‘sound’ correct; they need tools that help uncover insights, reduce error, and accelerate the pace of discovery.”

    [Figure: LMArena 2025 graph of LLMs’ biomedical accuracy and safety. Credit: LMArena + DataTecnica]

    The study echoes findings from other medicine-related benchmark tests. For example, in May, OpenAI unveiled HealthBench, a suite of text prompts about medical situations and conditions that a person seeking medical advice might reasonably submit to a chatbot. That study found that OpenAI’s o3 large language model had the best accuracy score, 0.598, which leaves ample room for improvement on the benchmark.

    Also: OpenAI’s HealthBench shows AI’s medical advice is improving, but who will listen?

    Expanding the benchmark

    To address the gap between AI models and medicine, LMArena has partnered with startup DataTecnica, which earlier this year unveiled CARDBiomedBench, a question-and-answer benchmark suite for evaluating LLMs in biomedical research.

    Together, LMArena and DataTecnica plan to expand what is called BiomedArena, a leaderboard that lets people compare AI models side by side and vote on which performs best.
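
    The article does not say how head-to-head votes are turned into a ranking. As a rough illustration only, one common way to aggregate pairwise votes is an Elo-style rating, sketched below in Python. The function name, the K-factor of 32, the starting rating of 1000, and the model names are all assumptions made for this example; nothing here is taken from LMArena or BiomedArena.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Turn an ordered list of (winner, loser) votes into Elo-style ratings.

    Hypothetical helper for illustration; k and base are assumed values.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        # Probability, under current ratings, that the eventual winner wins.
        gap = ratings[loser] - ratings[winner]
        expected_win = 1.0 / (1.0 + 10 ** (gap / 400.0))
        # An upset (low expected_win) moves the ratings more than a predictable result.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Three made-up votes between three placeholder models.
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
]
ratings = elo_ratings(votes)
leaderboard = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
for name, score in leaderboard:
    print(f"{name}: {score:.1f}")
```

    Because each vote adds to the winner exactly what it takes from the loser, the total rating stays constant, and a model climbs the leaderboard only by winning comparisons. Online updates like this depend on vote order, which is one reason a production leaderboard might prefer a batch statistical fit instead.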

    Also: Meta’s Llama 4 ‘herd’ controversy and AI contamination, explained

    Unlike general-purpose leaderboards, BiomedArena is meant to be specific to medical research rather than to very general questions.

    BiomedArena is already used by scientists in the Intramural Research Program of the US National Institutes of Health, the team notes, “where scientists pursue high-risk, high-reward projects that are often beyond the scope of traditional academic research due to their scale, complexity, or resource demands.”

    According to the LMArena team, the BiomedArena work “will focus on tasks and evaluation strategies grounded in the day-to-day realities of biomedical discovery, from interpreting experimental data and literature to assisting in hypothesis generation and clinical translation.”

    Also: You can track the top AI image generators via this new leaderboard, and vote for your favorite too

    As ZDNET’s Webb Wright reported in June, LMArena.ai ranks AI models. The website was originally founded as a research initiative through UC Berkeley under the name Chatbot Arena and has since become a full-fledged platform, with financial support from UC Berkeley, a16z, Sequoia Capital, and others.

    Where could they go wrong?

    There are two big questions for this new benchmark effort.

    First, studies with doctors have shown that generative AI’s usefulness expands dramatically when AI models are hooked up to “gold standard” databases of medical information, with dedicated large language models (LLMs) able to outperform the top frontier models just by tapping into that information.

    Also: Hooking up generative AI to medical data improved its usefulness for doctors

    From today’s announcement, it is not clear how LMArena and DataTecnica plan to address that aspect of AI models, which is really a kind of agentic capability: the ability to tap into resources. Without measuring how AI models use external resources, the benchmark could have limited utility.

    Second, numerous medicine-specific LLMs are being developed all the time, including Google’s “Med-PaLM” program, developed two years ago. It is not clear whether the BiomedArena work will take these dedicated medical LLMs into account. The work so far has tested only general frontier models.

    Also: Google’s Med-PaLM medical AI emphasizes human clinicians

    That is a perfectly valid choice by LMArena and DataTecnica, but it leaves out a good deal of important work.
