    Huawei’s new open source technology shrinks LLM to run on less powerful, less expensive hardware

    By PineapplesUpdate | October 8, 2025 | 5 Mins Read

    Huawei’s Computing Systems Lab in Zurich has introduced a new open-source quantization method for large language models (LLMs) that aims to reduce memory demands without compromising output quality.

    The technique, called SINQ (Sinkhorn-Normalized Quantization), is designed to be fast, calibration-free, and easy to integrate into existing model workflows. The Huawei research team has published the code on GitHub and Hugging Face under a permissive, enterprise-friendly Apache 2.0 license, allowing organizations to use it, modify it, and deploy it commercially – all for free.

    Across models of different sizes, SINQ cuts memory usage by 60–70%, depending on architecture and bit-width.

    This enables models that previously required more than 60 GB of memory to run on a setup with roughly 20 GB – a key enabler for running large models on a single high-end GPU or even on multi-GPU consumer-grade setups.
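
    As a rough sanity check on those figures, weight memory scales roughly linearly with bit-width. The sketch below is illustrative only: the ~32B-parameter model size is an assumption, and it counts weight storage alone, ignoring activations and KV cache.

        # Back-of-envelope estimate of weight memory alone (illustrative numbers,
        # not figures from the paper); activations and KV cache add more on top.
        def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
            return num_params * bits_per_weight / 8 / 1e9

        params = 32e9  # a hypothetical ~32B-parameter model
        print(f"FP16 weights : {weight_memory_gb(params, 16):.0f} GB")  # ~64 GB
        print(f"4-bit weights: {weight_memory_gb(params, 4):.0f} GB")   # ~16 GB, plus scale-factor overhead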

    This makes it possible to run models that previously required high-end enterprise GPUs – such as NVIDIA’s A100 or H100 – on significantly more affordable hardware, such as a single NVIDIA GeForce RTX 4090 (around $1,600), rather than enterprise hardware such as an A100 80GB (around $19,000) or H100 units that run over $30,000.

    For teams using cloud infrastructure, the savings are equally tangible. A100-based instances often cost $3-4.50 per hour, while 24GB GPUs like the RTX 4090 are available for $1-1.50 per hour on many platforms.

    Over time, especially for extended inference workloads, this difference can add up to thousands of dollars in cost savings, while also unlocking LLM deployment on small clusters, local workstations, or consumer-grade setups previously constrained by memory.
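
    A back-of-envelope comparison using the hourly rates quoted above (midpoints chosen for illustration; real pricing varies by provider and workload):

        # Illustrative monthly cost for an always-on inference endpoint,
        # using midpoints of the hourly ranges quoted above.
        a100_rate = 3.75      # $/hour for an A100-based instance
        rtx4090_rate = 1.25   # $/hour for a 24 GB RTX 4090 instance
        hours_per_month = 24 * 30

        savings = (a100_rate - rtx4090_rate) * hours_per_month
        print(f"Savings per GPU per month: ~${savings:,.0f}")  # ~$1,800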

    Tackling the LLM memory challenge

    Running larger models often requires a compromise between performance and size.

    In practice, neural networks use floating-point numbers to represent both weights and activations. A floating-point number can express a wide range of values (very small, very large, with fractional parts).

    This flexibility is helpful because during training and inference, the weights and activations can vary dramatically in scale. Using floating-point allows the model to be adjusted precisely. (For example, a weight could be 0.0023 or 123.45, and floating-point can capture both with good accuracy.)
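
    A small, self-contained illustration of that point (not taken from the paper): a float16 array keeps both magnitudes, while an 8-bit integer grid that shares a single scale factor crushes the small weight toward zero.

        import numpy as np

        # Floating point represents very different magnitudes in one format.
        w = np.array([0.0023, 123.45], dtype=np.float16)
        print(w)  # both values survive with good relative accuracy

        # An 8-bit integer grid needs a shared scale factor; when the largest
        # value sets that scale, the small weight rounds away to zero.
        scale = np.abs(w).max() / 127
        q = np.round(w / scale).astype(np.int8)
        print(q * scale)  # [0.0, ~123.4]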

    Quantization – a method that reduces the precision of model weights – offers a practical route to reducing memory usage, but typically comes with a trade-off in model quality, especially at 4-bit precision and below.

    When you convert those floating-point values to lower-precision formats (like 8-bit integers), you’re approximating them.

    This means you store and compute with fewer bits, which is faster and more memory-efficient – but you risk losing fidelity (i.e., introducing small errors).

    The trick is to do the transformation carefully so that the behavior of the model remains approximately the same, even though internally it is working with a rough estimate of those weights and activations.
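
    A minimal round-to-nearest sketch makes the trade-off concrete: quantize a weight matrix to a small integer grid, de-quantize it, and watch the approximation error grow as the bit-width shrinks. This is illustrative code with a random stand-in matrix, not the paper's method.

        import numpy as np

        rng = np.random.default_rng(0)
        W = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in weight matrix

        def rtn_roundtrip(W, bits):
            """Per-tensor round-to-nearest quantization followed by de-quantization."""
            levels = 2**bits - 1
            scale = (W.max() - W.min()) / levels
            zero = W.min()
            Q = np.round((W - zero) / scale)      # the integers the model would store
            return Q * scale + zero               # the approximation used at inference

        for bits in (8, 4, 2):
            W_hat = rtn_roundtrip(W, bits)
            print(bits, "bits -> mean abs error:", np.abs(W - W_hat).mean())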

    SINQ addresses these problems by offering a plug-and-play solution that delivers robust performance even in low-precision settings without the need for calibration data or inter-layer dependencies.

    How does SINQ work?

    The SINQ approach introduces two main innovations:

    1. Dual-axis scaling: Instead of using a single scale factor to quantize the matrix, SINQ uses separate scaling vectors for rows and columns. This helps reduce the impact of outliers and allows the quantization error to be distributed more flexibly across the matrix.

    2. Sinkhorn–Knopp-style normalization: A fast algorithm inspired by Sinkhorn iterations is used to normalize the standard deviations of the rows and columns of a matrix. This helps reduce what the authors call “matrix imbalance”, a new proxy metric that has been shown to be more effective than alternatives such as kurtosis for improving quantization performance.

    The combination of these two features allows SINQ to outperform other calibration-free techniques such as round-to-nearest (RTN), HQQ, and Hadamard-based quantization in many benchmarks.
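
    The sketch below is a simplified illustration of how the two ideas fit together, assuming plain NumPy: alternately rescale rows and columns until their spreads even out, quantize the balanced matrix, then fold the two scaling vectors back in. It is not the authors' reference implementation, and the paper's exact update rule may differ.

        import numpy as np

        def sinkhorn_style_normalize(W, iters=10):
            """Alternately rescale rows and columns so their standard deviations even out."""
            row_scale = np.ones(W.shape[0])
            col_scale = np.ones(W.shape[1])
            W_norm = W.copy()
            for _ in range(iters):
                r = W_norm.std(axis=1) + 1e-8          # per-row spread
                W_norm /= r[:, None]
                row_scale *= r
                c = W_norm.std(axis=0) + 1e-8          # per-column spread
                W_norm /= c[None, :]
                col_scale *= c
            return W_norm, row_scale, col_scale

        rng = np.random.default_rng(0)
        W = rng.normal(size=(64, 64))
        W[3, 7] = 50.0                                  # inject an outlier
        W_norm, r, c = sinkhorn_style_normalize(W)

        # Quantize the balanced matrix with plain round-to-nearest at 4 bits...
        scale = (W_norm.max() - W_norm.min()) / 15
        zero = W_norm.min()
        W_hat_norm = np.round((W_norm - zero) / scale) * scale + zero
        # ...then fold the row and column scaling vectors back in when de-quantizing.
        W_hat = W_hat_norm * r[:, None] * c[None, :]
        print("mean abs error:", np.abs(W - W_hat).mean())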

    Performance and compatibility

    SINQ has been evaluated across a wide range of architectures and models, including the Qwen3 series, LLaMA, and DeepSeek.

    On benchmarks such as WikiText2 and C4, SINQ consistently reduces error and flip rates compared to baseline methods, often approaching or matching the performance of calibrated solutions.

    It also supports non-uniform quantization schemes such as NF4 and can be combined with calibration methods such as AWQ, yielding the variant A-SINQ. In calibrated settings, A-SINQ further reduces the gap with full-precision models.

    In terms of runtime efficiency, SINQ quantizes models approximately twice as fast as HQQ and up to 30 times faster than AWQ. This makes it suitable for both research and production environments where quantization time is a practical constraint.

    Open source and easy to use

    Huawei has released SINQ as an open-source project under a permissive, enterprise-friendly Apache 2.0 license, with implementation instructions and reproduction tools available on GitHub.

    The repository includes support for quantizing Hugging Face models with just a few lines of code, as well as tools for saving and reloading quantized weights. The default settings provide a balance between memory savings and accuracy, and users can customize parameters such as bit-width, tiling strategy, and group size based on their needs.
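
    The snippet below is a hypothetical sketch of what “a few lines of code” could look like. The import and function names are illustrative stand-ins, not the repository's actual API, so consult the SINQ GitHub README for the real calls and parameter names.

        # Hypothetical usage sketch: quantize_model and its arguments are
        # illustrative stand-ins, not the repository's actual API.
        from transformers import AutoModelForCausalLM, AutoTokenizer
        # from sinq import quantize_model   # hypothetical import; see the GitHub repo

        model_name = "Qwen/Qwen3-8B"   # the Qwen3 series is among the models evaluated
        model = AutoModelForCausalLM.from_pretrained(model_name)
        tokenizer = AutoTokenizer.from_pretrained(model_name)

        # Calibration-free: choose a bit-width, tiling strategy, and group size, then quantize.
        # quantized = quantize_model(model, bits=4, group_size=128, tiling="2d")
        # quantized.save_pretrained("qwen3-8b-sinq-4bit")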

    The authors also provide evaluation integration through lm-eval, and they plan to release pre-quantized models on the Hugging Face Hub in the near future.

    Looking ahead

    With the increasing demand to run large models on consumer-grade hardware, quantization is becoming an essential tool. SINQ aims to lower the barrier to entry for LLM deployment, enabling developers and researchers to efficiently shrink models without major sacrifices in quality or compatibility.

    Further updates – including integration with Hugging Face Transformers and pre-quantized model releases – are planned, making this a project worth watching in the quantization space.
