The success of DeepSeek shows why motivation is key to AI innovation

By PineapplesUpdate | April 26, 2025



January 2025 shook the AI landscape. The seemingly invincible OpenAI and the powerful American tech giants were shocked by what we can fairly call an underdog in the field of large language models (LLMs). DeepSeek, a Chinese firm that was not on anyone's radar, suddenly challenged OpenAI. It is not that DeepSeek-R1 was better than the top models of the American giants; it was slightly behind in terms of benchmarks, but it suddenly made everyone think about efficiency in hardware and energy use.

Given the unavailability of the best high-end hardware, it seems that DeepSeek was motivated to innovate in the area of efficiency, which was less of a concern for the bigger players. OpenAI has claimed it has evidence that DeepSeek may have used its models for training, but we have no concrete proof to support this. So, whether that is true or OpenAI is simply trying to appease its investors is a matter of debate. However, DeepSeek has published its work, and people have verified that the results are reproducible, at least on a much smaller scale.

But how could DeepSeek achieve such cost savings while American companies could not? The short answer is simple: they had more motivation. The long answer requires a slightly more technical explanation.

DeepSeek used KV-cache optimization

An important saving in GPU memory came from optimizing the key-value (KV) cache used in every attention layer of an LLM.

LLMs are made up of transformer blocks, each of which comprises an attention layer followed by a regular vanilla feed-forward network. The feed-forward network conceptually models arbitrary relationships, but in practice it struggles to determine patterns in the data on its own. The attention layer solves this problem for language modeling.

The model processes text using tokens, but for simplicity we will refer to them as words. In an LLM, each word is assigned a vector in a high-dimensional space (say, a thousand dimensions). Ideally, each dimension represents a concept, such as being hot or cold, being green, being soft, being a noun. A word's vector representation encodes its meaning as its value along each dimension.

However, our language allows other words to modify the meaning of each word. For example, an apple has a meaning, but we can have a green apple as a modified version. A more extreme example of modification is that an apple in an iPhone context differs from an apple in a meadow context. How do we let the system modify the vector meaning of a word based on another word? This is where attention comes in.

The attention model assigns each word two additional vectors: a key and a query. The query represents the qualities of a word's meaning that can be modified, and the key represents the modifications it can provide to other words. For example, the word 'green' can provide information about color and green-ness, so the key of 'green' will have a high value on the 'green-ness' dimension. On the other hand, the word 'apple' may or may not be green, so the query vector of 'apple' will also have a high value for the green-ness dimension. If we take the dot product of the key of 'green' with the query of 'apple', the product should be comparatively larger than the product of the key of 'table' and the query of 'apple'. The attention layer then adds a small fraction of the value of the word 'green' to the value of the word 'apple'. In this way, the value of the word 'apple' is modified to be a little greener.
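A tiny numerical sketch of this idea, with made-up three-dimensional vectors and hand-labeled dimensions (a real model learns thousand-dimensional projections rather than anything this interpretable):

```python
import numpy as np

# Toy 3-dimensional meaning space: [green-ness, fruit-ness, furniture-ness]
value = {
    "apple": np.array([0.1, 0.9, 0.0]),   # value: what the word currently means
    "green": np.array([0.8, 0.1, 0.0]),
    "table": np.array([0.0, 0.0, 0.9]),
}
key = {                                    # key: what a word can offer to modify others
    "green": np.array([1.0, 0.0, 0.0]),   # offers green-ness
    "table": np.array([0.0, 0.0, 1.0]),   # offers furniture-ness
}
query_apple = np.array([1.0, 0.2, 0.0])   # query: what "apple" is open to receiving

# Dot products decide how strongly each context word modifies "apple"
scores = np.array([query_apple @ key["green"], query_apple @ key["table"]])
weights = np.exp(scores) / np.exp(scores).sum()          # softmax over the context words

# "apple" absorbs a weighted fraction of the other words' values
apple_updated = value["apple"] + weights[0] * value["green"] + weights[1] * value["table"]
print(apple_updated)   # the green-ness component of "apple" has gone up
```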

When an LLM generates text, it does so one word after another. When it generates a word, all the words generated so far become part of its context, and the keys and values of those words have already been calculated. When another word is added to the context, its value needs to be updated based on its query and the keys and values of all the previous words. That is why all those keys and values are kept in GPU memory. This is the KV cache.
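A rough sketch of what that caching looks like during generation (single attention head, assumed dimensions, no batching):

```python
import numpy as np

d = 64                                     # per-head dimension (assumed)
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []                  # the KV cache: one key and one value per past token

def decode_step(x_new):
    """Process one newly generated token, reusing cached keys and values."""
    k_cache.append(x_new @ W_k)            # only the new token's key/value are computed...
    v_cache.append(x_new @ W_v)
    q = x_new @ W_q                        # ...but its query attends over every cached key,
    scores = np.array([q @ k for k in k_cache])   # so all of them must stay in GPU memory
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return sum(w * v for w, v in zip(weights, v_cache))

for _ in range(5):                         # generate 5 tokens
    decode_step(rng.standard_normal(d))
print(len(k_cache), "keys and", len(v_cache), "values held in the cache")
```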

DeepSeek determined that the key and the value of a word are related. The meaning of the word green and its ability to affect green-ness are, after all, very closely related. So it is possible to compress both into a single (and perhaps smaller) vector and decompress it very easily during processing. DeepSeek found that this affects benchmark performance only marginally, but it saves a lot of GPU memory.
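The sketch below shows the basic idea of that compression: keep only a small latent vector per token and reconstruct the key and value from it when attention needs them. It is a simplification of DeepSeek's multi-head latent attention; the projection matrices and sizes here are assumptions for illustration.

```python
import numpy as np

d, d_latent = 64, 16                            # latent is much smaller than the head dim (assumed)
rng = np.random.default_rng(1)
W_down = rng.standard_normal((d, d_latent))     # compress a token into one shared latent
W_up_k = rng.standard_normal((d_latent, d))     # decompress latent -> key
W_up_v = rng.standard_normal((d_latent, d))     # decompress latent -> value

latent_cache = []                               # only the small latent is stored per token

def cache_token(x):
    latent_cache.append(x @ W_down)             # store 16 numbers instead of 2 x 64

def keys_and_values():
    L = np.stack(latent_cache)                  # keys and values rebuilt on the fly
    return L @ W_up_k, L @ W_up_v

for _ in range(10):
    cache_token(rng.standard_normal(d))
K, V = keys_and_values()
print(K.shape, V.shape)                         # (10, 64) (10, 64), rebuilt from (10, 16) of storage
```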

DeepSeek implemented MoE

The nature of a neural network is that the entire network needs to be evaluated (or computed) for every query. However, not all of this is useful computation. Knowledge of the world sits in the weights, or parameters, of a network. Knowledge about the Eiffel Tower is not used to answer questions about the history of South American tribes. Knowing that an apple is a fruit is not useful when answering questions about the general theory of relativity. Yet, when the network is computed, all parts of it are processed regardless. This incurs huge computation costs during text generation that should ideally be avoided. This is where the idea of mixture-of-experts (MoE) comes in.

In an MoE model, the neural network is divided into multiple smaller networks called experts. Note that the 'expert' in a subject is not explicitly defined; the network figures it out during training. However, the network assigns a relevance score to each expert for every query and activates only the experts with the highest matching scores. This provides huge cost savings in computation. Note that some questions need expertise in multiple areas to be answered properly, and the performance on such questions will be degraded. However, because the areas are figured out from the data, the number of such questions is minimized.
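A minimal top-k routing sketch of that idea follows; the expert count, sizes and gating scheme are illustrative assumptions, not DeepSeek's actual configuration.

```python
import numpy as np

d, n_experts, top_k = 32, 8, 2
rng = np.random.default_rng(2)

# Each expert is a small feed-forward network; only a few of them run per token
experts = [(rng.standard_normal((d, 4 * d)), rng.standard_normal((4 * d, d)))
           for _ in range(n_experts)]
W_gate = rng.standard_normal((d, n_experts))     # router that scores experts for each token

def moe_forward(x):
    scores = x @ W_gate                          # relevance score for every expert
    chosen = np.argsort(scores)[-top_k:]         # activate only the top-k experts
    gates = np.exp(scores[chosen])
    gates /= gates.sum()
    out = np.zeros(d)
    for g, idx in zip(gates, chosen):
        w1, w2 = experts[idx]
        out += g * (np.maximum(x @ w1, 0.0) @ w2)   # that expert's ReLU feed-forward pass
    return out                                   # the other 6 experts were never computed

print(moe_forward(rng.standard_normal(d)).shape)
```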

The importance of reinforcement learning

An LLM is taught to think through a chain-of-thought model, in which the model is fine-tuned to mimic thinking before delivering the answer. The model is asked to verbalize its thought (generate the thought before generating the answer). The model is then evaluated on both the thought and the answer, and trained with reinforcement learning (rewarded for a correct match and penalized for an incorrect match with the training data).

This requires expensive training data annotated with thought tokens. DeepSeek instead only asked the system to generate its thoughts between the tags <think> and </think> and to generate its answers between the tags <answer> and </answer>. The model is rewarded or penalized purely based on the form (the use of the tags) and the match of the answers. This required far less expensive training data. During the early phase of RL, the model generated very few thoughts, resulting in incorrect answers. Eventually, the model learned to generate both long and coherent thoughts, which DeepSeek calls the 'a-ha' moment. After this point, the quality of the answers improved considerably.
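A hedged sketch of what such a purely rule-based reward could look like; the exact scores and parsing rules below are my illustration, not DeepSeek's published recipe.

```python
import re

def reward(completion: str, reference_answer: str) -> float:
    """Toy reward: score the format (tag usage) and the answer match, nothing else."""
    score = 0.0
    # Format reward: reasoning must sit inside <think>...</think>, the answer inside <answer>...</answer>
    if re.fullmatch(r"\s*<think>.*?</think>\s*<answer>.*?</answer>\s*",
                    completion, flags=re.DOTALL):
        score += 0.5
    # Accuracy reward: the extracted answer must match the reference answer
    m = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    if m and m.group(1).strip() == reference_answer.strip():
        score += 1.0
    return score

# The RL loop samples completions from the model and reinforces high-reward ones,
# so no hand-written thought tokens are needed in the training data.
print(reward("<think>2 + 2 makes 4</think><answer>4</answer>", "4"))   # 1.5
print(reward("the answer is 4", "4"))                                   # 0.0
```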

DeepSeek employs several additional optimization tricks. However, they are highly technical, so I will not go into them here.

Final thoughts about DeepSeek and the wider market

In any technology research, we first need to explore what is possible before improving efficiency. This is a natural progression. DeepSeek's contribution to the LLM landscape is unprecedented. Its academic contribution cannot be ignored, regardless of whether its models were trained using OpenAI outputs. It can also transform the way startups operate. But there is no reason for OpenAI or the other American giants to despair. This is how research works: one group benefits from the research of other groups. DeepSeek certainly benefited from the earlier research carried out by Google, OpenAI and many other researchers.

However, the idea that OpenAI will dominate the LLM world indefinitely now looks very unlikely. No amount of regulatory lobbying or finger-pointing will preserve its monopoly. The technology is already out in the open, which makes its progress unstoppable. Although this may be a bit of a headache for OpenAI's investors, it is ultimately a win for the rest of us. While the future belongs to many, we will always be grateful to early contributors like Google and OpenAI.

Debashish Ray Chavadhuri is a senior principal engineer at Talentica Software.
