
The latest entry in the wave of small models for enterprises comes from AI21 Labs, which is betting that bringing models onto devices will free up traffic in data centers.
AI21’s Jamba Reasoning 3B is a “tiny” open-source model built for extended reasoning, code generation and grounded responses. Jamba Reasoning 3B handles a context window of more than 250,000 tokens and can run inference on edge devices.
The company said Jamba Reasoning 3B has run on devices such as laptops and mobile phones.
AI21 co-CEO Ori Goshen told VentureBeat that the company sees more enterprise use cases for small models, mainly because running most inference on devices frees up data centers.
“Everything we are seeing in the industry right now is an economics issue, where there are very expensive data center build-outs, and the revenue generated versus the depreciation rate of all their chips, the math doesn’t add up,” Goshen said.
He said that in the future, the industry at large will be hybrid, in the sense that some computation will happen locally while other inference goes to GPU clusters.
Tested on a MacBook
Jamba Reasoning 3B combines Mamba architecture with transformers, allowing it to run a 250K-token context window on device. AI21 said this enables 2-4x faster inference, and Goshen said the Mamba architecture contributed significantly to the model’s speed.
Jamba Reasoning 3B’s hybrid architecture also reduces its memory requirements, which in turn lowers its compute needs.
AI21 tested the model on a standard MacBook Pro and found that it could process 35 tokens per second.
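As a rough illustration, a throughput figure like the one AI21 reported can be obtained by timing token generation. The sketch below uses a stand-in generator with simulated per-token latency; the actual model, runtime and API are not specified in the article.

```python
import time

def measure_tokens_per_second(generate_fn, prompt, n_tokens=35):
    """Time a token generator and return throughput in tokens/second.

    generate_fn is any callable that yields tokens one at a time;
    here it stands in for a local inference runtime.
    """
    start = time.perf_counter()
    count = 0
    for _ in generate_fn(prompt, n_tokens):
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed

def fake_generate(prompt, n_tokens):
    """Stand-in generator simulating on-device decoding latency."""
    for i in range(n_tokens):
        time.sleep(1 / 35)  # simulated per-token delay (~35 tokens/sec)
        yield f"tok{i}"

tps = measure_tokens_per_second(fake_generate, "Draft an agenda")
print(f"{tps:.1f} tokens/sec")  # close to the simulated 35 tokens/sec
```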
Goshen said the model performs best on tasks involving function calling, policy-grounded generation and tool routing. Simple requests, such as asking for information about an upcoming meeting and asking the model to create an agenda for it, can be handled on device, while more complex reasoning can be reserved for GPU clusters.
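The hybrid split Goshen describes, with simple requests handled on device and heavier reasoning shipped to GPU clusters, can be sketched as a small router. Everything here (the intent labels, the classifier heuristic, the backend functions) is illustrative, not AI21’s implementation:

```python
# Hypothetical edge/cloud router: simple requests stay on the local
# small model, complex reasoning escalates to a GPU-backed service.

SIMPLE_INTENTS = {"meeting_lookup", "agenda_draft", "tool_routing"}

def classify(request: dict) -> str:
    """Toy heuristic: route by a pre-tagged intent label."""
    return "local" if request.get("intent") in SIMPLE_INTENTS else "cloud"

def run_local(request: dict) -> str:
    # Placeholder for on-device inference with a small model.
    return f"[on-device] handled {request['intent']}"

def run_cloud(request: dict) -> str:
    # Placeholder for a call to a GPU-cluster inference service.
    return f"[gpu-cluster] handled {request['intent']}"

def route(request: dict) -> str:
    backend = classify(request)
    return run_local(request) if backend == "local" else run_cloud(request)

print(route({"intent": "agenda_draft"}))       # stays on device
print(route({"intent": "long_horizon_plan"}))  # escalates to the cluster
```

In a real deployment the classifier would itself likely be a model or a confidence threshold rather than a static intent list; the point is only the shape of the split.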
Small models in the enterprise
Enterprises are interested in using a mixture of small models, some designed specifically for their industry and some that are distilled versions of larger LLMs.
In September, Meta released MobileLLM-R1, a family of reasoning models ranging from 140M to 950M parameters. These models are designed for math, coding and scientific reasoning rather than chat applications, and can run on compute-constrained devices.
Google’s Gemma was one of the first small models on the market designed to run on portable devices such as laptops and mobile phones. Gemma has since been expanded.
Companies like FICO have also started building their own models. FICO launched its FICO Focused Language and FICO Focused Sequence small models, which answer only finance-specific questions.
Goshen said the big differentiator for his company’s model is that it is even smaller than most small models, yet can run reasoning tasks without sacrificing speed.
Benchmark tests
In benchmark testing, Jamba Reasoning 3B showed strong performance compared to other small models, including Qwen 4B, Meta’s Llama 3.2 3B and Microsoft’s Phi-4-Mini.
It outperformed all of those models on the IFBench test and Humanity’s Last Exam, although it came in second behind Qwen 4 on MMLU-Pro.
Goshen said another advantage of small models like Jamba Reasoning 3B is that they are highly steerable and give enterprises better privacy options, because inference is not sent to a server somewhere else.
“I believe there’s a world where you can optimize for the needs and experience of the customer, and models deployed on devices are a big part of it,” he said.

