Small models are having a moment. On the heels of the release of a new AI vision model from MIT spinoff Liquid AI small enough to fit on a smartwatch, and a model from Google small enough to run on smartphones, Nvidia is joining the party today with a new small language model (SLM) of its own, Nemotron-Nano-9B-v2, which attained the highest performance in its class on selected benchmarks and gives users the ability to toggle "reasoning" on and off, that is, self-checking before outputting an answer.
While 9 billion parameters is larger than some of the multimillion-parameter small models covered recently, Nvidia notes the model is a meaningful reduction from its original size of 12 billion parameters and is designed to fit on a single Nvidia A10 GPU.
As Oleksii Kuchaiev, Nvidia Director of AI Model Post-Training, said on X in response to a question I posed to him: "The 12B was pruned to 9B to fit the A10, which is a popular GPU choice for deployment. It is also a hybrid model, which allows it to process inputs up to 6x faster than a transformer model of the same size."
For reference, many leading LLMs are in the 70+ billion parameter range (recall that parameters refer to the internal settings controlling the model's behavior; more generally means a larger, more capable model, but also one that is more compute-intensive to run).
The model handles multiple languages, including English, German, Spanish, French, Italian, Japanese, and, in extended descriptions, Korean, Portuguese, Russian, and Chinese. It is suitable for both instruction following and code generation.
Nemotron-Nano-9B-v2 and its pre-training datasets are available right now on Hugging Face and through the company's model catalog.
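For developers who want to experiment, the snippet below is a minimal sketch of loading the model with the Hugging Face transformers library. The repo id nvidia/NVIDIA-Nemotron-Nano-9B-v2 and the bf16/trust_remote_code settings are assumptions to verify against the model card, not details from Nvidia's announcement.

```python
# Minimal sketch: loading Nemotron-Nano-9B-v2 from Hugging Face.
# The repo id and dtype choice are assumptions, not confirmed by this article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 keeps a 9B model within a single A10's 24 GB
    device_map="auto",
    trust_remote_code=True,      # hybrid Mamba-Transformer layers may need custom code
)

messages = [{"role": "user", "content": "Summarize the Mamba architecture in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```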
A fusion of Transformer and Mamba architectures
It is based on Nemotron-H, a set of hybrid Mamba-Transformer models that forms the foundation for the company's latest offerings.
While most popular LLMs are pure "transformer" models that rely entirely on attention layers, those layers can become expensive in memory and compute as sequence length grows.
Instead, Nemotron-H models, and others using the Mamba architecture developed by researchers at Carnegie Mellon University and Princeton, weave in selective state space models (SSMs), which can handle very long sequences of information by maintaining state as tokens stream in and out.
These layers scale linearly with sequence length and can process contexts far longer than standard self-attention can, without the same memory and compute overhead.
A hybrid Mamba-Transformer reduces those costs by substituting linear-time state space layers for most of the attention, achieving up to 2-3x higher throughput on long contexts with comparable accuracy.
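To see why this matters, a back-of-the-envelope comparison helps: self-attention does work proportional to the square of the sequence length, while a state space layer's work grows linearly. The figures below are illustrative arithmetic only, not measurements of either architecture.

```python
# Illustrative arithmetic only: how per-layer work grows with context length
# for quadratic self-attention versus a linear-time state space layer.
def attention_cost(seq_len: int) -> int:
    # Self-attention compares every token with every other token: O(n^2).
    return seq_len * seq_len

def ssm_cost(seq_len: int, state_size: int = 16) -> int:
    # An SSM carries a fixed-size state through the sequence: O(n).
    return seq_len * state_size

for n in (1_000, 10_000, 100_000):
    ratio = attention_cost(n) / ssm_cost(n)
    print(f"context {n:>7,} tokens -> attention/SSM work ratio ~ {ratio:,.0f}x")

# The gap widens with context length, which is why replacing most attention
# layers with SSM layers pays off on long inputs.
```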
Other AI labs beyond Nvidia, such as Ai2, have also released models based on the Mamba architecture.
Toggling reasoning on and off using language
Nemotron-Nano-9B-v2 is positioned as a unified, text-only chat and reasoning model trained from scratch.
The system defaults to generating a reasoning trace before providing a final answer, though users can toggle this behavior with simple control tokens like /think or /no_think.
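In practice, the toggle amounts to a single line in the prompt. The sketch below assumes the control token is passed as a system message, a common convention for models with this feature; the exact prompt format is defined by the model's chat template, so treat this as an assumption to check against the model card.

```python
# Sketch of toggling reasoning via control tokens (the system-message format
# is an assumption; check the model card for the exact chat-template convention).
def build_messages(question: str, reasoning: bool) -> list[dict]:
    control = "/think" if reasoning else "/no_think"
    return [
        {"role": "system", "content": control},  # toggles the reasoning trace
        {"role": "user", "content": question},
    ]

# Reasoning on: the model emits a self-check trace before the final answer.
messages_on = build_messages("What is 17 * 23?", reasoning=True)
# Reasoning off: the model answers directly, trading accuracy for latency.
messages_off = build_messages("What is 17 * 23?", reasoning=False)
```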
The model also introduces runtime "thinking budget" management, which allows developers to cap the number of tokens devoted to internal reasoning before the model completes a response.
The aim of this mechanism is to balance accuracy against latency, especially in applications such as customer support or autonomous agents.
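One straightforward way to implement such a cap client-side is to stop generation once the reasoning trace reaches the budget, close the trace, and let the model continue with its visible answer. The function below is a hypothetical illustration of that idea, assuming the trace is delimited by <think> and </think> tokens; Nvidia's shipped budget-control logic may differ.

```python
# Hypothetical illustration of a "thinking budget": cap the tokens spent on
# the reasoning trace, then force the model to produce its final answer.
import torch

def generate_with_budget(model, tokenizer, prompt_ids, think_budget: int, answer_budget: int):
    """Cap reasoning at think_budget tokens, then generate the visible answer."""
    end_think = tokenizer.convert_tokens_to_ids("</think>")  # assumed delimiter token
    # Phase 1: reason until </think> appears or the budget is exhausted.
    trace = model.generate(prompt_ids, max_new_tokens=think_budget, eos_token_id=end_think)
    # Phase 2: if the budget cut the trace short, close it manually so the
    # model switches to answering mode, then generate the final answer.
    if trace[0, -1].item() != end_think:
        trace = torch.cat([trace, torch.tensor([[end_think]], device=trace.device)], dim=-1)
    return model.generate(trace, max_new_tokens=answer_budget)
```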
Benchmarks tell a promising story
Evaluation results highlight competitive accuracy against other open small-scale models. Tested in "reasoning on" mode using the NeMo-Skills suite, Nemotron-Nano-9B-v2 reaches 72.1 percent on AIME25, 97.8 percent on MATH500, 64.0 percent on GPQA, and 71.1 percent on LiveCodeBench.
Scores on instruction-following and long-context benchmarks are also reported: 90.3 percent on IFEval, 78.9 percent on the RULER 128K test, and smaller but measurable gains on the BFCL v3 and HLE benchmarks.

Across the board, Nano-9B-v2 shows higher accuracy than Qwen3-8B, a common point of comparison.

Nvidia illustrates these results with accuracy-versus-budget curves showing how performance scales as the token allowance for reasoning increases. The company suggests that careful budget control can help developers optimize for both quality and latency in production use cases.
Trained on synthetic datasets
Both the Nano model and the Nemotron-H family rely on a mixture of curated, web-sourced, and synthetic training data.
The corpora include general text, code, mathematics, science, and legal and financial documents, as well as alignment-style question-answering datasets.
Nvidia confirms the use of synthetic reasoning traces generated by other large models to strengthen performance on complex benchmarks.
Licensing and commercial use
Nano-9B-v2 is released under the Nvidia Open Model License Agreement, last updated in June 2025.
The license is designed to be permissive and enterprise-friendly. Nvidia explicitly states that the models are commercially usable out of the box, and that developers are free to create and distribute derivative models.
Importantly, Nvidia does not claim ownership of any outputs generated by the model, leaving responsibility and rights with the developer or organization using it.
For an enterprise developer, this means the model can be put into production immediately, without negotiating a separate commercial license or paying fees tied to usage thresholds, revenue levels, or user counts. Unlike some open licenses from other providers, the agreement does not require a paid license once a company reaches a certain scale.
That said, the agreement includes several conditions enterprises should note:
- Guardrails: Users cannot bypass or disable built-in safety mechanisms (referred to as "guardrails") without implementing comparable replacements suited to their deployment.
- Redistribution: Any redistribution of the model or derivatives must include the Nvidia Open Model License text and attribution ("Licensed by NVIDIA Corporation under the NVIDIA Open Model License").
- Compliance: Users must comply with trade regulations and restrictions (e.g., U.S. export laws).
- Trustworthy AI terms: Usage must align with Nvidia's Trustworthy AI guidelines, which cover responsible deployment and ethical considerations.
- Litigation clause: If a user initiates copyright or patent litigation against another entity, alleging infringement by the model, the license terminates automatically.
These conditions focus on legal and responsible use rather than commercial scale. Enterprises do not need to seek additional permission from Nvidia or pay royalties simply for building products, monetizing them, or scaling to a larger user base. Instead, they must ensure that their deployment practices respect safety, attribution, and compliance obligations.
Positioning in the market
With Nemotron-Nano-9B-v2, Nvidia is targeting developers who need a balance of reasoning capability and deployment efficiency at smaller scales.
The runtime budget control and reasoning-toggle features are meant to give system builders more flexibility in managing accuracy versus response speed.
Its release on Hugging Face and in Nvidia's model catalog signals that it is meant to be widely accessible for experimentation and integration.
The release of Nemotron-Nano-9B-v2 reflects Nvidia's continued focus on efficiency and controllable reasoning in language models.
By combining a hybrid architecture with new compression and training techniques, the company is offering developers tools that aim to maintain accuracy while reducing cost and latency.

