The machine learning field moves fast, and the yardsticks used to measure its progress must race to keep up. A case in point: MLPerf, the twice-yearly machine learning competition sometimes called "the Olympics of AI," introduced three new benchmark tests this round, reflecting new directions in the field.
"Lately, it has been very difficult to follow what happens in the field," says Miro Hodak, AMD engineer and MLPerf Inference working group co-chair. "We see that the models are continually growing larger, and in the last two rounds we have introduced the largest models we've ever had."
Tackling these new benchmarks were the usual suspects: Nvidia, AMD, and Intel. Nvidia topped the charts, debuting its new Blackwell Ultra GPU, packaged in a GB300 rack-scale design. AMD put in a strong performance, debuting its latest MI325X GPUs. Intel proved that one can still do machine learning on CPUs with its Xeon submissions, but it also entered the GPU game with an Intel Arc Pro submission.
New benchmarks
Last round, MLPerf introduced its largest benchmark yet, a large language model based on Llama 3.1-405B. This round, it topped itself again, introducing a benchmark based on the DeepSeek R1 671B model, more than 1.5 times the parameter count of the previous largest benchmark.
As a reasoning model, DeepSeek R1 goes through several steps of chain-of-thought when approaching a query. This means much more of the computation happens during inference than in normal LLM operation, making this benchmark even more challenging. Reasoning models are claimed to be the most accurate, making them the technique of choice for science, math, and complex programming queries.
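To see why reasoning inflates inference cost, consider that a decoder-only LLM's inference compute scales roughly with parameters times tokens generated. The sketch below is a back-of-the-envelope illustration; the token counts and the 2-FLOPs-per-parameter rule of thumb are assumptions for illustration, not measured MLPerf figures.

```python
def decode_flops(params: float, tokens_generated: int) -> float:
    """Rough rule of thumb: ~2 FLOPs per parameter per generated token."""
    return 2 * params * tokens_generated

PARAMS = 671e9  # DeepSeek R1's parameter count

# A standard LLM answers directly; a reasoning model first emits a long
# chain-of-thought before the final answer (token counts are assumptions).
direct_answer = decode_flops(PARAMS, tokens_generated=200)
with_reasoning = decode_flops(PARAMS, tokens_generated=200 + 4000)

print(f"direct answer:        {direct_answer:.2e} FLOPs")
print(f"with chain-of-thought: {with_reasoning:.2e} FLOPs")
print(f"ratio: {with_reasoning / direct_answer:.0f}x")
```

Under these illustrative numbers, the reasoning pass does roughly 20 times the decode work for the same final answer, which is why serving such models well is a meaningful benchmark.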
In addition to the largest LLM benchmark yet, MLPerf also introduced the smallest, based on Llama 3.1-8B. There is growing industry demand for low-latency yet high-accuracy reasoning, explained Taran Iyengar, MLPerf Inference task force chair. Small LLMs can supply this, and are an excellent choice for tasks such as text summarization and edge applications.
This brings the total count of LLM-based benchmarks to a confusing four. They include the new, smallest Llama 3.1-8B benchmark; a pre-existing Llama 2-70B benchmark; the Llama 3.1-405B benchmark introduced last round; and the largest, the new DeepSeek R1 model. If nothing else, this signals LLMs are not going anywhere.
In addition to the myriad LLMs, this round of MLPerf Inference included a new voice-to-text model, based on Whisper large-v3. This benchmark is a response to the growing number of voice-enabled applications, whether smart devices or speech-based AI interfaces.
The MLPerf Inference competition has two broad categories: "closed," which requires using the reference neural network model as-is, without modifications, and "open," where some modifications to the model are allowed. Within those, there are many subcategories related to how the tests are done and on what sort of infrastructure. We will focus on the "closed" datacenter server results for simplicity.
Nvidia leads
Perhaps unsurprisingly, the best performance per accelerator on each benchmark, at least in the server category, was achieved by an Nvidia GPU-based system. Nvidia also unveiled Blackwell Ultra, which topped the charts in the two largest benchmarks: Llama 3.1-405B and DeepSeek R1 reasoning.
Blackwell Ultra is a more powerful iteration of the Blackwell architecture, featuring significantly greater memory capacity, double the acceleration for attention layers, 1.5x more AI compute, and faster memory and connectivity than the standard Blackwell. It is intended for the largest AI workloads, such as those tested in these two benchmarks.
In addition to the hardware improvements, Dave Salvator, director of accelerated computing products at Nvidia, credits Blackwell Ultra's success to two key changes. The first is the use of Nvidia's proprietary 4-bit floating-point number format, NVFP4. "We can deliver comparable accuracy to formats like BF16," Salvator says, while using much less computing power.
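The gist of 4-bit floating point can be sketched with the standard FP4 E2M1 encoding (2 exponent bits, 1 mantissa bit), which can represent only 8 magnitudes, so blocks of values share a scale factor. Note this is a simplified illustration of the general technique, not Nvidia's NVFP4 implementation; the per-block max-abs scaling used here is an assumption.

```python
# The 8 non-negative magnitudes representable in FP4 E2M1.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(values: list[float]) -> tuple[float, list[float]]:
    """Quantize a block of floats to the nearest signed E2M1 value,
    after rescaling so the block's max magnitude maps onto 6.0."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 6.0  # one shared scale per block (illustrative choice)
    quantized = []
    for v in values:
        target = abs(v) / scale
        nearest = min(E2M1_GRID, key=lambda g: abs(g - target))
        quantized.append(nearest if v >= 0 else -nearest)
    return scale, quantized

def dequantize_block(scale: float, quantized: list[float]) -> list[float]:
    return [q * scale for q in quantized]

block = [0.02, -0.11, 0.07, 0.30]
scale, q = quantize_block(block)
approx = dequantize_block(scale, q)
```

Each weight is reduced to a 4-bit code plus a shared scale, a quarter of BF16's storage per value, which is where the compute and memory savings come from, at some cost in precision.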
The second is so-called disaggregated serving. The idea behind disaggregated serving is that the inference workload has two main parts: prefill, where the query ("Please summarize this report.") and its entire context window (the report) are loaded into the LLM, and generation/decoding, where the output is actually computed. These two stages have different requirements. While prefill is compute-heavy, generation/decoding depends strongly on memory bandwidth. Salvator says that by assigning the two stages to separate groups of GPUs, Nvidia achieves a performance gain of about 50 percent.
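The flow described above can be sketched as a toy pipeline with separate prefill and decode workers handing off the attention key/value cache. The names and structure here are illustrative assumptions, not Nvidia's implementation; in a real system the two worker pools are distinct groups of GPUs connected by a fast interconnect.

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    """Stand-in for the attention key/value state built during prefill."""
    prompt: str
    n_tokens: int

def prefill_worker(prompt: str) -> KVCache:
    # Compute-bound stage: processes the whole prompt/context in parallel.
    return KVCache(prompt=prompt, n_tokens=len(prompt.split()))

def decode_worker(cache: KVCache, max_new_tokens: int) -> str:
    # Memory-bandwidth-bound stage: generates one token at a time,
    # repeatedly reading the (large) KV cache and the model weights.
    tokens = [f"tok{i}" for i in range(max_new_tokens)]
    return " ".join(tokens)

# The handoff: prefill output (the KV cache) is transferred to the decode
# pool; here it is just a function-call handoff.
cache = prefill_worker("Please summarize this report: ...")
output = decode_worker(cache, max_new_tokens=4)
```

Because each pool runs only one kind of work, each can be sized and scheduled for that work's bottleneck, which is the source of the claimed throughput gain.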
AMD close behind
AMD's latest accelerator chip, the MI355X, was launched in July. The company offered results for it in the "open" category only, where software modifications to the model are allowed. Like Blackwell Ultra, the MI355X includes 4-bit floating-point support, as well as expanded high-bandwidth memory. The MI355X beat its predecessor, the MI325X, on the open Llama 2-70B benchmark by a factor of 2.7, says Mahesh Balasubramanian, senior director of data center GPU product marketing at AMD.
AMD's "closed" submissions included systems powered by AMD MI300X and MI325X GPUs. The more advanced MI325X systems performed comparably to those built with Nvidia H200s on the Llama 2-70B, mixture-of-experts, and image generation benchmarks.
This round also included the first hybrid submission, in which both AMD MI300X and MI325X GPUs were used for the same inference task, the Llama 2-70B benchmark. Hybrid use of GPUs matters because new GPUs are arriving at a yearly cadence, and the older models, already deployed in large numbers, are not going anywhere. The ability to spread workloads between different kinds of GPUs is an essential step.
Intel enters GPU game
In the past, Intel has remained steadfast that one does not need a GPU to do machine learning. Indeed, submissions using Intel's Xeon CPUs still performed on par with the Nvidia L4 on the object detection benchmark, but trailed it on the recommender system benchmark.
This round, for the first time, an Intel GPU also made a showing. The Intel Arc Pro was first released in 2022. The MLPerf submission featured a graphics card, the MaxSun Intel Arc Pro B60 Dual 48G Turbo, which includes two GPUs and 48 gigabytes of memory. The system performed on par with Nvidia's L40S on the small LLM benchmark, but trailed it on the Llama 2-70B benchmark.
