Tuesday, May 20, 2025
LBNN

Nvidia Blackwell Leads AI Inference, AMD Challenges

by Simon Osuji
April 2, 2025
in Artificial Intelligence


In the latest round of machine learning benchmark results from MLCommons, computers built around Nvidia’s new Blackwell GPU architecture outperformed all others. But AMD’s latest spin on its Instinct GPUs, the MI325X, proved a match for the Nvidia H200, the product it was meant to counter. The comparable results were mostly on tests of one of the smaller-scale large language models, Llama2 70B (for 70 billion parameters). However, in an effort to keep up with a rapidly changing AI landscape, MLPerf added three new benchmarks to better reflect where machine learning is headed.

MLPerf runs benchmarking for machine learning systems in an effort to provide an apples-to-apples comparison between computer systems. Submitters use their own software and hardware, but the underlying neural networks must be the same. There are a total of 11 benchmarks for servers now, with three added this year.

It has been “hard to keep up with the rapid development of the field,” says Miro Hodak, the co-chair of MLPerf Inference. ChatGPT only appeared in late 2022, OpenAI unveiled its first large language model (LLM) that can reason through tasks last September, and LLMs have grown exponentially: GPT-3 had 175 billion parameters, while GPT-4 is thought to have nearly 2 trillion. As a result of the breakneck innovation, “we’ve increased the pace of getting new benchmarks into the field,” says Hodak.

The new benchmarks include two LLMs. The popular and relatively compact Llama2-70B is already an established MLPerf benchmark, but the consortium wanted something that mimicked the responsiveness people are expecting of chatbots today. So the new benchmark “Llama2-70B Interactive” tightens the requirements. Computers must produce at least 25 tokens per second under any circumstance and cannot take more than 450 milliseconds to begin an answer.
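The two thresholds quoted above can be expressed as a simple per-response check. The sketch below is only an illustration of the stated limits, not the MLPerf test harness: the `Response` record and the function name are hypothetical, and the way the token rate is measured (tokens after the first, divided by the time after the first token) is an assumption.

```python
from dataclasses import dataclass

# Thresholds as quoted in the article for "Llama2-70B Interactive".
MAX_TIME_TO_FIRST_TOKEN_S = 0.450
MIN_TOKENS_PER_SECOND = 25.0


@dataclass
class Response:
    """Timing for one generated answer (hypothetical record, not an MLPerf type)."""
    first_token_s: float  # delay from request to the first token
    total_s: float        # delay from request to the last token
    tokens: int           # number of tokens generated


def meets_interactive_limits(r: Response) -> bool:
    """Return True if one response satisfies both quoted thresholds."""
    if r.first_token_s > MAX_TIME_TO_FIRST_TOKEN_S:
        return False
    decode_s = r.total_s - r.first_token_s
    if r.tokens <= 1 or decode_s <= 0:
        return True  # nothing left to measure after the first token
    # Assumption: the rate is measured over the tokens that follow the first one.
    rate = (r.tokens - 1) / decode_s
    return rate >= MIN_TOKENS_PER_SECOND


fast = Response(first_token_s=0.30, total_s=4.30, tokens=121)
slow_start = Response(first_token_s=0.60, total_s=3.00, tokens=121)
slow_stream = Response(first_token_s=0.30, total_s=10.30, tokens=121)

for name, r in [("fast", fast), ("slow_start", slow_start), ("slow_stream", slow_stream)]:
    print(name, meets_interactive_limits(r))
```

A real benchmark run would apply limits like these across many queries rather than to a single response.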

Seeing the rise of “agentic AI”—networks that can reason through complex tasks—MLPerf sought to test an LLM that would have some of the characteristics needed for that. They chose Llama3.1 405B for the job. That LLM has what’s called a wide context window. That’s a measure of how much information—documents, samples of code, etc.—it can take in at once. For Llama3.1 405B that’s 128,000 tokens, more than 30 times as much as Llama2 70B.
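The “more than 30 times” comparison can be checked with one division. The 128,000-token figure comes from the article; the 4,096-token context window for Llama2 70B is an assumption taken from that model's published specification, not from this article.

```python
LLAMA31_405B_CONTEXT = 128_000  # tokens, as stated in the article
LLAMA2_70B_CONTEXT = 4_096      # tokens; assumption, not stated in the article

ratio = LLAMA31_405B_CONTEXT / LLAMA2_70B_CONTEXT
print(f"Context window ratio: {ratio:.2f}x")
```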

The final new benchmark, called RGAT, is what’s called a graph attention network. It acts to classify information in a network. For example, the dataset used to test RGAT consists of scientific papers, which all have relationships between authors, institutions, and fields of study, making up 2 terabytes of data. RGAT must classify the papers into just under 3000 topics.

Blackwell, Instinct Results


Nvidia continued its domination of MLPerf benchmarks through its own submissions and those of some 15 partners such as Dell, Google, and Supermicro. Both its first- and second-generation Hopper architecture GPUs (the H100 and the memory-enhanced H200) made strong showings. “We were able to get another 60 percent performance over the last year” from Hopper, which went into production in 2022, says Dave Salvator, director of accelerated computing products at Nvidia. “It still has some headroom in terms of performance.”

But it was Nvidia’s Blackwell architecture GPU, the B200, that really dominated. “The only thing faster than Hopper is Blackwell,” says Salvator. The B200 packs in 36 percent more high-bandwidth memory than the H200, but more importantly it can perform key machine-learning math using numbers with a precision as low as 4 bits instead of the 8 bits Hopper pioneered. Lower precision compute units are smaller, so more fit on the GPU, which leads to faster AI computing.
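To give a feel for the trade-off behind lower precision, the sketch below rounds a handful of made-up weights onto 4-bit and 8-bit grids and compares the worst-case rounding error. It uses uniform symmetric integer quantization as a stand-in; that scheme, the function names, and the sample values are assumptions for illustration, and the number formats the GPUs actually use are not described in the article.

```python
def quantize(values, bits):
    """Round values onto a symmetric signed integer grid of the given bit width."""
    levels = 2 ** (bits - 1) - 1            # 7 for 4 bits, 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / levels if peak else 1.0  # size of one quantization step
    return [round(v / scale) * scale for v in values]


def max_error(values, bits):
    """Largest absolute difference between the original and quantized values."""
    return max(abs(a - b) for a, b in zip(values, quantize(values, bits)))


# Made-up sample weights, purely illustrative.
weights = [0.9, -0.45, 0.12, 0.033, -0.7]

err4 = max_error(weights, 4)
err8 = max_error(weights, 8)
print(f"distinct bit patterns: 4-bit = {2 ** 4}, 8-bit = {2 ** 8}")
print(f"max rounding error:    4-bit = {err4:.4f}, 8-bit = {err8:.4f}")
```

Fewer bits mean a coarser grid and larger rounding error, which is the cost paid for the smaller, more numerous compute units.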

In the Llama3.1 405B benchmark, an eight-B200 system from Supermicro delivered nearly four times the tokens per second of an eight-H200 system by Cisco. And the same Supermicro system was three times as fast as the quickest H200 computer at the interactive version of Llama2-70B.

Nvidia used its combination of Blackwell GPUs and Grace CPU, called GB200, to demonstrate how well its NVL72 data links can integrate multiple servers in a rack, so they perform as if they were one giant GPU. In an unverified result the company shared with reporters, a full rack of GB200-based computers delivers 869,200 tokens/s on Llama2 70B. The fastest system reported in this round of MLPerf was an Nvidia B200 server that delivered 98,443 tokens/s.
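The two throughput figures can be put side by side with a little arithmetic. The token rates are the ones quoted above; the GPU counts (72 for the rack, 8 for the server) are assumptions for illustration, since the article does not state them.

```python
RACK_TOKENS_PER_S = 869_200    # unverified GB200 rack result quoted in the article
SERVER_TOKENS_PER_S = 98_443   # fastest B200 server in this MLPerf round, per the article

# Assumed GPU counts; the article does not give them.
RACK_GPUS = 72
SERVER_GPUS = 8

ratio = RACK_TOKENS_PER_S / SERVER_TOKENS_PER_S
rack_per_gpu = RACK_TOKENS_PER_S / RACK_GPUS
server_per_gpu = SERVER_TOKENS_PER_S / SERVER_GPUS

print(f"rack vs. server throughput: {ratio:.2f}x")
print(f"per-GPU tokens/s (assumed counts): rack {rack_per_gpu:.0f}, server {server_per_gpu:.0f}")
```

Under those assumed counts, per-GPU throughput in the rack stays close to that of the single server, which is the sense in which the rack behaves like one large GPU.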

AMD is positioning its latest Instinct GPU, the MI325X, as providing competitive performance to Nvidia’s H200. MI325X has the same architecture as its predecessor, the MI300, but adds even more high-bandwidth memory and memory bandwidth: 288 gigabytes and 6 terabytes per second (a 50 percent and 13 percent boost, respectively).

Adding more memory is a play to handle larger and larger LLMs. “Larger models are able to take advantage of these GPUs because the model can fit in a single GPU or a single server,” says Mahesh Balasubramanian, director of data center GPU marketing at AMD. “So you don’t have to have that communication overhead of going from one GPU to another GPU or one server to another server. When you take out those communications your latency improves quite a bit.” AMD was able to take advantage of the extra memory through software optimization to boost the inference speed of DeepSeek-R1 8-fold.
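A rough way to see why memory capacity matters is to estimate how much space a model's weights occupy. The sketch below uses the parameter counts and the 288-gigabyte figure from the article; the bits-per-parameter values are assumptions, and the estimate covers weights only, ignoring the extra memory that inference needs for caches and activations.

```python
import math

MI325X_MEMORY_GB = 288  # per-GPU high-bandwidth memory, as stated in the article


def weight_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (weights only)."""
    return params_billions * bits_per_param / 8


def gpus_needed(params_billions: float, bits_per_param: int,
                gpu_memory_gb: float = MI325X_MEMORY_GB) -> int:
    """Minimum number of GPUs whose combined memory holds the weights."""
    return math.ceil(weight_gb(params_billions, bits_per_param) / gpu_memory_gb)


# Bit widths here are assumed for illustration, not taken from the article.
for name, params in [("Llama2 70B", 70), ("Llama3.1 405B", 405)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: {weight_gb(params, bits):.1f} GB, "
              f"at least {gpus_needed(params, bits)} GPU(s)")
```

When the count comes out to one, the weights fit on a single GPU and the GPU-to-GPU communication Balasubramanian describes can be avoided.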

On the Llama2-70B test, an eight-GPU MI325X computer came within 3 to 7 percent of the speed of a similarly tricked-out H200-based system. And on image generation the MI325X system was within 10 percent of the Nvidia H200 computer.

AMD’s other noteworthy mark this round was from its partner, Mangoboost, which showed nearly four-fold performance on the Llama2-70B test by doing the computation across four computers.

Intel has historically put forth CPU-only systems in the inference competition to show that for some workloads you don’t really need a GPU. This time around saw the first data from Intel’s Xeon 6 chips, which were formerly known as Granite Rapids and are made using Intel’s 3-nanometer process. At 40,285 samples per second, the best image-recognition result for a dual-Xeon 6 computer was about one-third the performance of a Cisco computer with two Nvidia H100s.

Compared to Xeon 5 results from October 2024, the new CPU provides about an 80 percent boost on that benchmark and an even bigger boost on object detection and medical imaging. Since it first started submitting Xeon results in 2021 (the Xeon 3), the company has achieved an 11-fold boost in performance on Resnet.

For now, it seems Intel has quit the field in the AI accelerator chip battle. Its alternative to the Nvidia H100, Gaudi 3, did not make an appearance in the new MLPerf results nor in version 4.1, released last October. Gaudi 3 got a later-than-planned release because its software was not ready. In the opening remarks at Intel Vision 2025, the company’s invite-only customer conference, newly minted CEO Lip-Bu Tan seemed to apologize for Intel’s AI efforts. “I’m not happy with our current position,” he told attendees. “You’re not happy either. I hear you loud and clear. We are working toward a competitive system. It won’t happen overnight, but we will get there for you.”

Google’s TPU v6e chip also made a showing, though the results were restricted to the image generation task. At 5.48 queries per second, the 4-TPU system saw a 2.5x boost over a similar computer using its predecessor TPU v5e in the October 2024 results. Even so, 5.48 queries per second was roughly in line with a similarly sized Lenovo computer using Nvidia H100s.

© 2023 LBNN - All rights reserved.
