SAN FRANCISCO, March 27 (Reuters) - Artificial
intelligence benchmarking group MLCommons on Wednesday released
a fresh set of tests and results that rate the speed at which
top-of-the-line hardware can run AI applications and respond to
users.
The two new benchmarks added by MLCommons measure the speed
at which the AI chips and systems can generate responses from
the powerful AI models packed with data. The results roughly
demonstrate how quickly an AI application such as ChatGPT can
deliver a response to a user query.
One of the new benchmarks added the capability to measure
the speediness of a question-and-answer scenario for large
language models. The benchmark is based on Llama 2, a model with
70 billion parameters developed by Meta Platforms (META).
MLCommons officials also added to MLPerf, their suite of
benchmarking tools, a second text-to-image generation benchmark
based on Stability AI's Stable Diffusion XL model.
Servers powered by Nvidia's (NVDA) H100 chips, built by the likes of
Alphabet's Google, Supermicro and Nvidia itself, handily won both
new benchmarks on raw
performance. Several server builders submitted designs based on
the company's less powerful L40S chip.
Server builder Krai submitted a design for the image
generation benchmark with a Qualcomm AI chip that draws
significantly less power than Nvidia's cutting-edge processors.
Intel (INTC) also submitted a design based on its Gaudi2
accelerator chips. The company described the results as "solid."
Raw performance is not the only measure that is critical
when deploying AI applications. Advanced AI chips suck up
enormous amounts of energy and one of the most significant
challenges for AI companies is deploying chips that deliver an
optimal amount of performance for a minimal amount of energy.
MLCommons has a separate benchmark category for measuring
power consumption.