Google’s New AI-Focused 'A3' Supercomputer Has 26,000 GPUs (2024)

Cloud providers are building armies of GPUs to provide more AI firepower. At its annual Google I/O developer conference today, Google announced an AI supercomputer with 26,000 GPUs. The Compute Engine A3 supercomputer is one more proof point that Google is throwing more resources into an aggressive counteroffensive in its battle with Microsoft for AI supremacy.

The supercomputer has about 26,000 Nvidia H100 Hopper GPUs. For reference, the world’s fastest public supercomputer, Frontier, has about 37,000 AMD Instinct MI250X GPUs.

“For our largest customers, we can build A3 supercomputers up to 26,000 GPUs in a single cluster and are working to build multiple clusters in our largest regions,” a Google spokeswoman said in an email, adding that “not all of our locations will be scaled to this large size.”

The system was announced at the Google I/O conference, which is being held in Mountain View, California. The developer conference has emerged as a showcase for many of Google’s AI software and hardware capabilities. Google has accelerated its AI development after Microsoft put technologies from OpenAI into Bing search and office productivity applications.

The supercomputer is targeted at customers looking to train large language models. Google announced the accompanying A3 virtual machine instances for companies looking to use the supercomputer. Many cloud providers are now deploying H100 GPUs, and Nvidia in March launched its own DGX Cloud service, which is expensive compared to renting previous-generation A100 GPUs.

Google said that the A3 supercomputer is a significant upgrade over compute resources provided by existing A2 virtual machines with Nvidia’s A100 GPUs. Google is pooling all A3 computing instances, which are spread geographically, into a single supercomputer.

“The A3 supercomputer’s scale provides up to 26 exaflops of AI performance, which considerably improves the time and costs for training large ML models,” said Google’s Roy Kim, a director, and Chris Kleban, a product manager, in a blog entry.

The exaflops performance metric, which is used by companies to estimate the raw performance of an AI computer, is still viewed with a pinch of salt by critics. In Google’s case, the flops are meted out in training-targeted TF32 Tensor Core performance, which gets you to “exaflops” about 30x faster than the double-precision (FP64) floating-point math that most classic HPC applications still require.
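
As a rough back-of-envelope check on those numbers, the sketch below uses only the figures quoted in this article: 26,000 GPUs, 26 exaflops, and the “about 30x” gap between TF32 Tensor Core and FP64 throughput. The constant and function names are made up for illustration, and the 30x ratio is an approximation, so treat the output as an order-of-magnitude estimate rather than a benchmark.

```python
# Back-of-envelope arithmetic using only figures quoted in the article.
# All values are approximate, vendor-stated peak numbers, not measurements.

GPU_COUNT = 26_000          # H100 GPUs in one A3 cluster (per Google)
AI_EXAFLOPS = 26.0          # claimed peak "AI performance" in exaflops
TF32_TO_FP64_RATIO = 30.0   # "about 30x" gap cited in the article (assumption)


def per_gpu_petaflops(total_exaflops: float, gpus: int) -> float:
    """Peak throughput implied per GPU, in petaflops (1 exaflop = 1,000 petaflops)."""
    return total_exaflops * 1000.0 / gpus


def fp64_equivalent_exaflops(total_exaflops: float, ratio: float) -> float:
    """Scale the AI-precision figure down by the assumed TF32-to-FP64 ratio."""
    return total_exaflops / ratio


per_gpu = per_gpu_petaflops(AI_EXAFLOPS, GPU_COUNT)
fp64_total = fp64_equivalent_exaflops(AI_EXAFLOPS, TF32_TO_FP64_RATIO)

print(f"Implied peak per GPU: {per_gpu:.2f} petaflops")
print(f"Rough FP64-equivalent for the cluster: {fp64_total:.2f} exaflops")
```

Under these assumptions, the headline figure works out to about 1 petaflop per GPU, and the same cluster would land a little under 1 exaflop if it were rated on FP64-class math instead.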