with AMD cozy.
- As it turns out, Huang was cooking eight A100 GPUs, two Epyc 7742 64-core CPUs, nine Mellanox interconnects, and assorted extras such as RAM and SSDs. Mmmm, just like Grandma used to make.
- When we look at an A100 GPU without its massive heat sink, we feel a little uncomfortable.
- This fully assembled DGX A100 node looks strangely lonely, with no racks, wires, or flashing lights.
Yesterday at Nvidia's GPU Technology Conference, everyone found out what CEO Jensen Huang has been cooking: an Ampere-based successor to the Volta-based DGX-2 deep learning system.
On Wednesday, we described the mysterious hardware in Huang’s kitchen as likely “packing some Xeon CPUs” in addition to the new successor to the Tesla V100 GPU. Egg on our face for that one – the new system includes a pair of AMD Epyc 7742 64-core, 128-thread CPUs, along with 1 TB of RAM, a pair of 1.9 TB NVMe SSDs in RAID 1 for a boot drive, and up to four 3.8 TiB PCIe 4.0 NVMe drives in RAID 0 as secondary storage.
Goodbye Intel, hello AMD
- It is very likely that Nvidia did not want to financially support Intel’s plans to muscle in on its own profitable deep learning turf.
- At least the Intel DG1 exists – but its leaked benchmarks don’t seem impressive, suggesting performance similar to years-old Nvidia GPUs.
- We’ve heard rumors of a Xe DG2, but all we’ve seen so far is concept renderings, not real hardware.
Technically, it shouldn’t be too surprising that Nvidia would use AMD for the CPUs in its flagship machine learning nodes – Epyc Rome has had Intel’s Xeon server CPU range on its knees for quite a while now. On the technical side, the Epyc 7742’s support for PCIe 4.0 may have been even more important than its high CPU speed and massive number of cores and threads.
GPU-based machine learning frequently bottlenecks on storage, not CPU. The M.2 and U.2 interfaces used by the DGX A100 each use four PCIe lanes, so the move from PCI Express 3.0 to PCI Express 4.0 doubles the available storage transport bandwidth from 32 Gbps to 64 Gbps per individual SSD.
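The per-SSD figures above follow from simple lane arithmetic. A minimal sketch, assuming the standard raw signalling rates of 8 GT/s per lane for PCIe 3.0 and 16 GT/s for PCIe 4.0, and the 128b/130b line encoding both generations use; the function and constant names are illustrative, not taken from any Nvidia or PCI-SIG tooling:

```python
# Raw signalling rate per PCIe lane, in gigatransfers per second.
# (Assumed standard values: 8 GT/s for gen 3, 16 GT/s for gen 4.)
RAW_GT_PER_LANE = {"3.0": 8, "4.0": 16}

# M.2 and U.2 each use four lanes, per the article.
LANES_PER_SSD = 4


def raw_link_gbps(gen: str, lanes: int = LANES_PER_SSD) -> float:
    """Raw link rate in Gbps, before encoding overhead."""
    return RAW_GT_PER_LANE[gen] * lanes


def usable_link_gbps(gen: str, lanes: int = LANES_PER_SSD) -> float:
    """Link rate after 128b/130b encoding (128 payload bits per 130 sent)."""
    return raw_link_gbps(gen, lanes) * 128 / 130


if __name__ == "__main__":
    for gen in ("3.0", "4.0"):
        print(
            f"PCIe {gen} x{LANES_PER_SSD}: "
            f"{raw_link_gbps(gen):.0f} Gbps raw, "
            f"{usable_link_gbps(gen):.1f} Gbps after encoding"
        )
```

The raw numbers match the 32 Gbps and 64 Gbps quoted above; the throughput an SSD actually delivers will be lower still once protocol overhead and the drive's own limits are taken into account.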
There might be a little politics lurking behind the decision to switch CPU providers as well. AMD may be Nvidia’s biggest competitor in the relatively low-margin consumer graphics market, but Intel is making moves on the data center side of the market. At the moment, Intel’s discrete GPU offerings are mostly vapor – but we know that Chipzilla has much bigger and grander plans as it shifts its focus from the dying consumer CPU market to all things data center.
The Intel DG1 itself – the only real hardware we’ve seen so far – has posted leaked benchmarks that merely rival the integrated Vega GPU in the Ryzen 7 4800U. However, Nvidia could be more concerned about the four-tile Xe HP GPU, whose 2048 EUs (execution units) could offer up to 36 TFLOPS – which would at least be in the same ballpark as the Nvidia A100 GPU powering the DGX presented today.
DGX, HGX, SuperPOD and Jetson
The DGX A100 was the star of today’s announcements – it’s a standalone system with eight A100 GPUs, each with 40 GB of GPU memory. The US Department of Energy’s Argonne National Lab is already using a DGX A100 for COVID-19 research. The system’s nine 200 Gbps Mellanox connections allow multiple DGX A100s to be clustered – but those whose budget doesn’t stretch to many $200,000 GPU nodes can make do by partitioning each node’s A100 GPUs into as many as 56 instances.
For those who do have the budget to buy and cluster tons of DGX A100 nodes, they are also available in a Hyperscale Data Center Accelerator (HGX) format. According to Nvidia, a “typical cloud cluster” consisting of its previous DGX-1 nodes for training and 600 separate CPUs for inference could be replaced by five DGX A100 units that can handle both workloads. This would reduce the hardware from 25 racks to one, the power budget from 630 kW to 28 kW, and the cost from $11 million to $1 million.
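To put Nvidia’s consolidation claim in proportion, the before-and-after figures can be turned into reduction factors. This is a minimal sketch using only the numbers quoted above; the dictionary keys and the `reduction_factors` helper are illustrative names, and the per-unit figure is a simple division of Nvidia’s totals, not a published specification:

```python
# Figures quoted by Nvidia for the "typical cloud cluster" comparison.
BEFORE = {"racks": 25, "power_kw": 630, "cost_million_usd": 11}
AFTER = {"racks": 1, "power_kw": 28, "cost_million_usd": 1}

# Number of DGX A100 units in the replacement cluster, per the article.
DGX_A100_UNITS = 5


def reduction_factors(before: dict, after: dict) -> dict:
    """How many times smaller each metric is after consolidation."""
    return {key: before[key] / after[key] for key in before}


if __name__ == "__main__":
    for metric, factor in reduction_factors(BEFORE, AFTER).items():
        print(f"{metric}: {factor:g}x reduction")
    # Implied share of the claimed power budget per DGX A100 unit.
    print(f"power per unit: {AFTER['power_kw'] / DGX_A100_UNITS:g} kW")
```

By these figures, the claimed savings are largest in rack space and power, with cost falling by a smaller factor.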
If the HGX still doesn’t sound big enough, Nvidia has also released a reference architecture for its SuperPOD – no relation to Plume’s. Nvidia’s A100 SuperPOD connects 140 DGX A100 nodes and 4 PB of flash storage through 170 InfiniBand switches and offers 700 petaflops of AI performance. Nvidia has added four SuperPODs to its own SaturnV supercomputer, making SaturnV the fastest AI supercomputer in the world, at least according to Nvidia.
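The SuperPOD totals can be broken down per node and per GPU as a quick sanity check. A minimal sketch using only figures from this article (140 nodes, 700 petaflops, eight GPUs per DGX A100); the names are illustrative, and the split assumes the quoted AI performance is spread evenly across GPUs:

```python
# SuperPOD figures as quoted in the article.
SUPERPOD_NODES = 140
SUPERPOD_AI_PETAFLOPS = 700
GPUS_PER_NODE = 8  # each DGX A100 holds eight A100 GPUs


def per_node_petaflops(total_pf: float, nodes: int) -> float:
    """AI petaflops attributable to each DGX A100 node."""
    return total_pf / nodes


def per_gpu_teraflops(total_pf: float, nodes: int, gpus_per_node: int) -> float:
    """AI teraflops attributable to each GPU (1 petaflop = 1,000 teraflops)."""
    return per_node_petaflops(total_pf, nodes) / gpus_per_node * 1000


if __name__ == "__main__":
    total_gpus = SUPERPOD_NODES * GPUS_PER_NODE
    node_pf = per_node_petaflops(SUPERPOD_AI_PETAFLOPS, SUPERPOD_NODES)
    gpu_tf = per_gpu_teraflops(SUPERPOD_AI_PETAFLOPS, SUPERPOD_NODES, GPUS_PER_NODE)
    print(f"{total_gpus} GPUs, {node_pf:g} PF per node, {gpu_tf:g} TF per GPU")
```

Note that "AI performance" is Nvidia's own marketing metric, so these per-GPU numbers should not be compared directly with conventional FP32 or FP64 throughput figures.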
If the data center is not your thing, you can use the Jetson EGX A100 instead to put an A100 to work in your edge computing. For those who are not familiar, Nvidia’s Jetson single-board platform can be thought of as a Raspberry Pi on steroids. Jetson boards can be used in IoT scenarios, but they offer significant processing power in a small form factor that is rugged enough for edge deployments such as robotics, healthcare, and drones.