Consider the GPU. An island of SIMD greatness that makes light work of matrix math. Originally designed to quickly paint dots on a computer monitor, it was then found to be quite useful in large numbers by HPC practitioners. Enter GenAI, and now these little matrix experts are in huge demand, so much so that we call it the GPU Squeeze.
The well-known and dominant market leader, Nvidia, has charted much of the pathway for GPU technology. For HPC, GenAI, and a raft of other applications, connecting GPUs provides a way to solve bigger problems and improve your application's performance.
There are three basic ways to "connect" GPUs.
1. The PCI Bus: A standard server can usually support 4-8 GPUs across the PCI bus. This number can be increased to 32 by using technology like the GigaIO FabreX memory fabric. CXL also shows promise; however, Nvidia support is thin. For many applications, these composable GPU domains represent an alternative to the GPU-to-GPU scale-up approach mentioned below.
2. Server-to-Server Interconnect: Ethernet or InfiniBand can connect servers that contain GPUs. This connection level is often called scale-out, where faster multi-GPU domains are connected by slower networks to form large computational networks. Ethernet has been the workhorse of computer networking since bits started moving between machines. Recently, the Ultra Ethernet Consortium was introduced to push the specification to deliver high performance. Indeed, Intel has planted its interconnect flag on the Ethernet hill now that the Intel Gaudi 2 AI processor has 24x 100-Gigabit Ethernet connections on the die.
Absent from the Ultra Ethernet Consortium is Nvidia, because they basically have sole ownership of the high-performance InfiniBand interconnect market after purchasing Mellanox in March of 2019. The Ultra Ethernet Consortium is designed to be everybody else's "InfiniBand." And to be clear, Intel used to carry the InfiniBand banner.
3. GPU-to-GPU Interconnect: Recognizing the need for a fast and scalable GPU connection, Nvidia created NVLink, a GPU-to-GPU connection that can currently transfer data at 1.8 terabytes per second between GPUs. There is also an NVLink rack-level Switch capable of supporting up to 576 fully connected GPUs in a non-blocking compute fabric. GPUs connected via NVLink are referred to as "pods" to indicate they have their own data and computational domain. (A short probing sketch follows this list.)
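To make these connection levels concrete, here is a minimal probing sketch in plain CUDA (this assumes an Nvidia-equipped server with the CUDA toolkit installed; the file and program names are illustrative). It enumerates the GPUs visible to a single server, prints each device's PCI bus address (connection level 1), and asks the driver whether each pair of devices can perform direct peer-to-peer access, which is how NVLink or PCIe P2P paths (level 3) appear to software.

```c
// probe_gpus.cu -- minimal sketch: enumerate visible GPUs and check
// which pairs support direct peer-to-peer access (PCIe or NVLink).
// Build: nvcc -o probe_gpus probe_gpus.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    if (cudaGetDeviceCount(&n) != cudaSuccess || n == 0) {
        fprintf(stderr, "No CUDA driver or devices found\n");
        return 1;
    }
    printf("Visible GPUs: %d\n", n);

    // Report each device's location on the PCI bus (connection level 1).
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        printf("GPU %d: %s (PCI %04x:%02x:%02x)\n",
               i, p.name, p.pciDomainID, p.pciBusID, p.pciDeviceID);
    }

    // Ask the driver which pairs can talk directly, GPU to GPU (level 3).
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int ok = 0;
            cudaDeviceCanAccessPeer(&ok, i, j);
            printf("GPU %d -> GPU %d peer access: %s\n", i, j, ok ? "yes" : "no");
        }
    }
    return 0;
}
```

On an NVLink-connected system, the peer-access check typically reports yes for NVLink-paired devices; over plain PCIe, the answer depends on the bus topology and system configuration.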
As far as everyone else goes, there are no options other than the AMD Infinity Fabric used to connect MI300A APUs. Similar to the InfiniBand/Ethernet situation, some form of "Ultra" consortium of competitors is needed to fill the non-Nvidia "pod void." And that is just what has happened.
AMD, Broadcom, Cisco, Google, Hewlett Packard Enterprise (HPE), Intel, Meta, and Microsoft announced they have aligned to develop a new industry standard dedicated to advancing high-speed and low-latency communication for scale-up AI accelerators.
Called the Ultra Accelerator Link (UALink), this initial group will define and establish an open industry standard that will enable AI accelerators to communicate more effectively. By creating an interconnect based upon open standards (read this as "not Nvidia"), UALink will enable system OEMs, IT professionals, and system integrators to create a pathway for easier integration, greater flexibility, and scalability of their AI-connected data centers.
Driving Scale-Up for AI Workloads
Similar to NVLink, it is important to have a robust, low-latency, and efficient scale-up network that can easily add computing resources to a single instance (i.e., treat GPUs and accelerators as one large system or "pod").
This is where UALink and an open industry specification become critical to standardizing the interface for AI and Machine Learning, HPC, and Cloud applications for the next generation of hardware. The group will develop a high-speed, low-latency interconnect specification for scale-up communications between accelerators and switches in AI computing pods.
The 1.0 specification will enable the connection of up to 1,024 accelerators within an AI computing pod and allow for direct loads and stores between the memory attached to accelerators, such as GPUs, in the pod. The UALink Promoter Group has formed the UALink Consortium and expects it to be incorporated in Q3 of 2024. The 1.0 specification is expected to be available in Q3 of 2024 and made available to companies that join the Ultra Accelerator Link (UALink) Consortium.
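There is no UALink code to show, since the specification was not public at the time of the announcement, but the "direct loads and stores" semantic already exists in CUDA on peer-capable Nvidia systems. The sketch below is a minimal CUDA illustration of that memory model under stated assumptions (two P2P-capable GPUs in one server; names are illustrative): once peer access is enabled, a kernel running on GPU 0 increments a buffer that physically resides in GPU 1's memory using ordinary loads and stores.

```c
// p2p_loadstore.cu -- sketch of direct load/store into a peer GPU's
// memory, the semantic UALink aims to standardize across vendors.
// Assumes at least two P2P-capable GPUs. Build: nvcc -o p2p p2p_loadstore.cu
#include <cstdio>
#include <cuda_runtime.h>

// Runs on GPU 0 but reads and writes 'buf', which lives in GPU 1's memory.
__global__ void incr_peer(int *buf, int nelem) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nelem) buf[i] += 1;  // an ordinary load + store over the fabric
}

int main() {
    int n = 0, ok = 0;
    cudaGetDeviceCount(&n);
    if (n >= 2) cudaDeviceCanAccessPeer(&ok, 0, 1);
    if (!ok) { fprintf(stderr, "Need two P2P-capable GPUs\n"); return 1; }

    const int nelem = 1 << 20;

    // Allocate and zero the buffer on GPU 1...
    int *buf = nullptr;
    cudaSetDevice(1);
    cudaMalloc(&buf, nelem * sizeof(int));
    cudaMemset(buf, 0, nelem * sizeof(int));

    // ...then let GPU 0 map GPU 1's memory and touch it directly.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // second argument (flags) must be 0
    incr_peer<<<(nelem + 255) / 256, 256>>>(buf, nelem);
    cudaDeviceSynchronize();

    // Verify from the host: every element should now be 1.
    int first = -1;
    cudaMemcpy(&first, buf, sizeof(int), cudaMemcpyDeviceToHost);
    printf("buf[0] = %d (expect 1)\n", first);
    return 0;
}
```

The point is not this particular API but the semantic: the accelerator fabric, not the host, carries those loads and stores, which is what UALink proposes to standardize at a pod scale of up to 1,024 accelerators.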
Competition Makes for Strange Bedfellows
Nvidia's dominance is clearly demonstrated by its driving competitors AMD, Intel, and Broadcom to form a consortium. In particular, Intel has often taken a "go it alone" strategy in the past when it comes to new technology. In this case, Nvidia's crushing dominance has been the essential motivation for all the consortium members.
As announced, the Ultra Accelerator Link will be an open standard. This decision should help bring it to market faster, as there will be less IP to haggle over, but an optimistic 2026 launch still seems a little far off, given the need for massive AI GPU matrix engines yesterday.
In support of the UALink effort, J Metz, Ph.D., Chair of the Ultra Ethernet Consortium (UEC), shared his enthusiasm: "In a very short period of time, the technology industry has embraced challenges that AI and HPC have uncovered. Interconnecting accelerators like GPUs requires a holistic perspective when seeking to improve efficiencies and performance. At UEC, we believe that UALink's scale-up approach to solving pod cluster issues complements our own scale-out protocol, and we're looking forward to collaborating together on creating an open, ecosystem-friendly, industry-wide solution that addresses both kinds of needs in the future."