Enhancing Data Movement for GPUs with NVIDIA GPUDirect RDMA Technology – HPCwire

Whether you’re exploring mountains of data, researching scientific problems, training neural networks, or modeling financial markets, you need a computing platform that boasts the highest data throughput. And because GPUs consume data much faster than CPUs, you also need additional bandwidth while maintaining low latency.

InfiniBand is the ideal network for feeding and scaling the data-hungry GPUs you need to maintain efficiency in the AI Exascale Era, which is already in full swing in data centers around the globe.

InfiniBand remains the most widely adopted high performance network technology in the world for high performance computing (HPC) and has grown over the years to also become the most widely adopted high speed network deployed in all areas of AI. This includes those used for advanced research, development, and critical business deployments that are fully embracing the latest NVIDIA A100 Ampere Architecture.

GPUDirect RDMA: Direct Communication Between NVIDIA GPUs

InfiniBand’s remote direct memory access (RDMA) engines can be leveraged to enable direct access to GPU memory. Designed specifically for the needs of GPU acceleration, GPUDirect RDMA provides a direct communication path between NVIDIA GPUs in remote systems using InfiniBand. This removes the system CPUs from the data path and eliminates the buffer copies that would otherwise be staged through system memory, resulting in superior performance.

Figure 1: Block diagram of NVIDIA GPUDirect RDMA connectivity.
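
As a rough illustration of the difference described above, the two data paths can be written out hop by hop. This is a minimal sketch, not NVIDIA code: the hop names are descriptive labels chosen for this example rather than API calls, and the model only counts staging buffers in system memory. It says nothing about actual latency or bandwidth.

```python
# Illustrative model only: each tuple is (source, destination) for one hop.
# The labels are descriptive names invented for this sketch, not API calls.

# Without GPUDirect RDMA: data is staged through system memory on both nodes.
STAGED_PATH = [
    ("sender GPU memory", "sender system memory"),
    ("sender system memory", "sender InfiniBand adapter"),
    ("sender InfiniBand adapter", "receiver InfiniBand adapter"),
    ("receiver InfiniBand adapter", "receiver system memory"),
    ("receiver system memory", "receiver GPU memory"),
]

# With GPUDirect RDMA: the adapter reads and writes GPU memory directly.
GPUDIRECT_PATH = [
    ("sender GPU memory", "sender InfiniBand adapter"),
    ("sender InfiniBand adapter", "receiver InfiniBand adapter"),
    ("receiver InfiniBand adapter", "receiver GPU memory"),
]


def staging_buffers(path):
    """Return the set of system-memory buffers that a path passes through."""
    endpoints = {endpoint for hop in path for endpoint in hop}
    return {endpoint for endpoint in endpoints if "system memory" in endpoint}


if __name__ == "__main__":
    for name, path in (("staged", STAGED_PATH), ("GPUDirect RDMA", GPUDIRECT_PATH)):
        print(f"{name}: {len(path)} hops, "
              f"{len(staging_buffers(path))} system-memory staging buffers")
```

In this model the staged path passes through one staging buffer on each node, while the GPUDirect RDMA path passes through none, which is the saving the paragraph above refers to.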

Over the past several years, hardware technology has improved dramatically. For example, InfiniBand has moved to 400Gb/s, we’ve seen the transition to PCIe Gen-4, and GPUs are crunching data more than 20 times faster. Yet, there has been one constant: NVIDIA technology has improved consistently, keeping pace across generations of software.

GDRCopy: Fast Copy Library

GPUDirect RDMA also benefitted from a performance boost with GDRCopy, a low-latency fast copy library based on NVIDIA GPUDirect RDMA technology. While GPUDirect RDMA is meant for direct access to GPU memory from the network, it’s possible to use these same APIs to create perfectly valid CPU mappings of the GPU memory. A CPU-driven copy through such a mapping requires only a small amount of overhead and greatly enhances performance.

Modern communication libraries such as NVIDIA HPC-X, Open MPI, and MVAPICH2 can easily take advantage of GPUDirect RDMA and GDRCopy to achieve the lowest latency and highest bandwidth when moving data between NVIDIA A100 GPUs and their unprecedented acceleration capabilities.

A Test Drive at the HPC-AI Advisory Council Performance Center

Recently, we took the latest versions of these supported libraries for a test drive on the Tessa cluster, which was just introduced at the HPC-AI Advisory Council. The HPC-AI Advisory Council High Performance Center offers an environment for developing, testing, benchmarking, and optimizing products that are based on clustering technology. The Tessa cluster is somewhat unusual in that it’s fully equipped with NVIDIA A100 PCIe 40GB GPUs and populated with ConnectX-6 HDR InfiniBand adapters, running on very flexible servers from Colfax International: the CX41060t-XK7, a PCIe-3 based platform. Running HDR 200Gb/s InfiniBand instead of HDR100 is not common for a PCIe-3 configuration, but it certainly squeezes every ounce of performance from the platform.
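
A quick back-of-the-envelope calculation suggests why HDR 200Gb/s is unusual on a PCIe-3 platform. The sketch below uses the published PCIe signaling rates (8 GT/s per lane for Gen 3, 16 GT/s for Gen 4, both with 128b/130b encoding) and assumes a single x16 slot. The slot width is an assumption made for illustration, not a detail of the Tessa servers, and the results are raw link rates before any protocol overhead.

```python
# Raw PCIe link bandwidth versus InfiniBand port speed.
# Assumption for illustration: the adapter sits in a single x16 slot.

ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding (PCIe Gen 3 and Gen 4)


def pcie_raw_gbps(transfer_rate_gt_s, lanes=16):
    """Raw one-direction link bandwidth in Gb/s, before protocol overhead."""
    return transfer_rate_gt_s * lanes * ENCODING_EFFICIENCY


PCIE3_X16 = pcie_raw_gbps(8.0)   # PCIe Gen 3: 8 GT/s per lane
PCIE4_X16 = pcie_raw_gbps(16.0)  # PCIe Gen 4: 16 GT/s per lane

HDR100_GBPS = 100.0  # HDR100 InfiniBand port speed
HDR_GBPS = 200.0     # HDR InfiniBand port speed

if __name__ == "__main__":
    print(f"PCIe Gen 3 x16: {PCIE3_X16:.1f} Gb/s")
    print(f"PCIe Gen 4 x16: {PCIE4_X16:.1f} Gb/s")
    print(f"Gen 3 x16 covers HDR100: {PCIE3_X16 >= HDR100_GBPS}")
    print(f"Gen 3 x16 covers HDR:    {PCIE3_X16 >= HDR_GBPS}")
    print(f"Gen 4 x16 covers HDR:    {PCIE4_X16 >= HDR_GBPS}")
```

Under these assumptions, a single Gen 3 x16 link offers roughly 126 Gb/s, which comfortably covers HDR100 but falls short of a full HDR port, while a Gen 4 x16 link offers roughly 252 Gb/s.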

Figure 2: GPUDirect + GDRCopy performance on HPC-AI Advisory Council “Tessa” cluster with NVIDIA HPC-X

Accelerating the Most Important Work of Our Time

The combination of NVIDIA Magnum IO™, InfiniBand, and A100 Tensor Core GPUs delivers unmatched acceleration across the spectrum of research, scientific computing, and industry. To learn more about how NVIDIA is accelerating the world’s highest performing data centers for AI, data analytics, and HPC applications, review the resources below to get started:

Access this tutorial for a complete walk-through of GPUDirect RDMA and GDRCopy: https://hpcadvisorycouncil.atlassian.net/wiki/spaces/HPCWORKS/pages/2791440385/GPUDirect+Benchmarking

Explore more resources on GPUDirect RDMA: NVIDIA GPUDirect RDMA Solution Overview

Get more information on GDRCopy: https://developer.nvidia.com/gdrcopy

Read more about NVIDIA Magnum IO, the IO subsystem of the modern data center: https://www.nvidia.com/en-us/data-center/magnum-io/

Scot Schultz | Sr. Director, HPC and Technical Computing | NVIDIA

Scot Schultz is an HPC technology specialist with a focus on artificial intelligence and machine learning systems. Schultz has broad knowledge in distributed computing, operating systems, AI frameworks, high speed interconnects, and processor technologies. Throughout his career, with more than 25 years of experience in high performance computing systems, his responsibilities have included various engineering and leadership roles, including strategic HPC technology ecosystem enablement. Scot has been instrumental in the growth and development of numerous industry-standards organizations.