Infiniband Fat Trees

Overview

Infiniband is primarily used in High Performance Computing (HPC), where it provides a very fast network interconnect with extremely low latency. Infiniband interface cards also implement Remote Direct Memory Access (RDMA), which allows a card to place data directly into the memory of the end node without kernel intervention.

When we took on our HPC re-engineering contract, we were exposed to Mellanox FDR Infiniband, which was already old technology and in need of an upgrade. Since FDR was released it has been superseded by EDR (100 Gb/s), which in turn has been superseded by HDR (200 Gb/s).

Topology Overview

There are many different topologies for connecting High Performance Computing nodes; here we look at an Infiniband Fat Tree network. This topology consists of Master Switches and Leaf Switches. Servers, storage and other “End Nodes” connect to the Leaf Switches. There are no Master-to-Master links and no Leaf-to-Leaf links: every uplink from a Leaf goes to a Master, so all traffic between Leaf Switches passes through a Master.
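
The connection rules above can be checked mechanically. What follows is a minimal Python sketch, not a Mellanox tool: it assumes the cabling is described as a list of (a, b) pairs, and the switch and host names used with it are hypothetical labels.

```python
def check_fat_tree(links, masters, leaves, end_nodes):
    """Return a list of rule violations; an empty list means the cabling is valid.

    links     -- iterable of (a, b) pairs, one per cable (hypothetical format)
    masters   -- set of Master Switch names
    leaves    -- set of Leaf Switch names
    end_nodes -- set of server/storage names
    """
    problems = []
    for a, b in links:
        pair = {a, b}
        if pair <= masters:
            # Rule: no Master-to-Master links
            problems.append(f"master-to-master link {a}-{b}")
        elif pair <= leaves:
            # Rule: no Leaf-to-Leaf links
            problems.append(f"leaf-to-leaf link {a}-{b}")
        elif pair & end_nodes and not pair & leaves:
            # Rule: end nodes connect only to Leaf Switches
            problems.append(f"end node not attached to a leaf: {a}-{b}")
    return problems
```

A Leaf-to-Master cable and an end-node-to-Leaf cable pass silently; anything else is reported.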

108 Node Configuration

For non-blocking operation (the preferred option), a 108-node configuration uses 6 Leaf Switches and 3 Master Switches.

In non-blocking mode, half of each Leaf Switch's 36 ports are defined as uplinks, leaving 18 ports to connect to end nodes. The first 18 ports are the uplinks, arranged as 3 groups of six ports, one group per Master (3 Masters, so 3 groups).
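
The port grouping can be turned into a cabling sheet. The sketch below assumes 36-port switches with ports numbered 1 to 36 (implied by 18 being half the ports) and, as a hypothetical convention not stated in this article, that each leaf takes a contiguous block of six ports on every Master.

```python
PORTS_PER_SWITCH = 36
NUM_LEAVES = 6
NUM_MASTERS = 3
UPLINKS_PER_LEAF = PORTS_PER_SWITCH // 2            # 18 uplinks in non-blocking mode
LINKS_PER_MASTER = UPLINKS_PER_LEAF // NUM_MASTERS  # 6 ports per group

def cabling_plan():
    """Yield (leaf, leaf_port, master, master_port) tuples, all numbered from 1."""
    for leaf in range(1, NUM_LEAVES + 1):
        for leaf_port in range(1, UPLINKS_PER_LEAF + 1):
            group = (leaf_port - 1) // LINKS_PER_MASTER   # 0, 1 or 2 -> which Master
            offset = (leaf_port - 1) % LINKS_PER_MASTER   # position within the group
            # Hypothetical convention: leaf N owns Master ports 6(N-1)+1 .. 6N
            master_port = (leaf - 1) * LINKS_PER_MASTER + offset + 1
            yield leaf, leaf_port, group + 1, master_port

# Print the first few rows of the sheet
for row in list(cabling_plan())[:7]:
    print("Leaf %d port %2d -> Master %d port %2d" % row)
```

With this convention leaf ports 1 to 6 go to Master 1, 7 to 12 to Master 2 and 13 to 18 to Master 3, and every one of the 36 ports on each Master is used exactly once.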

Connecting Hosts

Hosts connect to the Leaf Switches. With 18 of each leaf's ports used as uplinks, 18 ports remain for hosts, so 6 Leaf Switches provide 6 × 18 = 108 host ports.
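
The port arithmetic for a given design can be checked with a few lines of Python. This sketch assumes 36-port switches on both tiers; `port_budget` is a hypothetical helper, and "balanced" means every leaf uplink lands on exactly one Master port.

```python
def port_budget(leaves, masters, ports=36):
    """Summarise host ports and uplink/Master port usage for a non-blocking design."""
    uplinks_per_leaf = ports // 2                  # non-blocking: half the ports go up
    hosts_per_leaf = ports - uplinks_per_leaf      # the rest face end nodes
    total_uplinks = leaves * uplinks_per_leaf
    master_ports = masters * ports
    return {
        "hosts": leaves * hosts_per_leaf,
        "uplinks": total_uplinks,
        "master_ports": master_ports,
        "balanced": total_uplinks == master_ports,
    }

print(port_budget(6, 3))
```

For the 108-node design both sides come out even: 108 host ports, and 108 uplinks landing on 108 Master ports.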

72 Node Configuration

A 108-node non-blocking configuration needs 9 switches; a smaller 72-node network can be built with 6 switches.

The 72-node non-blocking configuration requires 2 Master Switches and 4 Leaf Switches. As in the larger design, the first 18 ports on each leaf are defined as uplinks, and because there are 2 Masters, 9 links go from each leaf to each Master.
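
Both configurations follow the same rule, which can be sketched as a small sizing function. Assumptions: 36-port switches throughout, a two-tier tree, and an extra (hypothetical) policy of adding Masters until each leaf's 18 uplinks split evenly between them.

```python
import math

PORTS = 36                        # assumed switch port count
UPLINKS = PORTS // 2              # non-blocking: 18 uplinks per leaf
HOSTS_PER_LEAF = PORTS - UPLINKS  # 18 host ports per leaf

def size_non_blocking(target_hosts):
    """Return (leaves, masters, links_per_leaf_per_master) for a two-tier tree."""
    if target_hosts < 1:
        raise ValueError("need at least one host")
    leaves = math.ceil(target_hosts / HOSTS_PER_LEAF)
    if leaves > PORTS:
        raise ValueError("too many hosts for a two-tier tree of this switch size")
    # Each Master port takes one leaf uplink, so this is the minimum Master count...
    masters = math.ceil(leaves * UPLINKS / PORTS)
    # ...and each leaf's uplinks must divide evenly across the Masters.
    while UPLINKS % masters:
        masters += 1
    return leaves, masters, UPLINKS // masters
```

`size_non_blocking(72)` returns `(4, 2, 9)` and `size_non_blocking(108)` returns `(6, 3, 6)`, matching the two designs described here.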

This article has focused on non-blocking configurations because they give the highest performance. It is possible to use fewer uplinks and free up ports for more end nodes, giving blocking ratios of 2:1 or higher.
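
The effect of a blocking ratio on a single leaf can be sketched as follows. It assumes 36-port leaves and reads a ratio such as 2:1 as host ports to uplinks; `split_leaf_ports` is a hypothetical helper.

```python
from fractions import Fraction

def split_leaf_ports(ratio, ports=36):
    """Split a leaf's ports into (host_ports, uplinks) for a ratio of ratio:1.

    ratio -- 1 for non-blocking (1:1), 2 for 2:1, 3 for 3:1, and so on.
    """
    up = Fraction(ports) / (Fraction(ratio) + 1)
    if up.denominator != 1:
        raise ValueError(f"{ratio}:1 does not divide {ports} ports evenly")
    return ports - int(up), int(up)
```

At 2:1 each leaf has 24 host ports and 12 uplinks, so six leaves would serve 144 hosts instead of 108. The number of Masters needed also changes, which this sketch does not model.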

Visit the Mellanox web site for more information on alternative topologies.

-oOo-