The first time I touched this amazing (and cheap) network technology called Infiniband was a while ago, when I set up a back-end storage network (without an IB switch) between two hosts. You can see the result in my article here: Homelab Storage Network Speedup with …. Infiniband, together with a short video. The HCAs I’m using have two 10Gb ports. Can you imagine having 10 gigs of traffic in a homelab? Quite exciting. So this post’s title is Infiniband in the homelab.
Great, but without a switch it does not really scale. A two-host cluster is good to start with, but who would not want three hosts today, to play with VSAN for example? That’s why the idea of a cheap Infiniband switch was born. A fellow blogger, Erik Bussink, gave me some tips on which switch to look for, and I went with a Topspin 120 Infiniband switch. (Note that this switch is also referenced as the Cisco SFS 7000, as Topspin was bought by Cisco a while back, I believe.)
It’s a 24-port switch.
Unfortunately, my attempt to swap the fans for quieter ones failed. Read the article here. I think I need real 3-pin 40×40 quiet fans. The ones I got had 2 pins plus an adapter to 3 pins, but that’s no good: the switch closely monitors the speed of the fans, and 2-wire fans have no way to report it. You need 3 wires, that’s it.
With Infiniband, what matters is the Subnet Manager. With some switches (and firmware levels) you have it in the box, while with others you don’t. My switch came with firmware 2.3, but Erik was kind enough to provide me with an ISO of the firmware 2.9 release, which has the Subnet Manager functionality.
Update: (Link to download the 2.9 release here).
Update 2: The switch has finally received new Noctua fans (Noctua NF-A4x10), which do have 3 wires. In my particular situation, though, I had to change the order of those wires.
The firmware upgrade is not difficult, but I won’t go into details here as it’s nicely described in the user guide. If you’re in the same situation and have a Topspin 120 which needs this firmware, just ask me and I’ll drop it in my Dropbox.
For the upgrade, you need a console cable and a TFTP server installed on your management workstation. I used my laptop. I had to get one of those cables (FTDI USB to SERIAL / RS232 CONSOLE ROLLOVER CABLE FOR CISCO ROUTERS – RJ45), as my laptop did not have a serial port. (Do some laptops still have one?)
The firmware upgrade brings the Subnet Manager, so on the vSphere side you don’t have to use the VIB provided by @hypervisor_fr. If your switch does not let you use the “hardware subnet manager”, the VIB is the only option. You can find the details on how to install the VIB in my post Homelab Storage Network Speedup….
So what’s the network setup for the vSphere Storage?
I created a standard switch and changed the MTU to 2044, because that’s the maximum my switch supports. If your switch supports 4092, just go for it. But make sure that you set the MTU at the vSwitch level AND at the port group level. At first, I forgot to set the MTU at the vSwitch level and only changed it on the port group. vMotion did not work, obviously… But once it’s all in place, it works like a charm…
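If you prefer the ESXi shell to the UI, the MTU steps above can be sketched roughly like this. It’s a dry-run helper that only prints the commands so you can review them first. The names vSwitch1 and vmk1 and the peer address are placeholders I made up for the example, not my actual setup; the MTU is applied to the vSwitch and to the VMkernel interface sitting on the port group.

```shell
#!/bin/sh
# Dry-run helper: prints the esxcli/vmkping commands for a given MTU.
# vSwitch1, vmk1 and the peer IP used below are placeholders (assumptions).

# Largest ICMP payload that fits in one packet of the given MTU:
# MTU minus 20 bytes of IP header and 8 bytes of ICMP header.
ping_payload() {
    echo $(( $1 - 28 ))
}

# Print the commands that set the MTU on the vSwitch and on the VMkernel
# interface, then test the path with the don't-fragment bit set (-d).
mtu_commands() {
    vswitch=$1; vmk=$2; mtu=$3; peer=$4
    echo "esxcli network vswitch standard set -v $vswitch -m $mtu"
    echo "esxcli network ip interface set -i $vmk -m $mtu"
    echo "vmkping -I $vmk -d -s $(ping_payload "$mtu") $peer"
}

mtu_commands vSwitch1 vmk1 2044 10.10.10.2
```

If the vmkping with the don’t-fragment bit fails while a plain vmkping works, one of the two MTU settings is most likely still at its default, which is exactly the mistake I made.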
I followed Erik’s post on installing the Mellanox drivers in vSphere 5.5, with the difference that I haven’t used the OpenSM VIB. (I have tried the newer 1.9.9 version of the Mellanox drivers, but the HCA card shows up with only a single port in the vSphere UI.)
Note that you must uninstall the original Mellanox drivers first, the ones which are “baked in” to the original vSphere 5.5 ISO. Just follow Erik’s post on that. Here are the most important commands from Erik’s post, which I used:
- unzip mlx4_en-mlnx-184.108.40.206-471530.zip
- esxcli software acceptance set --level=CommunitySupported
- esxcli software vib install -d /tmp/mlx4_en-mlnx-220.127.116.11-offline_bundle-471530.zip --no-sig-check
- esxcli software vib install -d /tmp/MLNX-OFED-ESX-18.104.22.168.zip --no-sig-check
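The commands above assume the inbox drivers are already gone. A minimal sketch of that removal step, wrapped in two small functions: the VIB names net-mlx4-en and net-mlx4-core are my assumption of what the inbox drivers are called on 5.5, so list the installed VIBs first and adjust the names if your host shows something different.

```shell
#!/bin/sh
# Sketch: remove the inbox Mellanox drivers before installing the new ones.
# The two VIB names are assumptions; confirm them with list_mlx4_vibs first.

# Show every installed VIB whose line mentions mlx4.
list_mlx4_vibs() {
    esxcli software vib list | grep -i mlx4
}

# Remove both inbox VIBs in one transaction; reboot the host afterwards.
remove_inbox_mlx4() {
    esxcli software vib remove -n net-mlx4-en -n net-mlx4-core
}
```

Run list_mlx4_vibs, then remove_inbox_mlx4, reboot the host, and only then continue with the install commands.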
For the moment, each HCA card has only a single IB cable plugged in, as I’m waiting to get more IB cables. For now I can enjoy high-speed vMotion traffic going through the 10Gb HCA ports. The plan is to use one port for vMotion traffic and the second port for VSAN traffic. And concerning my VSAN journey, I’m waiting for customs so I can finally get my package with some more hardware parts and beef up my existing hosts with SSDs and spinning disks.
So the plan is to get VSAN running, and I’ve already received the disk controller cards: Dell PERC H310 (entry-level cards which are on the VMware HCL; you can get one for roughly $90 on eBay). Those cards aren’t the most performant, truly not, but they will do the job for the lab. For VSAN in particular you would probably choose one with a bigger queue depth, but it’s just a lab which got an upgrade.
When I choose to replace the older boxes, I’ll possibly go for something beefier from the RAM and CPU perspective, including an IO card which is on the VSAN HCL. Possibly Supermicro like Erik Bussink, who knows… Stay tuned via RSS
VSAN in the Homelab – the whole series:
- My VSAN Journey – Part 1 – The homebrew “node”
- My VSAN journey – Part 2 – How-to delete partitions to prepare disks for VSAN if the disks aren’t clean
- Memory Channel per Bank getting non active when using PERC H310 Contoler card – Fixed!
- Infiniband in the homelab – the missing piece for VMware VSAN – (This post)
- Cisco Topspin 120 – Homelab Infiniband silence
- My VSAN Journey Part 3 – VSAN IO cards – search the VMware HCL
- My VSAN journey – all done!
- How-to Flash Dell Perc H310 with IT Firmware To Change Queue Depth from 25 to 600