Another new feature in vSphere 5 is the way the HA process is handled.
There is no more AAM agent as in vSphere 4.1. Instead, a new agent has been introduced, named FDM (Fault Domain Manager). The Primary/Secondary concept with 5 primary nodes, known from vSphere 4, is gone. You no longer need to worry about losing all 5 primary nodes at the same time … and with them the HA functionality for the rest of the cluster. Now one host in the cluster takes the role of Master. The FDM agents on the other hosts act only as Slaves, and can become Master in case the current master fails.
The master monitors the availability of the ESXi 5 hosts and also of the VMs. It monitors all slave hosts, and if a slave host fails, all VMs on that host are restarted on another host. Within each individual host, the status of each protected VM is monitored, and if a protected VM fails, the master proceeds with the restart of that VM. The FDM master keeps a list of protected VMs, which is updated after every power-off or power-on initiated by a user. The FDM master also keeps track of all hosts that are members of the cluster; adding or removing a host refreshes this list as well.
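To make the bookkeeping above a bit more concrete, here is a minimal Python sketch. It is only an illustration of the idea, not how FDM is implemented: the class MasterState, its method names, and the host and VM names are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MasterState:
    """Hypothetical model of the master's bookkeeping (not the FDM API)."""
    hosts: set = field(default_factory=set)
    # Maps protected VM name -> name of the host it currently runs on.
    protected_vms: dict = field(default_factory=dict)

    def add_host(self, host):
        # Adding a host to the cluster refreshes the host list.
        self.hosts.add(host)

    def remove_host(self, host):
        # Removing a host from the cluster refreshes the host list.
        self.hosts.discard(host)

    def vm_powered_on(self, vm, host):
        # A user-initiated power-on puts the VM on the protected list.
        self.protected_vms[vm] = host

    def vm_powered_off(self, vm):
        # A user-initiated power-off takes the VM off the protected list.
        self.protected_vms.pop(vm, None)

    def vms_to_restart(self, failed_host):
        # Protected VMs that ran on the failed host must be restarted elsewhere.
        return sorted(
            vm for vm, host in self.protected_vms.items() if host == failed_host
        )


state = MasterState()
for name in ("esx1", "esx2"):
    state.add_host(name)
state.vm_powered_on("web01", "esx1")
state.vm_powered_on("db01", "esx1")
state.vm_powered_on("app01", "esx2")
state.vm_powered_off("db01")  # the user shut this one down on purpose
restart = state.vms_to_restart("esx1")
```

In this sketch db01 is not restarted when esx1 fails, because the user powered it off and it was dropped from the protected list.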
Now you might be thinking: what if… the master fails? In that case there is a re-election process (this was not the case in vSphere 4), and the host which has access to the greatest number of datastores is elected as master. You might wonder why. It's because the secondary communication channel goes through datastores. There are other considerations for a Slave to be elected as Master as well.
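The main election rule (most datastores wins) can be sketched in a few lines of Python. This is a simplified illustration only: the function elect_master and the host and datastore names are hypothetical, and the tie-break on host name is an assumption added just to keep the sketch deterministic, since the other considerations are not detailed here.

```python
def elect_master(datastores_by_host):
    """Return the host that has access to the most datastores.

    The tie-break on host name is an assumption made only for this
    sketch; it is not taken from the article.
    """
    if not datastores_by_host:
        raise ValueError("no candidate hosts")
    return max(
        datastores_by_host,
        key=lambda host: (len(datastores_by_host[host]), host),
    )


candidates = {
    "esx1": {"ds1", "ds2"},
    "esx2": {"ds1", "ds2", "ds3"},
    "esx3": {"ds1"},
}
new_master = elect_master(candidates)  # esx2 sees the most datastores
```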
The hosts with slave roles each maintain a direct, encrypted point-to-point TCP connection (no broadcasts) with the Master. The election process is done via UDP; after that, communication between the Master and the slaves is again maintained only via SSL-encrypted TCP.
The host with the master role periodically reports state to vCenter. The slaves are informed that the Master is alive via heartbeats. The slaves monitor the state of their locally running VMs, and any changes are transmitted to the Master. Each Slave also sends heartbeats to the master, and if the master should fail, the re-election process occurs. vCenter knows when a new Master is elected because it is the new master that contacts vCenter after the re-election process is finished.
The secondary channel through datastores is known as Heartbeat Datastores. This secondary channel is not used in normal situations, only when the primary network goes down. It permits the Master to be aware of all Slave hosts and also of the VMs running on those hosts. The heartbeat datastores can also be used to determine whether a host has become isolated or network partitioned, and whether a host has failed (PSOD) or is just isolated.
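The way the two channels combine can be illustrated with a small Python sketch. It is a simplification under my own assumptions: the names HostState and classify_host are hypothetical, the heartbeats are reduced to two booleans, and the sketch does not try to tell an isolated host apart from a partitioned one.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat, datastore_heartbeat):
    """Classify a slave host from the master's point of view (simplified)."""
    if network_heartbeat:
        # Primary channel works, so the datastore channel is not needed.
        return HostState.LIVE
    if datastore_heartbeat:
        # No network heartbeat, but the host still updates the datastore:
        # it is running, just unreachable over the network.
        return HostState.ISOLATED_OR_PARTITIONED
    # Silent on both channels: treat the host as failed (a PSOD, for example).
    return HostState.FAILED


state_of_esx3 = classify_host(network_heartbeat=False, datastore_heartbeat=True)
```

Without the datastore channel the last two cases would look identical to the master, which is why the second channel matters.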
And as I have read elsewhere, to configure HA you'll need at least 2 shared datastores …
A quick quote from Chad Sakac's blog:
The other major change is the use of BOTH networking AND storage as a mode for communication and maintaining state. This is likely the first thing people will see that has them saying “huh?” – during the vSphere beta, it did for me, as you need to have 2 shared datastores to configure VM HA – and the first time I saw that I knew something had changed.
One more thing: HA no longer uses DNS, which means there is no dependency on DNS or hosts files.
Update: A quick quote from Uptime blog concerning DNS:
Ever had DNS resolution cause you issues when using vSphere HA? With 5.0, all dependency on DNS for vSphere HA has been removed!
Source: Slideshare Presentation by Eric Sloof – The Master.. -:)