When you configure a VMware High Availability (HA) cluster, the configuration wizard lets you select one or more datastores as a secondary communication channel. VMware Datastore Heartbeating gives HA an additional way to determine whether a host has actually failed.
Before vSphere 5.0, in vSphere 4.1, HA would restart the VMs running on a host whether that host had suffered a hardware failure or had merely been isolated from its management network. Even when the only problem was on the management network, HA would still trigger a restart, and the VMs would be restarted on another host in the cluster.
vSphere 5.0 introduced a new HA architecture built around the Fault Domain Manager (FDM), with a Master and Slave design. Along with it came a more intelligent technique for detecting host failures: Datastore Heartbeating, which adds a second detection channel and makes HA more resilient.
If the Master cannot communicate with a Slave (it does not receive the Slave's network heartbeat) but the Slave is still heartbeating to its heartbeat datastore, the host is still running. In that case the host is either partitioned from the network or isolated, not failed. Datastore heartbeating is what lets HA tell a host that has failed apart from one that has merely been cut off from the others.
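The decision described above can be sketched in a few lines of Python. This is only a simplified illustration of the reasoning, not FDM's actual implementation: the `HostState` enum and the `classify_host` function are hypothetical names, and the two boolean inputs stand in for "the Master sees the network heartbeat" and "the host is still updating its heartbeat datastore".

```python
from enum import Enum


class HostState(Enum):
    """Hypothetical host states, simplified for illustration."""
    HEALTHY = "healthy"
    ISOLATED_OR_PARTITIONED = "isolated_or_partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    """Classify a Slave host from the two heartbeat channels.

    Assumption: both inputs are already known to the Master. The real
    agent uses additional checks that are left out of this sketch.
    """
    if network_heartbeat:
        # The Master still hears the host over the management network.
        return HostState.HEALTHY
    if datastore_heartbeat:
        # No network heartbeat, but the host still writes to the datastore:
        # it is alive, so it is isolated or partitioned, not failed.
        return HostState.ISOLATED_OR_PARTITIONED
    # Neither channel shows any sign of life.
    return HostState.FAILED
```

Without the datastore channel the second and third cases would look identical to the Master, which is exactly the situation vSphere 4.1 was in.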
vCenter automatically selects at least two datastores from the shared datastores. Ideally, a heartbeat datastore should be selected on every NAS/SAN device you have. In my example above I have two shared datastores checked, each on a different storage device. vCenter lets you specify alternative datastores, but only datastores that are mounted by at least two hosts are offered. You can always check later, in the properties of your HA cluster, which datastores have been selected and whether additional datastores are available to select.
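To make the "mounted by at least two hosts" rule concrete, here is a small Python sketch. It does not talk to vCenter; the function name, the datastore names and the host names are all made up for illustration, and the input is assumed to be a simple mapping of datastore name to the set of hosts that mount it.

```python
def eligible_heartbeat_datastores(mounts: dict[str, set[str]]) -> list[str]:
    """Return datastores mounted by at least two hosts, sorted by name.

    `mounts` is a hypothetical inventory: datastore name -> hosts mounting it.
    """
    return sorted(name for name, hosts in mounts.items() if len(hosts) >= 2)


# Hypothetical inventory: two shared datastores and one local datastore.
inventory = {
    "san-ds01": {"esx01", "esx02", "esx03"},
    "nas-ds01": {"esx01", "esx02"},
    "local-esx01": {"esx01"},
}

candidates = eligible_heartbeat_datastores(inventory)
```

With this inventory only the two shared datastores qualify; the local datastore is mounted by a single host and is therefore never offered.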
Datastore Heartbeating avoids unnecessary VM restarts when only the management network has failed. The default number of heartbeat datastores is two, and the maximum valid value is five. You can override the default value with the advanced attribute das.heartbeatDsPerHost.
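If you keep your cluster's advanced options in a script or a config file, a small check like the following can catch an out-of-range value before you apply it. This is a minimal sketch: the function name and the plain dictionary of options are assumptions for illustration, the option key is spelled as in VMware's documentation (`das.heartbeatDsPerHost`), and the limits are the default of two and the maximum of five mentioned above.

```python
DEFAULT_HEARTBEAT_DS = 2  # default number of heartbeat datastores
MAX_HEARTBEAT_DS = 5      # maximum valid value


def resolve_heartbeat_ds_per_host(advanced_options: dict[str, str]) -> int:
    """Return the effective number of heartbeat datastores.

    `advanced_options` is a hypothetical dict of HA advanced attributes,
    with values stored as strings.
    """
    raw = advanced_options.get("das.heartbeatDsPerHost")
    if raw is None:
        # Attribute not set: the default applies.
        return DEFAULT_HEARTBEAT_DS
    value = int(raw)  # raises ValueError if the value is not an integer
    if not DEFAULT_HEARTBEAT_DS <= value <= MAX_HEARTBEAT_DS:
        raise ValueError(
            f"das.heartbeatDsPerHost must be between {DEFAULT_HEARTBEAT_DS} "
            f"and {MAX_HEARTBEAT_DS}, got {value}"
        )
    return value
```

For example, `resolve_heartbeat_ds_per_host({"das.heartbeatDsPerHost": "4"})` returns `4`, while a value of `"6"` raises a `ValueError`.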