Testing Master Node Failure

By default, a single master node is used to control access to a cluster. If this node fails, the cluster is unavailable.

To protect against this, a standby master node can be configured; this is generally one of the existing slave nodes. Doing so requires a clustered file system other than HDFS, and also the Red Hat Cluster Suite (RHCS), which in turn requires a Red Hat operating system.

For setup instructions, see the VectorH User Guide.

Once the standby master node is set up, the simplest way to test it is to reboot the master node without shutting down VectorH first. This requires root access:

sudo shutdown -r now

Expect existing connections to VectorH to be terminated. After a short period, client reconnection attempts should succeed and work can resume.

The RHCS heartbeat process detects failure of the master node in approximately a second, after which VectorH begins starting up on the standby node. The total time taken to switch over depends on factors such as whether any in-flight transactions need to be rolled back, and on the normal startup time of VectorH.
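
To put a number on the switchover period, one option is to run a small polling loop from a client machine immediately after rebooting the master node. The following is a minimal sketch, not part of VectorH: the function name wait_for_reconnect is hypothetical, and the probe command passed to it is an assumption that you would replace with whatever client command you normally use to connect and run a trivial query.

```shell
#!/bin/sh
# Poll a probe command until it succeeds, then report the elapsed time.
# Usage: wait_for_reconnect MAX_ATTEMPTS SLEEP_SECONDS PROBE_COMMAND [ARGS...]
# PROBE_COMMAND is a placeholder (assumption): any command that exits 0
# only when a client connection to the cluster succeeds.
wait_for_reconnect() {
    max_attempts=$1
    interval=$2
    shift 2                      # remaining arguments form the probe command
    start=$(date +%s)
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            end=$(date +%s)
            echo "reconnected after $((end - start))s on attempt $attempt"
            return 0
        fi
        sleep "$interval"
        attempt=$((attempt + 1))
    done
    echo "no connection after $max_attempts attempts" >&2
    return 1
}
```

For example, wait_for_reconnect 60 5 ./probe.sh would retry a hypothetical probe.sh script every 5 seconds for up to 60 attempts. The reported time has whole-second resolution and includes the heartbeat detection time, any transaction rollback, and VectorH startup on the standby node.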