Managing Cluster Resources: How to Add and Remove Slave Nodes
 
You can add nodes to or remove nodes from a VectorH cluster by using a manual process.
Note:  The master node should not be removed.
Adding or removing nodes requires a full restart and reconfiguration of the cluster, but this typically takes only a few minutes. Individual data partitions, however, are allocated uniquely and evenly to each node based on the original cluster configuration, and queries against that data are intended to execute only on the node where the data resides. Without repartitioning the data, any additional nodes must access it remotely, which can create a bottleneck and result in sub-optimal performance. The effect is particularly acute during "cold" runs immediately after the restart; if the [cbm] bufferpool_size is sufficiently large, the performance impact becomes negligible once the workload is entirely resident in memory.
Note: The VectorH HDFS block placement policy does not help in this situation, because it was installed and configured for the original cluster layout. It does help when nodes fail and are then restarted or replaced, but only if the number of nodes in the slave set (that is, num_nodes) remains the same after the restart. Only in that situation will the HDFS block placement program, when hdfs fsck /path/to/hdfs/data/location -blocks -locations -files is run, handle the re-replication and co-location of the data that was lost.
Similarly, after nodes are removed from the cluster, data that was resident on the removed nodes must be retrieved remotely by the remaining nodes, again degrading performance. This can be minimized after the initial run by increasing the [cbm] bufferpool_size enough that the entire workload still fits in memory.
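The buffer pool is sized in vectorwise.conf. As a rough illustration only (the value below is a placeholder; choose one that fits your workload and the memory available on each node, and check the value format against your existing file):
[cbm]
bufferpool_size = 64G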
The process for adding or removing nodes in the cluster is as follows:
Note:  This process automatically updates vectorwise.conf, so you may want to make a backup copy of the file before you begin.
Note:  All steps, except Step 4, should be run on the master node.
1. Shut down VectorH and verify all processes have stopped on all nodes.
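For example, assuming the standard installation owner and the existing slaves file, a sketch of this step run on the master node might be (ingstop shuts down the installation; the loop is just one way to confirm that no x100 or Ingres server processes remain on the other nodes, and the process names shown are indicative):
ingstop
for host in $(cat $II_SYSTEM/ingres/files/hdfs/slaves); do
    ssh "$host" 'ps -ef | grep -E "x100|iidbms" | grep -v grep'
done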
2. Edit $II_SYSTEM/ingres/files/hdfs/slaves to list all the nodes to be used in the new cluster.
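The file contains one hostname per line. For example, with purely illustrative hostnames:
node02.example.com
node03.example.com
node04.example.com
node05.example.com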
3. If YARN integration is enabled, make sure the following resource in config.dat is set to false. This will prevent the slaves file from being overwritten at startup.
iisetres ii.hostname.x100.yarn.update_slaves false
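For example, on a hypothetical master node named node01 (replace node01 with the hostname used in your config.dat entries), the current value can be checked with iigetres and then changed:
iigetres ii.node01.x100.yarn.update_slaves
iisetres ii.node01.x100.yarn.update_slaves false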
4. Create the "actian" user and $II_SYSTEM directory on any new nodes.
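A minimal sketch of this step, run as root on each new node, is shown below; the $II_SYSTEM path, group name, and useradd defaults are assumptions, so match the UID/GID, path, and ownership used on the existing nodes (and set up the same passwordless SSH for the actian user as on the other nodes):
useradd -m actian
mkdir -p /opt/Actian/VectorVH        # assumed $II_SYSTEM location; use the same path as on the existing nodes
chown -R actian:actian /opt/Actian/VectorVH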
5. Modify vectorwise.conf on the master node to reflect the new cluster configuration. Set max_parallelism to the total number of cores in the cluster. (You can do this manually or by running iisuhdfs genconfig).
Issue the following command:
iisuhdfs sync
The change in vectorwise.conf is synced to all the slave nodes.
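For illustration only, on a hypothetical four-node cluster with 16 cores per node the entry would resemble the following; verify the section name and value against the file produced by iisuhdfs genconfig for your release:
[engine]
max_parallelism = 64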
6. Restart VectorH.
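For example, as the installation owner on the master node:
ingstart
ingstatus        # optional: confirm that all components have started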
7. Repartition existing tables. For example:
CREATE TABLE new_table AS SELECT * FROM old_table WITH PARTITION = (HASH ON key NN PARTITIONS)
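A sketch of the complete workflow, using hypothetical table and column names and a partition count chosen for an assumed four-node cluster (pick a count appropriate to your new node and core counts, per the partitioning guidelines for your release):
CREATE TABLE sales_repart AS SELECT * FROM sales
    WITH PARTITION = (HASH ON order_id 16 PARTITIONS);
DROP TABLE sales;
Note that CREATE TABLE...AS SELECT typically does not carry over grants or constraints from the original table, so recreate those on the new table as needed.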