Installation
A functioning VectorH environment requires the following:
• A Linux account to own the installation, with passwordless SSH access between the cluster nodes
• Linux file system space, owned by the installation owner, for software and configuration files
• A functioning and supported Hadoop environment
• A location in HDFS owned by the installation owner for data
• A high speed interconnect between data nodes. 10Gb Ethernet is recommended; 1Gb Ethernet will work but performance is likely to be impaired.
If the installation is performed with privileged access (root or sudo, for example), the installer automatically performs all the required setup. If the installation is performed without privileged access, the steps that require administrative operations must be done first: creation of user accounts, setup of passwordless SSH access, creation and ownership setting of the Linux file system directories, and creation and ownership setting of the HDFS location.
The installer creates a VectorH “instance” that is designated by an instance ID. The default instance ID for the first instance of VectorH on a cluster is VH. The other installation defaults are: actian as the user who owns the installation, /opt/Actian/VectorVH as the Linux file system location, and /Actian/VectorVH as the HDFS location.
This section describes common installation scenarios. For full installation instructions, see the Getting Started guide.
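Before starting, it can be worth confirming the basic prerequisites by hand as the installation owner on the master node. The following is a minimal sketch; the host names are placeholders for your own data nodes:
# Check passwordless SSH from the master node to each cluster node
for host in datanode1.example.com datanode2.example.com; do
    ssh -o BatchMode=yes actian@"$host" 'echo "SSH to $(hostname) OK"'
done
# Check that HDFS is reachable from the master node
hdfs dfs -ls /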
Recommended Hadoop Settings
We recommend the following Hadoop settings:
• dfs.datanode.max.transfer.threads: 4096 or higher. Follow the Hadoop vendor recommendations, if higher.
• dfs.replication: Less than the number of VectorH nodes. As of VectorH 4.2.2, the [cbm] hdfs_replication configuration setting can be used instead.
If you want VectorH to integrate with YARN:
• ipc.client.connect.max.retries: 3
• ipc.client.connect.max.retries.on.timeouts: 3
• yarn.nm.liveness-monitor.expiry-interval-ms: 10000
• yarn.client.nodemanager-connect-max-wait-ms: 50000
• yarn.client.nodemanager-connect-retry-interval-ms: 10000
• yarn.resourcemanager.system-metrics-publisher.enabled: false
• yarn.am.liveness-monitor.expiry-interval-ms: 10000
• yarn.scheduler.capacity.resource-calculator: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
• If the yarn-site.xml file contains the property “yarn.nodemanager.remote-app-log-dir: hdfs://var/...”, you must add the NameNode into the hdfs URI:
yarn.nodemanager.remote-app-log-dir: hdfs://your_name_node/var/...
• Add the following to the yarn-site.xml file if it is missing:
yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
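After applying these settings, you can spot-check some of them from the command line. A minimal sketch: hdfs getconf reports the client-side configuration (which may differ from what the DataNodes actually use), and the yarn-site.xml path shown is typical but may differ on your distribution:
# Check the HDFS-level settings
hdfs getconf -confKey dfs.datanode.max.transfer.threads
hdfs getconf -confKey dfs.replication
# Check one of the YARN settings (configuration path is an assumption)
grep -A1 yarn.resourcemanager.scheduler.class /etc/hadoop/conf/yarn-site.xml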
Pre-installation Tasks
Prior to installation, check that the following prerequisites are met for all nodes:
• Supported version of Linux
• Installed Linux libraries for OpenSSH, rsync, and libaio
• Supported and installed version of Hadoop and Java
These requirements are documented in more detail in the Getting Started guide, the readme file on ESD (http://esd.actian.com), and the Product Availability Matrix (http://downloads.actian.com/media/pdfs/product_avail_matrix_vector.pdf).
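A quick way to confirm these prerequisites on each node is sketched below; the rpm command assumes an RPM-based Linux distribution (use your distribution's package tool otherwise):
# Check the required Linux packages (RPM-based systems)
rpm -q openssh rsync libaio
# Check the Java and Hadoop versions against the Product Availability Matrix
java -version
hadoop version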
Installing with root Access
Prerequisites: The software has been extracted, and the evaluation key is placed in a file named authstring in the directory where the installation software was extracted.
Default Installation: Installation owner (user) is actian, instance ID is VH, Linux directory is /opt/Actian/VectorVH/, HDFS location is /Actian/VectorVH.
Issue the following command:
./install.sh
Install as Specific User: Install as the user actian2.
./install.sh -user actian2
Install in a Specific Directory: Install as user actian3 with the software placed in the user's home directory.
./install.sh -user actian3 /home/actian3/VectorVH
Install with a Specific Instance ID: Install as user actian3 with the software placed in the user's home directory, using V2 as the instance ID.
./install.sh -user actian3 /home/actian3/VectorV2 V2
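After a non-default install such as the instance ID example above, a simple way to confirm the chosen location and owner is to inspect the installation directory (paths are those from the example):
ls -ld /home/actian3/VectorV2   # should exist and be owned by actian3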
Which DataNodes to Install VectorH On?
By default, VectorH is installed on all the active DataNodes in a Hadoop cluster, and their Fully Qualified Domain Names (FQDN) are automatically recorded in the slaves file in $II_SYSTEM/ingres/files/hdfs.
If you do not want to install VectorH on all the active DataNodes, follow this procedure:
To control the number of DataNodes used by VectorH
1. When you are prompted during the installation whether you want to set up the DataNodes, enter “n” for no.
The install process stops.
2. Edit the slaves file in $II_SYSTEM/ingres/files/hdfs to list the DataNodes you want to use.
3. Continue the installation by using the following command:
iisuhdfs datanodes
The slave nodes are set up and the installation is complete.
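The slaves file itself is a plain list of DataNode host names. A hypothetical example, assuming the same one-FQDN-per-line format that the installer records automatically:
cat $II_SYSTEM/ingres/files/hdfs/slaves
datanode1.example.com
datanode2.example.com
datanode3.example.com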
Installing with sudo Access
Installing with the -usesudo flag allows the installation to be carried out by a user other than root while still having the install script perform all the required tasks. Provided the user can use sudo on all nodes, the installation can be carried out in the same way as when using root.
Prerequisites: As for root-based installs, with the additional requirements that the user performing the install exists on all nodes, has a password, and can use sudo.
Example sudo Installation: Use defaults of instance ID VH, user actian, installation directory /opt/Actian/VectorVH, HDFS location /Actian/VectorVH.
Issue the following command:
./install.sh -usesudo
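Before running the installer, it can be useful to confirm that the installing user can use sudo on every node. A minimal sketch with placeholder user and host names; sudo may still prompt for the user's password:
for host in node1.example.com node2.example.com; do
    ssh -t instuser@"$host" 'sudo -v && echo "sudo OK on $(hostname)"'
done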
Installing Without Privileged Access
If the installation must be performed with an unprivileged account, then the steps that require administrative operations must be done before the installation process starts.
Prerequisites: As for root-based installs, with the additional requirements that the installation user exists on all nodes, passwordless SSH access between the master node and the slave nodes is already set up, and the Linux and HDFS directories exist and are owned by the installation owner.
Example Noroot/Nosudo Installation: Use defaults of instance ID VH, user actian, installation directory /opt/Actian/VectorVH, HDFS location /Actian/VectorVH.
# Create user (repeat on all nodes):
useradd actian
passwd actian
# On the master node, set up passwordless SSH:
su - actian
ssh-keygen
ssh-copy-id actian@[Fully qualified hostname] # Repeat for all hosts
exit
# Setup HDFS location:
su hdfs
hdfs dfs -mkdir /Actian
hdfs dfs -mkdir /Actian/VectorVH
hdfs dfs -chown actian /Actian
hdfs dfs -chown actian /Actian/VectorVH
hdfs dfs -chmod 755 /Actian
hdfs dfs -chmod 755 /Actian/VectorVH
exit
# On all nodes setup Linux file system directories:
mkdir -p /opt/Actian/VectorVH
chown actian /opt/Actian/VectorVH
# Run install
su - actian
cd [Directory where install.sh is located]
./install.sh -noroot
Because the Hadoop slaves information may not be visible to the user performing the installation, it may be necessary to complete the slave node installation as a separate step. If the installer does not prompt for information about the slave nodes, then also execute these commands:
# Setup the required environment variables and stop the current instance
. ~/.ingVHsh
ingstop
# Edit the slave nodes list and add the required nodes:
vi $II_SYSTEM/ingres/files/hdfs/slaves
# Perform setup of slave nodes and restart the instance
iisuhdfs datanodes
ingstart
Note: The iisuhdfs datanodes command can also be used to modify an existing installation. Running the command more than once does not harm the installation; however, it resets the installation's parallelism settings based on the number of active nodes in the system. If you have changed the default values (as described in Configuring Database Resources), the configuration file may need to be modified again after running iisuhdfs datanodes.
Installing Using a Response File
The installation script supports supplying parameters through a response file. This can be used to make installation easily repeatable or even completely automated.
At minimum, the response file requires only the time zone. You can also list the slave nodes on which to install VectorH. For example:
II_TIMEZONE_NAME=GMT
SLAVES_LIST=VectorH-HW1.localdomain,VectorH-HW2.localdomain
The installation can then be performed by supplying this file to the installer, for example:
install.sh -respfile respfile.txt -acceptlicense -noroot
For a complete list of response file parameters, see the Getting Started guide.
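For an unattended installation, the response file and the install command can be combined in a single script. A minimal sketch using only the parameters shown above:
# Write a minimal response file and run a non-interactive install
cat > respfile.txt <<'EOF'
II_TIMEZONE_NAME=GMT
SLAVES_LIST=VectorH-HW1.localdomain,VectorH-HW2.localdomain
EOF
./install.sh -respfile respfile.txt -acceptlicense -noroot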
How to Install on a Kerberos-enabled Cluster
If Kerberos is enabled in the Hadoop cluster, then a Kerberos principal and a keytab file are required to perform the installation. You will need a renewable Kerberos ticket for the actian user and hdfs user.
If doing a privileged install (using root or sudo), then the installer becomes the hdfs user as part of the installation process and so the hdfs user also requires a Kerberos ticket.
Follow this process when installing on a Kerberos-enabled cluster:
1. Set up the Kerberos principal for the Actian software. For example:
actian@YOURREALM
2. Extract the keytab file for Actian.
3. Obtain a ticket for the actian user. For example:
kinit -k -t /.../actian.headless.keytab actian@YOURREALM
4. If needed, also ensure that the hdfs user has a Kerberos ticket using a command similar to above as the hdfs user. For example:
sudo -u hdfs kinit hdfs...
5. Start the installer.
The installer will detect that the cluster has Kerberos enabled, check that required tickets are in place, and prompt for the required Kerberos details (principal name and keytab file).
If using a response file to perform the install, then add the required parameters (KRB5_PRINCIPAL and KRB5_KEYTAB_FILE) to the response file.
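You can also verify manually that the required tickets are in place before starting the installer. A short sketch; the keytab path follows the (elided) example above:
# List the principals contained in the keytab
klist -kt /.../actian.headless.keytab
# Show the current ticket for the actian user (renewable tickets show a "renew until" time)
klist
# Confirm that the hdfs user also holds a ticket (privileged installs only)
sudo -u hdfs klist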
Automatic Management of Kerberos Details
VectorH includes utilities for automatically managing the Kerberos details. These include syncing the keytab files to all nodes in the cluster and renewing Kerberos tickets when required. These utilities are running by default; however, if required, you can disable the utilities and manage the Kerberos details manually. For details see the Kerberos section in the Getting Started guide.
Resetting the Installation
If at any point you need to remove the installed components, perform the following steps:
1. Log in as the installation owner (default is actian) and source the installation environment script. (The default is ~/.ingVHsh; however, if the installation did not complete successfully this may not be present—a copy may be available beneath the installation directory.)
2. Stop the installation (ingstop followed by ingstop -mgmtsvr).
3. Remove the HDFS location (/Actian/VectorVH).
4. On all nodes, remove the Linux installation directory (the default is /opt/Actian/VectorVH).
5. On all nodes, remove the installation owner Linux account.
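Under the default settings, this procedure corresponds roughly to the following commands. This is a sketch only, so adapt the paths, instance ID, and user to your installation; the HDFS and user removal steps need a suitably privileged account:
# As the installation owner: stop the instance
su - actian
. ~/.ingVHsh
ingstop
ingstop -mgmtsvr
exit
# Remove the HDFS location (as the hdfs user or another HDFS superuser)
sudo -u hdfs hdfs dfs -rm -r /Actian/VectorVH
# On all nodes: remove the Linux installation directory
rm -rf /opt/Actian/VectorVH
# On all nodes: remove the installation owner account
userdel actian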
Verifying the Installation
The instance should now be up and running on the master node (nothing is yet running on the slave nodes). Verify this is the case with the ingstatus command. Here (and for the rest of the document) the examples use an installation with default settings unless stated otherwise.
# Log in as the installation user, execute the environment script, then ingstatus
su - actian
. ./.ingVHsh
ingstatus
Actian Vector H VH name server (iigcn) - running
Actian Vector H VH recovery server (dmfrcp) - running
Actian Vector H VH DBMS server (iidbms) - 1 running
Actian Vector H VH Actian Vector H server (iix100) - not active
Actian Vector H VH Net server (iigcc) - 1 running
Actian Vector H VH Data Access server (iigcd) - 1 running
Actian Vector H VH RMCMD process (rmcmd) - running
Actian Vector H VH Management server (mgmtsvr) - running
Actian Vector H VH archiver process (dmfacp) - running
If the instance is not running, start it using the ingstart command.
Notice that there is no execution engine (x100) running; the execution engine is started upon first connection to a database. One x100 process is started for each database connected to.
To verify the installation, check that cluster connectivity is working, and that a database can be created and started. A script, RemoteExec.sh, is provided for the first step (see Test Scripts).
# Log in as the installation user and execute the environment variable settings
su - actian
. ./.ingVHsh
./RemoteExec.sh
For each host, the output should be something similar to:
[Hostname]
Found 8 items
drwxr-xr-x - actian hdfs 0 2015-06-15 20:08 /Actian
drwxrwxrwx - yarn hadoop 0 2015-06-13 18:24 /app-logs
drwxr-xr-x - hdfs hdfs 0 2015-06-13 18:17 /hdp
drwxr-xr-x - mapred hdfs 0 2015-06-13 18:16 /mapred
drwxr-xr-x - hdfs hdfs 0 2015-06-13 18:16 /mr-history
drwxr-xr-x - hdfs hdfs 0 2015-06-13 18:19 /system
drwxrwxrwx - hdfs hdfs 0 2015-06-13 18:20 /tmp
drwxr-xr-x - hdfs hdfs 0 2015-06-13 18:21 /user
Lastly, create a database and connect to it with the terminal monitor (sql):
su - actian
. ./.ingVHsh
createdb pocdb
[actian@VectorH-HW1 ~]$ sql pocdb
TERMINAL MONITOR Copyright 2018 Actian Corporation
Vector in Hadoop Linux Version VH 5.1.0 (a64.lnx/165) login
Thu Oct 4 09:48:47 2018
Enter \g to execute commands, "help help\g" for general help,
"help tm\g" for terminal monitor help, \q to quit
continue
* create table test(col1 int);
* insert into test values (1);
* select * from test;
* \g
Executing . . .
(1 row)
┌─────────────┐
│col1 │
├─────────────┤
│ 1│
└─────────────┘
(1 row)
continue
* drop table test;
* \g
Executing . . .
continue
* \q
It can be useful to run top -u actian on each node to observe the database startup. On the master node top shows something similar to:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
65226 actian 20 0 2957m 68m 9m S 0.7 1.1 0:39.93 mgmtsvr
45514 actian 20 0 3671m 148m 18m S 0.3 2.5 0:05.24 x100_server
47972 actian 20 0 842m 106m 9832 S 0.3 1.8 0:32.77 iidbms
45445 actian 20 0 11488 1384 1052 S 0.0 0.0 0:00.00 mpirun
45507 actian 20 0 18076 1532 1168 S 0.0 0.0 0:00.01 mpiexec.hydra
45508 actian 20 0 15972 1328 1036 S 0.0 0.0 0:00.00 pmi_proxy
47688 actian 20 0 27384 3096 1564 S 0.0 0.1 0:00.04 iigcn
47873 actian 20 0 444m 42m 33m S 0.0 0.7 0:02.00 iidbms
47949 actian 20 0 75904 4760 2420 S 0.0 0.1 0:00.05 dmfacp
48011 actian 20 0 26844 2036 1032 S 0.0 0.0 0:00.00 iigcc
48038 actian 20 0 126m 3176 1184 S 0.0 0.1 0:00.02 iigcd
48059 actian 20 0 137m 10m 2492 S 0.0 0.2 0:06.90 rmcmd
On the slave nodes top shows something similar to:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25359 actian 20 0 98368 1812 836 S 0.0 0.0 0:00.00 sshd
25360 actian 20 0 17928 1408 1124 S 0.0 0.0 0:00.01 pmi_proxy
25376 actian 20 0 3752m 143m 18m S 0.0 2.4 0:05.65 x100_server
This guide assumes that the pocdb database exists and is running unless stated otherwise.
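If you return to the environment later, a quick way to confirm that pocdb is present and reachable is to run a trivial query through the terminal monitor; a sketch (the catalog query is only an example):
su - actian
. ./.ingVHsh
sql pocdb <<'EOF'
select count(*) from iitables\g
\q
EOF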