Installation
A functioning VectorH environment requires the following:
A Linux account to own the installation, with passwordless SSH access between the cluster nodes
Linux file system space, owned by the installation owner, for software and configuration files
A functioning and supported Hadoop environment
A location in HDFS owned by the installation owner for data
A high-speed interconnect between data nodes. 10Gb Ethernet is recommended; 1Gb Ethernet will work, but performance is likely to be impaired.
If the installation is performed with privileged access (root or sudo, for example), the installer automatically performs all the required setup. If the installation is performed without privileged access, the steps requiring administrative operations must be done first: creating user accounts, setting up passwordless SSH access, creating and setting ownership of the Linux file system directories, and creating and setting ownership of the HDFS location.
The installer creates a VectorH “instance” that is designated by an instance ID. The default instance ID for the first instance of VectorH on a cluster is VH. The other installation defaults are: actian as the user who owns the installation, /opt/Actian/VectorVH as the Linux file system location, and /Actian/VectorVH as the HDFS location.
This section describes common installation scenarios. For full installation instructions, see the Getting Started guide.
Recommended Hadoop Settings
We recommend the following Hadoop settings:
dfs.datanode.max.transfer.threads: 4096 or higher. Follow the Hadoop vendor's recommendation if it is higher.
dfs.replication: Less than the number of VectorH nodes. As of VectorH 4.2.2, the [cbm] hdfs_replication configuration setting can be used instead.
If you want VectorH to integrate with YARN:
ipc.client.connect.max.retries: 3
ipc.client.connect.max.retries.on.timeouts: 3
yarn.nm.liveness-monitor.expiry-interval-ms: 10000
yarn.client.nodemanager-connect-max-wait-ms: 50000
yarn.client.nodemanager-connect-retry-interval-ms: 10000
yarn.resourcemanager.system-metrics-publisher.enabled: false
yarn.am.liveness-monitor.expiry-interval-ms: 10000
yarn.scheduler.capacity.resource-calculator: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
If the yarn-site.xml file contains the property “yarn.nodemanager.remote-app-log-dir: hdfs://var/...”, you must add the NameNode into the hdfs URI:
yarn.nodemanager.remote-app-log-dir: hdfs://your_name_node/var/...
Add the following to the yarn-site.xml file if it is missing:
yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
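As a quick check (not required by the installer), you can query the live HDFS configuration for the current values of the dfs.* settings listed above, for example:
hdfs getconf -confKey dfs.datanode.max.transfer.threads
hdfs getconf -confKey dfs.replication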
Pre-installation Tasks
Prior to installation, check that the following prerequisites are met for all nodes:
Supported version of Linux
Installed Linux libraries for OpenSSH, rsync, and libaio
Supported and installed version of Hadoop and Java
These requirements are documented in more detail in the Getting Started guide, the readme file on ESD (http://esd.actian.com), and the Product Availability Matrix (http://downloads.actian.com/media/pdfs/product_avail_matrix_vector.pdf).
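A quick way to spot-check most of these prerequisites on a node is sketched below; it assumes only standard command-line tools and is not a substitute for the detailed requirements above:
# Check that SSH, rsync, libaio, Java, and Hadoop are present and report their versions
ssh -V
rsync --version | head -1
ldconfig -p | grep libaio
java -version
hadoop version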
Installing with root Access
Prerequisites: The software has been extracted, and the evaluation key is placed in a file named authstring in the directory where the installation software was extracted.
Default Installation: Installation owner (user) is actian, instance ID is VH, Linux directory is /opt/Actian/VectorVH/, HDFS location is /Actian/VectorVH.
Issue the following command:
./install.sh
Install as a Specific User: Install as the user actian2.
./install.sh -user actian2
Install in a Specific Directory: Install as the user actian3 with the software placed in the user's home directory.
./install.sh -user actian3 /home/actian3/VectorVH
Install with a Specific Instance ID: Install as the user actian3 with the software placed in the user's home directory, using V2 as the instance ID.
./install.sh -user actian3 /home/actian3/VectorV2 V2
Which DataNodes to Install VectorH On?
By default, VectorH is installed on all the active DataNodes in a Hadoop cluster, and their Fully Qualified Domain Names (FQDN) are automatically recorded in the slaves file in $II_SYSTEM/ingres/files/hdfs.
If you do not want to install VectorH on all the active DataNodes, follow this procedure:
To control the number of DataNodes used by VectorH
1. When you are prompted during the installation whether you want to set up the DataNodes, enter “n” for no.
The install process stops.
2. Edit the slaves file in $II_SYSTEM/ingres/files/hdfs to list the DataNodes you want to use (an example follows this procedure).
3. Continue the installation by using the following command:
iisuhdfs datanodes
The slave nodes are set up and the installation is complete.
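For example, to restrict VectorH to two DataNodes, the $II_SYSTEM/ingres/files/hdfs/slaves file would contain only their FQDNs (the hostnames below are illustrative):
datanode01.example.com
datanode02.example.com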
Installing with sudo Access
Installing with the -usesudo flag allows the installation to be carried out by a user other than root while still having the install script perform all the required tasks. Provided the user can use sudo on all nodes, the installation can be carried out in the same way as when using root.
Prerequisites: As for root-based installs, with the additional requirements that the user performing the install exists on all nodes, has a password, and can use sudo.
Example sudo Installation: Use defaults of instance ID VH, user actian, installation directory /opt/Actian/VectorVH, HDFS location /Actian/VectorVH.
Issue the following command:
./install.sh -usesudo
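Before running the installer, you can confirm that the installing user has sudo rights on each node; for example (the hostname below is illustrative, and sudo may prompt for the user's password):
ssh -t node01.example.com 'sudo -l'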
Installing Without Privileged Access
If the installation must be performed with an unprivileged account, then the steps that require administrative operations must be done before the installation process starts.
Prerequisites: As for root-based installs, with the additional requirements that the installation user exists on all nodes, passwordless SSH access between the master node and the slave nodes is already set up, and the Linux and HDFS directories exist and are owned by the installation owner.
Example Noroot/Nosudo Installation: Use defaults of instance ID VH, user actian, installation directory /opt/Actian/VectorVH, HDFS location /Actian/VectorVH.
# Create user (repeat on all nodes):
useradd actian
passwd actian
 
# On the master node, set up passwordless SSH:
su - actian
ssh-keygen
ssh-copy-id actian@[Fully qualified hostname] # Repeat for all hosts
exit
 
# Set up the HDFS location:
su hdfs
hdfs dfs -mkdir /Actian
hdfs dfs -mkdir /Actian/VectorVH
hdfs dfs -chown actian /Actian
hdfs dfs -chown actian /Actian/VectorVH
hdfs dfs -chmod 755 /Actian
hdfs dfs -chmod 755 /Actian/VectorVH
exit
 
# On all nodes, set up the Linux file system directories:
mkdir -p /opt/Actian/VectorVH
chown actian /opt/Actian/VectorVH
 
# Run install
su - actian
cd [Directory where install.sh is located]
./install.sh -noroot
Because the Hadoop slaves information may not be visible to the user performing the installation, it may be necessary to complete the slave node installation as a separate step. If the installer does not prompt for information about the slave nodes, then also execute these commands:
# Set up the required environment variables and stop the current instance
. ~/.ingVHsh
ingstop
 
# Edit the slave nodes list and add the required nodes:
vi $II_SYSTEM/ingres/files/hdfs/slaves
 
# Perform setup of slave nodes and restart the instance
iisuhdfs datanodes
ingstart
Note:  The iisuhdfs datanodes command can also be used to modify an existing installation. Running the command more than once does not harm the installation; however, it resets the installation parallelism settings based on the number of nodes active in the system. If you have changed the default values (as described in Configuring Database Resources), the configuration file may need to be modified after running iisuhdfs datanodes.
Installing Using a Response File
The installation script supports supplying parameters through a response file. This can be used to make installation easily repeatable or even completely automated.
A minimal response file requires only the time zone. You can also list the slave nodes on which to install VectorH. For example:
II_TIMEZONE_NAME=GMT
SLAVES_LIST=VectorH-HW1.localdomain,VectorH-HW2.localdomain
The installation can then be performed by supplying this file to the installer, for example:
install.sh -respfile respfile.txt -acceptlicense -noroot
For a complete list of response file parameters, see the Getting Started guide.
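Putting the example together, an unattended installation could be scripted as follows, using the parameter values shown above:
# Write a minimal response file and run the installer unattended
cat > respfile.txt <<EOF
II_TIMEZONE_NAME=GMT
SLAVES_LIST=VectorH-HW1.localdomain,VectorH-HW2.localdomain
EOF
./install.sh -respfile respfile.txt -acceptlicense -noroot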
How to Install on a Kerberos-enabled Cluster
If Kerberos is enabled in the Hadoop cluster, then a Kerberos principal and a keytab file are required to perform the installation. You will need a renewable Kerberos ticket for the actian user and hdfs user.
If you are performing a privileged install (using root or sudo), the installer becomes the hdfs user as part of the installation process, so the hdfs user also requires a Kerberos ticket.
Follow this process when installing on a Kerberos-enabled cluster:
1. Set up the Kerberos principal for the Actian software. For example:
actian@YOURREALM
2. Extract the keytab file for Actian.
3. Obtain a ticket for the actian user. For example:
kinit actian@YOURREALM -k -t /.../actian.headless.keytab
4. If needed, also ensure that the hdfs user has a Kerberos ticket using a command similar to above as the hdfs user. For example:
sudo -u hdfs kinit hdfs...
5. Start the installer.
The installer will detect that the cluster has Kerberos enabled, check that required tickets are in place, and prompt for the required Kerberos details (principal name and keytab file).
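One way to confirm that the required tickets are in place before starting the installer is to list them with klist, for example:
klist
sudo -u hdfs klist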
If using a response file to perform the install, then add the required parameters (KRB5_PRINCIPAL and KRB5_KEYTAB_FILE) to the response file.
Automatic Management of Kerberos Details
VectorH includes utilities for automatically managing the Kerberos details. These include syncing the keytab files to all nodes in the cluster and renewing Kerberos tickets when required. These utilities are enabled by default; however, if required, you can disable them and manage the Kerberos details manually. For details, see the Kerberos section in the Getting Started guide.
Resetting the Installation
If at any point you need to remove the installed components, perform the following steps:
1. Log in as the installation owner (default is actian) and source the installation environment script. (The default is ~/.ingVHsh; however, if the installation did not complete successfully this may not be present—a copy may be available beneath the installation directory.)
2. Stop the installation (ingstop followed by ingstop -mgmtsvr).
3. Remove the HDFS location (/Actian/VectorVH).
4. On all nodes, remove the Linux installation directory (the default is /opt/Actian/VectorVH).
5. On all nodes, remove the installation owner Linux account.
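A sketch of these steps, assuming the default account and locations (removing the HDFS location and the Linux account requires appropriate privileges):
# As the installation owner, stop the instance
su - actian
. ~/.ingVHsh
ingstop
ingstop -mgmtsvr
exit
 
# Remove the HDFS location (run as a user with HDFS superuser rights)
sudo -u hdfs hdfs dfs -rm -r /Actian/VectorVH
 
# On all nodes, remove the installation directory and the installation owner account
rm -rf /opt/Actian/VectorVH
userdel -r actian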
Verifying the Installation
The instance should now be up and running on the master node (nothing is yet running on the slave nodes). Verify this is the case with the ingstatus command. Here (and for the rest of the document) the examples use an installation with default settings unless stated otherwise.
# Log in as the installation user, execute the environment script, then ingstatus
su - actian
. ./.ingVHsh
ingstatus
Actian Vector H VH name server (iigcn) - running
Actian Vector H VH recovery server (dmfrcp) - running
Actian Vector H VH DBMS server (iidbms) - 1 running
Actian Vector H VH Actian Vector H server (iix100) - not active
Actian Vector H VH Net server (iigcc) - 1 running
Actian Vector H VH Data Access server (iigcd) - 1 running
Actian Vector H VH RMCMD process (rmcmd) - running
Actian Vector H VH Management server (mgmtsvr) - running
Actian Vector H VH archiver process (dmfacp) - running
If the instance is not running, start it using the ingstart command.
Notice that there is no execution engine (x100) running; the execution engine is started upon first connection to a database. One x100 process is started for each database connected to.
To verify the installation, check that cluster connectivity is working, and that a database can be created and started. A script, RemoteExec.sh, is provided for the first step (see Test Scripts).
# Log in as the installation user and execute the environment variable settings
su - actian
. ./.ingVHsh
./RemoteExec.sh
For each host, the output should be something similar to:
[Hostname]
Found 8 items
drwxr-xr-x - actian hdfs 0 2015-06-15 20:08 /Actian
drwxrwxrwx - yarn hadoop 0 2015-06-13 18:24 /app-logs
drwxr-xr-x - hdfs hdfs 0 2015-06-13 18:17 /hdp
drwxr-xr-x - mapred hdfs 0 2015-06-13 18:16 /mapred
drwxr-xr-x - hdfs hdfs 0 2015-06-13 18:16 /mr-history
drwxr-xr-x - hdfs hdfs 0 2015-06-13 18:19 /system
drwxrwxrwx - hdfs hdfs 0 2015-06-13 18:20 /tmp
drwxr-xr-x - hdfs hdfs 0 2015-06-13 18:21 /user
Lastly, create a database and connect to it with the terminal monitor (sql):
su - actian
. ./.ingVHsh
createdb pocdb
[actian@VectorH-HW1 ~]$ sql pocdb
TERMINAL MONITOR Copyright 2020 Actian Corporation
Vector in Hadoop Linux Version VH 6.0.0 (a64.lnx/165) login
Thu Mar 12 09:48:47 2020
Enter \g to execute commands, "help help\g" for general help,
"help tm\g" for terminal monitor help, \q to quit
 
continue
* create table test(col1 int);
* insert into test values (1);
* select * from test;
* \g
Executing . . .
 
(1 row)
 
┌─────────────┐
│col1 │
├─────────────┤
│ 1│
└─────────────┘
(1 row)
continue
* drop table test;
* \g
Executing . . .
 
continue
* \q
It can be useful to run top -u actian on each node to observe the database startup. On the master node, top shows something similar to:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
65226 actian 20 0 2957m 68m 9m S 0.7 1.1 0:39.93 mgmtsvr
45514 actian 20 0 3671m 148m 18m S 0.3 2.5 0:05.24 x100_server
47972 actian 20 0 842m 106m 9832 S 0.3 1.8 0:32.77 iidbms
45445 actian 20 0 11488 1384 1052 S 0.0 0.0 0:00.00 mpirun
45507 actian 20 0 18076 1532 1168 S 0.0 0.0 0:00.01 mpiexec.hydra
45508 actian 20 0 15972 1328 1036 S 0.0 0.0 0:00.00 pmi_proxy
47688 actian 20 0 27384 3096 1564 S 0.0 0.1 0:00.04 iigcn
47873 actian 20 0 444m 42m 33m S 0.0 0.7 0:02.00 iidbms
47949 actian 20 0 75904 4760 2420 S 0.0 0.1 0:00.05 dmfacp
48011 actian 20 0 26844 2036 1032 S 0.0 0.0 0:00.00 iigcc
48038 actian 20 0 126m 3176 1184 S 0.0 0.1 0:00.02 iigcd
48059 actian 20 0 137m 10m 2492 S 0.0 0.2 0:06.90 rmcmd
On the slave nodes top shows something similar to:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25359 actian 20 0 98368 1812 836 S 0.0 0.0 0:00.00 sshd
25360 actian 20 0 17928 1408 1124 S 0.0 0.0 0:00.01 pmi_proxy
25376 actian 20 0 3752m 143m 18m S 0.0 2.4 0:05.65 x100_server
This guide assumes that the pocdb database exists and is running unless stated otherwise.