Post Installation Tasks
Before the installation can be considered complete, the following tasks should be performed.
Linux Configuration Settings
Certain Linux configuration settings must be in place on each node for the installation to run correctly.
We recommend checking the following settings on each node; a sketch for adjusting them follows the list.
• (Optional) Check that the Linux OS allows overcommitting of virtual memory:
cat /proc/sys/vm/overcommit_memory # Should return 1
ulimit -v # Should return unlimited
• Check the number of open files set by the installer:
ulimit -n
This number should be large, for example, 30,000.
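If either check does not return the expected value, the settings can be adjusted with standard Linux mechanisms. The following is a minimal sketch, run as root, assuming the installation owner is actian:
# Allow overcommit now and persist the setting across reboots
sysctl -w vm.overcommit_memory=1
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
[ Raise the open files limit by adding lines similar to these to /etc/security/limits.conf; they take effect at the next login: ]
actian soft nofile 30000
actian hard nofile 30000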
Configuring Firewall Ports
To access the installation through a firewall from a remote client such as JDBC or Actian Director, four ports must be opened: the discovery port, the command port, the GCC server port, and the DAS server port.
The discovery port is fixed at 16902 and can be queried using a browser (for example, http://masterhost:16902/) or at the command line (iimgmtsvr list) to find out the command port.
The GCC server and DAS server ports can also be found from the messages in the error log:
grep TCP_IP $II_SYSTEM/ingres/files/errlog.log | sed -e 's/^.*IIGC/IIGC/g' -e 's/,.*, port/ port/g'| sort -u
Example output:
IIGCC port VH (27712)
IIGCD port VH7 (27719)
Further information on the GCC and GCD ports can be found in the Connectivity Guide and the community document Finding-and-Managing-Ingres-and-Vector-TCP-IP-Ports.
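If the nodes run a host firewall, the ports can be opened with the distribution's firewall tooling. The following is a sketch using firewalld (assumed here; use your distribution's equivalent): 16902 is the fixed discovery port, the GCC and DAS ports are the example values shown above, and the command port placeholder must be replaced with the value reported by iimgmtsvr list.
firewall-cmd --permanent --add-port=16902/tcp
firewall-cmd --permanent --add-port=27712/tcp
firewall-cmd --permanent --add-port=27719/tcp
firewall-cmd --permanent --add-port=<command port>/tcp
firewall-cmd --reload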
TCP/IP Routing for Edge Nodes
If there is no access to the master node and firewall ports cannot be opened, but a “gateway host” such as a Hadoop edge node exists, you may consider setting up TCP/IP tunneling on the gateway host to provide access to the master node through it. This can be set up using the built-in Linux firewall or through TCP/IP proxy software such as Balance.
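As an illustration, a minimal iptables-based forward of the GCC port from the edge node to the master node might look like the following. The master node address 10.0.0.10 and port 27712 are placeholders, and each required port needs its own pair of rules.
# Run as root on the edge node
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A PREROUTING -p tcp --dport 27712 -j DNAT --to-destination 10.0.0.10:27712
iptables -t nat -A POSTROUTING -p tcp -d 10.0.0.10 --dport 27712 -j MASQUERADE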
Alternatively, if you want users to log in to the edge node and run client software from there to access the rest of the cluster, install the Vector Client Runtime on the edge node and create a virtual node (vnode) with the appropriate IP address and user credentials to connect to the VectorH master node. After that is done, a user can log in to the edge node and connect to VectorH by using the double colon remote vnode syntax, like this:
sql <vnode name>::<database name>
Configuring to Run as a Service
VectorH can be set up to run as a service. This may be useful if access to the DataNodes is restricted. The example below uses root to set up the service.
# Generate the service scripts
su - actian
. ~/.ingVHsh
mkrc
# Install the service script
su - root
cd /home/actian
. ./.ingVHsh
mkrc -i
# Check the service
service actian-vectorhVH status
# Optionally, add actian to the sudoers file so that the user can control this service:
visudo
[ # Add a line similar to this:
actian ALL=NOPASSWD: /sbin/service actian-vectorhVH *
]
exit
sudo service actian-vectorhVH status
The instance can also be started remotely by Actian Director, provided that the management server is running. It is possible to configure just the management server to run as a service, and then use Actian Director to start and stop the rest of the instance. For more information, see the mkrc command in the User Guide.
Configuring Database Resources
VectorH has relatively few configuration parameters that require setting; the following parameters, however, should be modified to match the characteristics of the system:
[cbm] bufferpool_size
Specifies the amount of memory to be used for cached data blocks. The default is 25% of available system memory.
[memory] max_memory_size
Specifies the amount of memory to be used as work memory for queries. The default is 50% of available system memory.
[system] num_cores
Specifies the number of physical cores on the host (that is, not including hyper-threading). This value is used by the optimizer.
[engine] max_parallelism_level
Defines the maximum attempted parallelism for queries. 50% of the total number of physical cores that the Actian software is able to access is a good starting point. For example, in a 10 node Hadoop cluster with 16 cores per node and the Actian software running on 5 DataNodes, set this to 5*16/2=40.
[cbm] max_open_files
Defines the number of files that the server can open at any one time. Set this in accordance with the OS maximum open files setting (see Linux Configuration Settings).
[engine] sort_intmemory
Specifies the memory available for the internal sort phase. The default is 256 MB. Increase it to 2% of max_memory_size (1% of physical RAM in a default configuration) if that value is larger than the default, for example, 5 GB on a 512 GB host.
Note: If you have a high number of partitions (for example, if you are tuning for low concurrency and have set the number of partitions equal to the number of cores in the cluster), you may need to set a lower value to avoid out-of-memory errors.
These settings are stored in the system-level vectorwise.conf file. If you are considering multiple databases, these parameters can be tuned per database by having individual database-level configuration files.
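As an illustration only, the fragment of vectorwise.conf edited for a node with 256 GB of RAM and 16 physical cores in the five-DataNode example above might look like this. The values simply restate the guidelines given for each parameter; check the existing entries in your file for the exact value format (suffixes such as G are used here).
[cbm]
bufferpool_size=64G
max_open_files=30000
[memory]
max_memory_size=128G
[system]
num_cores=16
[engine]
max_parallelism_level=40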
After the configuration file has been changed, it must be synced with the slave nodes. This occurs automatically as part of the start-up process. Alternatively, you can issue the iisuhdfs sync_config command manually, as in the following example:
su - actian
. ~/.ingVHsh
vi $II_SYSTEM/ingres/data/vectorwise/vectorwise.conf
[ Perform required edits - max_memory_size, num_cores, bufferpool_size, max_parallelism_level, max_open_files ]
# Sync the new settings with the slave nodes and restart
ingstop
iisuhdfs sync_config
Syncing config...
ingstart
Giving the Database User a Password
To connect using client tools, the user must have a database password (unless using an external mechanism such as Kerberos). Setting an initial password for the installation owner is done as part of the installation.
If not done as part of the installation or if you want to set up additional users, the password can be set up by connecting to iidbdb and issuing SQL commands (as below), or by using the accessdb utility. If you are new to Actian database software, it may be easier to use the accessdb utility.
The iidbdb database is the VectorH master database (the database of databases) and holds metadata for the installation. You can create other users by connecting to this database.
# Connect as the installation owner, set the environment variables, and connect to iidbdb.
su - actian
. ./.ingVHsh
sql iidbdb
TERMINAL MONITOR Copyright 2018 Actian Corporation
Vector in Hadoop Linux Version VH 5.1.0 (a64.lnx/165) login
Thu Oct 4 10:15:02 2018
Enter \g to execute commands, "help help\g" for general help,
"help tm\g" for terminal monitor help, \q to quit
continue
* alter user actian with password = 'actian' \g
Executing . . .
continue
* \q
Your SQL statement(s) have been committed.
Vector in Hadoop Version VH 5.1.0 (a64.lnx/165) logout
Thu Oct 4 10:17:05 2018
[actian@VectorH-HW1 ~]$ sql -Uactian -Pactian pocdb
TERMINAL MONITOR Copyright 2018 Actian Corporation
Vector in Hadoop Linux Version VH 5.1.0 (a64.lnx/165) login
Thu Oct 4 10:19:13 2018
Enter \g to execute commands, "help help\g" for general help,
"help tm\g" for terminal monitor help, \q to quit
continue
* \q
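To create an additional user rather than alter an existing one, a session similar to the following could be used; the user name and password are placeholders:
sql iidbdb
* create user pocuser with password = 'pocpassword' \g
* \q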
Add Environment Setup to the Login Script
You may want to add the environment setup script to the Linux login profile—this removes the need to execute .ingVHsh after logging in. The remainder of the document assumes this has been done.
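Assuming a bash login shell, a one-line way to do this is:
echo '. ~/.ingVHsh' >> ~/.bash_profile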
Note: If you have installed multiple instances of VectorH on the same cluster, you will need to be able to switch between different instances (unless each is owned by a different user). In such a case, you may want to develop a simple menu system to allow you to choose the required environment when logging in.
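A minimal sketch of such a menu, added to the login profile, might look like this; the second environment script name is a placeholder for another instance:
PS3="Select VectorH instance: "
select ENVSCRIPT in ~/.ingVHsh ~/.ingVH2sh; do
    [ -n "$ENVSCRIPT" ] && . "$ENVSCRIPT" && break
done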
Using the Correct Ethernet Connection
Currently, the software selects the first available Ethernet interface to use for connections between the nodes. Where multiple interfaces are available, the first may not be the correct one. To select the appropriate interface, edit the communication setup script as follows:
# Save the original script.
cp $II_SYSTEM/ingres/bin/x100_mpi_run $II_SYSTEM/ingres/bin/x100_mpi_run.original
vi $II_SYSTEM/ingres/bin/x100_mpi_run
[ Edit line 174 to add the interface specification ]
[ Change mpiargs="${mpiargs} -errfile-pattern=$II_SYSTEM/ingres/files/mpierr.log" ]
[ To mpiargs="${mpiargs} -iface=XXX -errfile-pattern=$II_SYSTEM/ingres/files/mpierr.log" ]
[ Where XXX is the desired interface, for example, eth1 ]
# Restart the cluster
ingstop
ingstart
As of VectorH 4.2.3, the following config.dat parameter can be set to the required interface instead of editing the x100_mpi_run script directly:
ii.$config_host.x100.mpi.iface: eth1_1
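One way to set this parameter without editing config.dat by hand is the iisetres utility; the interface name is an example, and iipmhost returns the configuration host name:
iisetres ii.`iipmhost`.x100.mpi.iface eth1_1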
If it is not clear which is the correct interface to use, the Linux commands ifconfig and ethtool can be used to determine which interface is in use and the capabilities of the interface (bonded configurations can be observed through /proc/net/bonding).
Subsequent versions of the installer will allow the interface selection to be specified through configuration.
Enabling Short Circuit Reads
Because of the partitioned and co-located nature of VectorH, the software is able to utilize HDFS short-circuit reads to good effect. An example of enabling short-circuit reads can be found in HDFS Short-Circuit Local Reads (http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html).
For most distributions, this is enabled by default. For instructions on enabling it in your environment, consult the Hadoop distribution documentation. Typically, you need to enable libhadoop.so and include property entries in your Hadoop hdfs-site.xml configuration that look similar to this:
<configuration>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
</configuration>
Enabling this feature in Hadoop can improve bulk data load rates into VectorH by up to 20 percent.
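To confirm that the native Hadoop library (libhadoop.so) is available on a node, the standard Hadoop utility below reports whether the native libraries were loaded:
hadoop checknative -a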
Running Under YARN
VectorH can be set up to run under YARN. The default installation method is to run outside of YARN. For instructions on enabling YARN support, see the Getting Started guide.
Controlling Checkpoint Disk Space
Because “big data” databases can be large, it may be prudent to limit the number of checkpoints maintained by the system. This can be done with the alterdb utility.
For example, the following command configures pocdb to maintain one checkpoint only:
alterdb -keep=1 pocdb
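The checkpoint history for a database can be reviewed with the infodb utility, for example:
infodb pocdb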