Hadoop Requirement
You must install and configure Hadoop before installing VectorH.
IMPORTANT! Following Hadoop best practices, the Hadoop NameNode and DataNode must run on separate hosts. VectorH works only while the NameNode is working. If the NameNode doubles as a DataNode and that DataNode runs out of disk space because of a large amount of data, the NameNode is affected too and the cluster stops working.
For more information about supported Hadoop distributions, see the readme.
Recommended Hadoop Settings
We recommend the following Hadoop settings:
• dfs.datanode.max.transfer.threads: 4096 or higher. If your Hadoop vendor recommends a higher value, use the vendor's value.
• dfs.replication: Less than the number of VectorH nodes. As of 4.2.2, the [cbm] hdfs_replication configuration setting can be used instead.
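As a minimal sketch, the two HDFS settings above are typically placed in hdfs-site.xml, inside the file's existing <configuration> element. The dfs.replication value of 3 is only an illustration and assumes a cluster with at least four VectorH nodes; choose a value lower than your own node count.

```xml
<!-- Fragment for hdfs-site.xml: place inside the existing <configuration> element. -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <!-- 4096 is the recommended minimum; use the vendor's value if it is higher. -->
  <value>4096</value>
</property>
<property>
  <name>dfs.replication</name>
  <!-- Illustrative value only: assumes at least four VectorH nodes. -->
  <value>3</value>
</property>
```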
If you want VectorH to integrate with YARN:
• ipc.client.connect.max.retries: 3
• ipc.client.connect.max.retries.on.timeouts: 3
• yarn.nm.liveness-monitor.expiry-interval-ms: 10000
• yarn.client.nodemanager-connect.max-wait-ms: 50000
• yarn.client.nodemanager-connect.retry-interval-ms: 10000
• yarn.resourcemanager.system-metrics-publisher.enabled: false
• yarn.am.liveness-monitor.expiry-interval-ms: 10000
• yarn.scheduler.capacity.resource-calculator: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
• If the yarn-site.xml file contains the property “yarn.nodemanager.remote-app-log-dir: hdfs://var/...”, you must add the NameNode host to the HDFS URI:
yarn.nodemanager.remote-app-log-dir: hdfs://your_name_node/var/...
• Add the following to the yarn-site.xml file if it is missing:
yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
For more on YARN integration, see Enable YARN Integration.
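The YARN-related settings above could look roughly like this in Hadoop's XML configuration format. The assignment of properties to files is an assumption based on common Hadoop convention (ipc.* in core-site.xml, yarn.* in yarn-site.xml, and the resource calculator in capacity-scheduler.xml); your distribution or management tool may organize them differently. The timeout retry property is written with Hadoop's name for it, ipc.client.connect.max.retries.on.timeouts. Each fragment belongs inside the existing <configuration> element of its file.

```xml
<!-- Fragment for core-site.xml (assumed location for the ipc.* settings). -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>3</value>
</property>
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>3</value>
</property>

<!-- Fragment for yarn-site.xml. -->
<property>
  <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
  <value>10000</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.max-wait-ms</name>
  <value>50000</value>
</property>
<property>
  <name>yarn.client.nodemanager-connect.retry-interval-ms</name>
  <value>10000</value>
</property>
<property>
  <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
  <value>10000</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <!-- your_name_node is a placeholder; the path after /var/ is elided in this guide, so keep your existing path. -->
  <value>hdfs://your_name_node/var/...</value>
</property>

<!-- Fragment for capacity-scheduler.xml (assumed location for the resource calculator). -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

Changes to these files generally take effect only after the affected Hadoop services are restarted.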
MapR Requirements
The MapR Hadoop distribution by default allocates a relatively large amount of memory to its MFS service. By default, VectorH uses 75% of physical memory. Together, these allocations leave too little memory for anything else, which can lead to out-of-memory (OOM) errors and to processes being killed by the Linux OOM killer. To avoid this, reduce service.command.mfs.heapsize.maxpercent in warden.conf to 15% or less.