Hadoop Requirement
You must install and configure Hadoop before installing VectorH.
IMPORTANT!  The Hadoop NameNode and DataNode must run on separate nodes, in accordance with Hadoop best practices. VectorH works only when the NameNode is operational. If the NameNode doubles as a DataNode and the DataNode runs out of disk space because of a large volume of data, the cluster stops working.
For information about supported Hadoop distributions, see the readme.
Recommended Hadoop Settings
We recommend the following Hadoop settings:
dfs.datanode.max.transfer.threads: 4096 or higher. If your Hadoop vendor recommends a higher value, follow the vendor recommendation.
dfs.replication: A value less than the number of VectorH nodes. As of VectorH 4.2.2, the [cbm] hdfs_replication configuration setting can be used instead.
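These HDFS settings are typically defined in hdfs-site.xml (or through your cluster manager's configuration UI). A sketch of the corresponding entries, assuming an illustrative 5-node VectorH cluster so that a replication factor of 3 satisfies the rule above:

```xml
<!-- hdfs-site.xml fragment (illustrative values; adjust per vendor guidance) -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>
</property>
<property>
  <name>dfs.replication</name>
  <!-- must be less than the number of VectorH nodes -->
  <value>3</value>
</property>
```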
If you want VectorH to integrate with YARN:
ipc.client.connect.max.retries: 3
ipc.client.connect.max.retries.on.timeouts: 3
yarn.nm.liveness-monitor.expiry-interval-ms: 10000
yarn.client.nodemanager-connect.max-wait-ms: 50000
yarn.client.nodemanager-connect.retry-interval-ms: 10000
yarn.resourcemanager.system-metrics-publisher.enabled: false
yarn.am.liveness-monitor.expiry-interval-ms: 10000
yarn.scheduler.capacity.resource-calculator: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
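The settings above span two configuration files: the ipc.* properties are Hadoop client settings that normally belong in core-site.xml, while the yarn.* properties go in yarn-site.xml. A sketch of two representative entries from the list, in standard Hadoop property syntax:

```xml
<!-- yarn-site.xml fragment (the ipc.* settings above normally live in core-site.xml) -->
<property>
  <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
  <value>10000</value>
</property>
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

The remaining settings in the list follow the same property/name/value pattern.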
If the yarn-site.xml file contains the property "yarn.nodemanager.remote-app-log-dir" with a value of the form hdfs://var/..., you must add the NameNode host to the hdfs URI:
yarn.nodemanager.remote-app-log-dir: hdfs://your_name_node/var/...
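In yarn-site.xml property syntax, with your_name_node standing in for your actual NameNode host, the corrected property would look like:

```xml
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>hdfs://your_name_node/var/...</value>
</property>
```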
Add the following to the yarn-site.xml file if it is missing:
yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
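Expressed in yarn-site.xml property syntax, the entry to add would look like:

```xml
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```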
For more on YARN integration, see Enable YARN Integration on page 51.
Last modified date: 01/26/2023