Client Installation
To use DataFlow extensions to KNIME with a MapR cluster, perform the following steps on the client:
2. Uninstall the KNIME Public Server Access software. To uninstall:
a. Start KNIME.
b. From the Help menu, select About KNIME.
c. The About KNIME dialog displays information about the KNIME instance. Click Installation Details at the bottom left of the dialog.
d. In the Installation Details dialog, go to the Installed Software tab.
e. In the list of installed software packages, select the 'KNIME Public Server Access' package and click Uninstall at the bottom of the dialog.
f. Follow the instructions provided by the Uninstall wizard to uninstall 'KNIME Public Server Access' software.
g. At the prompt, select the option to restart KNIME.
After restarting, KNIME is ready to access the MapR cluster.
3. Set up DataFlow in KNIME to access the MapR cluster.
4. Install and configure the MapR client. To do this:
a. Install and configure the MapR client software. For installation instructions, see the MapR client installation documentation on the MapR website.
Note: Ensure that you follow the instructions to configure client access to the remote MapR cluster.
b. Verify that the UserID (UID) and GroupID (GID) on the client are the same as on the head node of the cluster. If they are not, do the following (a worked example follows this procedure):
1) Log in to the head node of the cluster as the mapr user and run the command id to get UID and GID.
2) Log in to the client as the root user and run the following commands to set the UID and GID:
usermod -u <UID> <user name>
groupmod -g <GID> <group name>
3) Set the yarn.resourcemanager.hostname and yarn.application.classpath properties on the MapR client. Enter values for these properties in the /opt/mapr/hadoop/<hadoop version>/etc/hadoop/yarn-site.xml file:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>resource manager host name</value>
  <description>The host name of the resource manager.</description>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value></value>
</property>
c. If the properties are unavailable or the values are not set, obtain the values as follows:
• application.classpath – Run the command yarn classpath at the client’s command prompt.
• resourcemanager.hostname – Run the command maprcli node list -columns hostname,csvc on the head node of the cluster.
Note: DataFlow on a MapR cluster can support only HBase version 0.94. To support HBase with MapR, manually add the HBase classpaths to the yarn.application.classpath and mapreduce.application.classpath properties. For example:
/opt/mapr/hbase/hbase-0.94.24/*, /opt/mapr/hbase/hbase-0.94.24/lib/*.
d. Set the HADOOP_HOME environment variable before you run the RushScript or start KNIME to run the workflow. For example, on Linux: export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.4.1. (A consolidated example follows this procedure.)
Note: If the client is running on Windows, then set DR_HADOOP_CLASSPATH = %DR_HOME%/lib/hadoop.
Do not set the DR_HADOOP_CLASSPATH environment variable in the Hadoop cluster. If the value is set in the cluster, the job might fail with a java.lang.LinkageError. If you receive this error, run the yarn classpath command in the cluster, set the DR_HADOOP_CLASSPATH variable to the result, and rerun the job.
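The following is a minimal sketch of the UID/GID check and alignment from step b. The user name mapr and the ID values shown are illustrative; substitute the values reported on your head node.
# On the cluster head node: record the mapr user's UID and GID.
$ id mapr
uid=2000(mapr) gid=2000(mapr) groups=2000(mapr)
# On the client, as root: compare the IDs and, if they differ, align them
# with the head node (2000 is the illustrative head-node value).
$ id mapr
uid=1005(mapr) gid=1005(mapr) groups=1005(mapr)
$ usermod -u 2000 mapr
$ groupmod -g 2000 mapr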
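Similarly, the commands below sketch how to obtain the yarn-site.xml values from step c and set HADOOP_HOME from step d, assuming the MapR Hadoop installation is at /opt/mapr/hadoop/hadoop-2.4.1 (adjust the path for your MapR version).
# On the client: print the classpath entries to use for yarn.application.classpath.
$ /opt/mapr/hadoop/hadoop-2.4.1/bin/yarn classpath
# On the cluster head node: list node services to find the host running the
# ResourceManager; use that host name for yarn.resourcemanager.hostname.
$ maprcli node list -columns hostname,csvc
# On the client (Linux): point HADOOP_HOME at the MapR Hadoop installation
# before running RushScript or starting KNIME.
$ export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.4.1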
Verifying MapR Client Setup
After installing and configuring the MapR client, test the connection to the cluster. To do this, run the following command:
Test client install
$ /opt/mapr/hadoop/hadoop-2.4.1/bin/hdfs dfs -ls /
Found 5 items
drwxr-xr-x - mapr mapr 1 2014-10-16 12:13 /apps
drwxr-xr-x - mapr mapr 7 2014-10-22 14:39 /hbase
drwxrwxrwx - mapr mapr 1 2014-10-17 14:55 /tmp
drwxr-xr-x - mapr mapr 1 2014-09-09 13:08 /user
drwxr-xr-x - mapr mapr 1 2014-09-09 13:07 /var
Note: The path to the HDFS executable varies based on the installed location of the MapR client and the version of MapR.
The output of the command varies based on the current contents of HDFS. If the command runs successfully, then the MapR client is successfully installed and configured.
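Optionally, you can also verify that the client can reach the cluster's resource manager. Assuming the same installation path as above, the following command should list the cluster's node managers; if it fails or hangs, recheck the yarn.resourcemanager.hostname value in yarn-site.xml.
$ /opt/mapr/hadoop/hadoop-2.4.1/bin/yarn node -list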