Client Installation
To use DataFlow extensions to KNIME with a MapR cluster, perform the following steps on the client:
2. Uninstall the KNIME Public Server Access software. To uninstall:
a. Start KNIME.
b. From the Help menu, select About KNIME.
c. The About KNIME dialog displays information about the KNIME instance. Click Installation Details at the bottom left of the dialog.
d. In the Installation Details dialog, go to the Installed Software tab.
e. In the list of installed software packages, select the 'KNIME Public Server Access' package and click Uninstall at the bottom of the dialog.
f. Follow the instructions provided by the Uninstall wizard to uninstall 'KNIME Public Server Access' software.
g. At the prompt, select the option to restart KNIME.
After restarting, KNIME is ready to access the MapR cluster.
3. Set up DataFlow in KNIME to access the MapR cluster.
4. Install and configure the MapR client. To do this:
a. Install and configure the MapR client software. For installation instructions, see the MapR client installation documentation on the MapR website.
Note: Ensure that you follow the instructions to configure client access to the remote MapR cluster.
b. Verify that the UserID (UID) and GroupID (GID) on the client are the same as on the head node of the cluster. If they are not, do the following (a worked example follows this procedure):
1) Log in to the head node of the cluster as the mapr user and run the command id to get UID and GID.
2) Log in to the client as the root user and run the following commands to set the UID and GID:
usermod -u <UID> <user name>
groupmod -g <GID> <group name>
3) Set the yarn.resourcemanager.hostname and yarn.application.classpath properties on the MapR client. Enter values for these properties in the /opt/mapr/hadoop/<hadoop version>/etc/hadoop/yarn-site.xml file:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>resource manager host name</value>
  <description>The host name of the resource manager.</description>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value></value>
</property>
c. If the properties are unavailable or the values are not set, obtain the values as follows:
• application.classpath – Run the command yarn classpath at the client’s command prompt.
• resourcemanager.hostname – Run the command maprcli node list -columns hostname,csvc on the head node of the cluster.
Note: DataFlow on a MapR cluster can support only HBase version 0.94. To support HBase with MapR, manually add the HBase classpaths to the yarn.application.classpath and mapreduce.application.classpath properties. For example:
/opt/mapr/hbase/hbase-0.94.24/*, /opt/mapr/hbase/hbase-0.94.24/lib/*.
d. Set the HADOOP_HOME environment variable before you run the RushScript or start KNIME to run the workflow. For example, on Linux: export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.4.1. (A consolidated example follows this procedure.)
Note: If the client is running on Windows, then set DR_HADOOP_CLASSPATH = %DR_HOME%/lib/hadoop.
Do not set the DR_HADOOP_CLASSPATH environment variable in the Hadoop cluster. If the value is set in the cluster, the job might fail with a java.lang.LinkageError. If you receive this error, run the yarn classpath command in the cluster, set the DR_HADOOP_CLASSPATH variable to the result, and rerun the job.
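The following is a minimal sketch of the UID/GID check and alignment from step b. The user name mapr and the ID values shown are illustrative; substitute the values reported on your head node.
# On the cluster head node: record the mapr user's UID and GID.
$ id mapr
uid=2000(mapr) gid=2000(mapr) groups=2000(mapr)
# On the client, as root: compare the IDs and, if they differ, align them
# with the head node (2000 is the illustrative head-node value).
$ id mapr
uid=1005(mapr) gid=1005(mapr) groups=1005(mapr)
$ usermod -u 2000 mapr
$ groupmod -g 2000 mapr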
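Similarly, the commands below sketch how to obtain the yarn-site.xml values from step c and set HADOOP_HOME from step d, assuming the MapR Hadoop installation is at /opt/mapr/hadoop/hadoop-2.4.1 (adjust the path for your MapR version).
# On the client: print the classpath entries to use for yarn.application.classpath.
$ /opt/mapr/hadoop/hadoop-2.4.1/bin/yarn classpath
# On the cluster head node: list node services to find the host running the
# ResourceManager; use that host name for yarn.resourcemanager.hostname.
$ maprcli node list -columns hostname,csvc
# On the client (Linux): point HADOOP_HOME at the MapR Hadoop installation
# before running RushScript or starting KNIME.
$ export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.4.1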
Verifying MapR Client Setup
After installing and configuring the MapR client, test the connection to the cluster. To do this, run the following command:
Test client install
$ /opt/mapr/hadoop/hadoop-2.4.1/bin/hdfs dfs -ls /
Found 5 items
drwxr-xr-x - mapr mapr 1 2014-10-16 12:13 /apps
drwxr-xr-x - mapr mapr 7 2014-10-22 14:39 /hbase
drwxrwxrwx - mapr mapr 1 2014-10-17 14:55 /tmp
drwxr-xr-x - mapr mapr 1 2014-09-09 13:08 /user
drwxr-xr-x - mapr mapr 1 2014-09-09 13:07 /var
Note: The path to the HDFS executable varies based on the installed location of the MapR client and the version of MapR.
The output of the command varies based on the current contents of HDFS. If the command runs successfully, then the MapR client is successfully installed and configured.
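Optionally, you can also verify that the client can reach the cluster's resource manager. Assuming the same installation path as above, the following command should list the cluster's node managers; if it fails or hangs, recheck the yarn.resourcemanager.hostname value in yarn-site.xml.
$ /opt/mapr/hadoop/hadoop-2.4.1/bin/yarn node -list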