Integrating DataFlow with Apache Hive

New features in Apache Hive 2.0.0 are described at https://bigdata-madesimple.com/10-new-exciting-features-in-apache-hive-2-0-0/. Hive installation instructions and DML operations are documented at https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration. Note that in Hive 2.x.x releases, the Parquet package has been moved inside the org.apache package.

The following cluster types support these Hive versions:

Cluster	Supported Hive Versions
Apache Hadoop 3	3.1.2
CDH 5.15.1 Hadoop 2.6.0	1.1.0
HDP 2.6.5 Hadoop 2.7.3	1.2.1, 2.1.0
MapR 5.2	2.1.1
MapR 6.0	1.2.1, 2.3.1
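
The support matrix can also be captured as a small lookup, which may be handy when scripting a pre-installation check. This is only a sketch in Python: the names SUPPORTED_HIVE_VERSIONS and is_supported are hypothetical and not part of DataFlow, and the values are copied from the table.

```python
# Hypothetical helper, not part of DataFlow. Values are copied from the
# cluster/Hive support table in this topic.
SUPPORTED_HIVE_VERSIONS = {
    "Apache Hadoop 3": ["3.1.2"],
    "CDH 5.15.1 Hadoop 2.6.0": ["1.1.0"],
    "HDP 2.6.5 Hadoop 2.7.3": ["1.2.1", "2.1.0"],
    "MapR 5.2": ["2.1.1"],
    "MapR 6.0": ["1.2.1", "2.3.1"],
}


def is_supported(cluster: str, hive_version: str) -> bool:
    """Return True if the table lists hive_version for the given cluster."""
    return hive_version in SUPPORTED_HIVE_VERSIONS.get(cluster, [])


if __name__ == "__main__":
    print(is_supported("MapR 6.0", "2.3.1"))
    print(is_supported("MapR 5.2", "1.2.1"))
```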

HiveServer2 was introduced in the Hive 0.11.0 release. It exposes a Thrift-based API, and clients such as Beeline (a JDBC command-line client) can connect to the Hive database from outside your cluster. In a secured environment, Beeline connects through a Knox gateway. HiveServer2 provides stronger security than HiveServer1 and supports multiple concurrent Beeline connections. Advantages of HiveServer2 over HiveServer1 include:

• HiveServer2 Thrift API specification

• JDBC/ODBC HiveServer2 drivers

• Support for concurrent Thrift clients, with memory leak fixes and per-session configuration information

• Kerberos authentication

• Authorization improvements for GRANT/ROLE that address code injection vectors
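
As an illustration of connecting to HiveServer2 with Beeline, the sketch below assembles a Beeline command line in Python. The -u, -n, and -p options are standard Beeline options; the host name, the port 10000, the default database, and the helper name beeline_command are assumptions made for illustration. Connections through a Knox gateway use a different URL form that depends on the gateway configuration and is not shown here.

```python
import shlex


def beeline_command(host, user, password=None, port=10000, database="default"):
    """Build the argument list for a direct Beeline connection to HiveServer2."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    args = ["beeline", "-u", url, "-n", user]
    if password is not None:
        args += ["-p", password]
    return args


if __name__ == "__main__":
    # hive.example.com and dfuser are placeholders, not real endpoints or accounts.
    cmd = beeline_command("hive.example.com", "dfuser")
    print(shlex.join(cmd))
    # To actually run it (requires Beeline on the PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if a password contains special characters.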

To add a new Hive driver in KNIME to connect to a Hive database

1. Open KNIME.

2. Click File, Preferences, KNIME, Databases, New.

3. Add the required JDBC connector JAR file.

4. Click OK.

To test the JDBC connection from KNIME using cdata.jdbc.hive.HiveDriver

Follow the instructions at https://www.cdata.com/kb/tech/hive-jdbc-knime.rst.

To test the database connection

1. Create a new workflow.

2. Add a Database Connector node, and then right-click it to configure it.

3. For Driver, select cdata.jdbc.hive.HiveDriver.

4. Enter the Database URL:

jdbc:hive:Server=<host_url>;Port=10000;TransportMode=BINARY

5. Enter the username and password.

6. Click OK and execute the workflow to check the connection.
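
Before executing the workflow, it can help to confirm the URL format and check that the HiveServer2 port is reachable from the KNIME machine. The following Python sketch builds a URL in the form shown in step 4 and attempts a plain TCP connection; it does not authenticate or run a query. The function names and the sample host are hypothetical.

```python
import socket


def build_hive_url(host, port=10000, transport_mode="BINARY"):
    """Build a database URL in the form used in step 4."""
    return f"jdbc:hive:Server={host};Port={port};TransportMode={transport_mode}"


def port_is_reachable(host, port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    host = "hive.example.com"  # placeholder host, replace with your server
    print(build_hive_url(host))
    print("reachable" if port_is_reachable(host) else "not reachable")
```

A successful TCP check only shows that something is listening on the port; the KNIME workflow is still needed to verify the driver, credentials, and transport mode.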