Integrating DataFlow with Apache Hive
New features in Apache Hive 2.0.0 are described at https://bigdata-madesimple.com/10-new-exciting-features-in-apache-hive-2-0-0/. Hive installation instructions and DML operations are documented at https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration. Note that in Hive 2.x releases, the Parquet package has been moved under the org.apache package namespace.
The following cluster types support these Hive versions:
Cluster                        Hive Version Support
Apache Hadoop 3                3.1.2
CDH 5.15.1 (Hadoop 2.6.0)      1.1.0
HDP 2.6.5 (Hadoop 2.7.3)       1.2.1, 2.1.0
MapR 5.2                       2.1.1
MapR 6.0                       1.2.1, 2.3.1
HiveServer2 was introduced in the Hive 0.11.0 release. HiveServer2 accepts remote client connections over Thrift, using either binary or HTTP transport. You can use tools such as Beeline, the HiveServer2 JDBC command-line client, to connect to the Hive database from any client outside your cluster. In a secured environment, Beeline connects through an Apache Knox gateway. HiveServer2 provides stronger security than HiveServer1 and supports multiple concurrent Beeline connections. Advantages of HiveServer2 over HiveServer1 include:
- HiveServer2 Thrift API specification
- JDBC/ODBC HiveServer2 drivers
- Support for concurrent Thrift clients, with memory leak fixes and per-session configuration information
- Kerberos authentication
- Improved authorization, including GRANT/ROLE support and fixes for code injection vectors
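For reference, the connection URLs that Beeline and other JDBC clients use with the Apache Hive JDBC driver (org.apache.hive.jdbc.HiveDriver, a different driver from the CData driver used later on this page) can be sketched as follows. This is a minimal sketch, not a DataFlow API: the host names, the Knox port 8443, and the Knox topology name "default" are assumptions for illustration; check the values configured on your cluster.

```java
import java.util.StringJoiner;

/**
 * Sketch: builds HiveServer2 JDBC URLs for the Apache Hive JDBC driver.
 * Host names, ports, and the Knox topology name are illustrative assumptions.
 */
public final class HiveServer2Urls {

    private HiveServer2Urls() {}

    /** Direct connection to HiveServer2 over the default binary Thrift transport. */
    public static String binaryUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    /**
     * Connection through an Apache Knox gateway: Thrift over HTTP, with SSL.
     * The httpPath "gateway/<topology>/hive" assumes a standard Knox Hive service mapping.
     */
    public static String knoxUrl(String gatewayHost, int gatewayPort, String topology) {
        // Each session parameter is prefixed with ';' (the joiner's prefix supplies the first one).
        StringJoiner params = new StringJoiner(";", ";", "");
        params.add("ssl=true");
        params.add("transportMode=http");
        params.add("httpPath=gateway/" + topology + "/hive");
        return "jdbc:hive2://" + gatewayHost + ":" + gatewayPort + "/" + params;
    }

    public static void main(String[] args) {
        // Hypothetical hosts, for illustration only.
        System.out.println(binaryUrl("hs2.example.com", 10000, "default"));
        System.out.println(knoxUrl("knox.example.com", 8443, "default"));
    }
}
```

A URL built this way can be passed to Beeline with its -u option, for example beeline -u "jdbc:hive2://hs2.example.com:10000/default". A Knox connection typically also needs truststore settings (for example sslTrustStore), which are omitted here.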
To add a new Hive driver in KNIME to connect to the Hive database
1. Open KNIME.
2. Click File > Preferences > KNIME > Databases, and then click New.
3. Add the required JDBC connector JAR file.
4. Click OK.
To test a JDBC connection from KNIME using cdata.jdbc.hive.HiveDriver
Follow the instructions at https://www.cdata.com/kb/tech/hive-jdbc-knime.rst.
To test the database connection
1. Create a new workflow.
2. Add a Database Connector node, then right-click it and select Configure.
3. Select cdata.jdbc.hive.HiveDriver as the driver.
4. Enter the Database URL:
jdbc:hive:Server=<host_url>;Port=10000;TransportMode=BINARY
5. Enter the username and password.
6. Click OK and execute the workflow to check the connection.
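The same connection test can be run outside KNIME with plain JDBC, which helps separate driver or network problems from workflow configuration. This is a minimal sketch that assumes the CData Hive JDBC JAR is on the classpath; the host name and credentials are placeholders, and the helper names cdataUrl and canConnect are hypothetical. It uses only standard java.sql calls (DriverManager.getConnection and Connection.isValid).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/**
 * Sketch: repeats the KNIME connection test from plain Java.
 * Assumes the CData Hive JDBC JAR is on the classpath; host and credentials are placeholders.
 */
public final class HiveConnectionCheck {

    private HiveConnectionCheck() {}

    /** Builds the Database URL format shown in step 4 (binary transport). */
    public static String cdataUrl(String host, int port) {
        return "jdbc:hive:Server=" + host + ";Port=" + port + ";TransportMode=BINARY";
    }

    /** Returns true if a connection opens and reports itself valid within 10 seconds. */
    public static boolean canConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return conn.isValid(10);
        } catch (SQLException e) {
            // Covers "No suitable driver", authentication failures, and unreachable hosts.
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Load the driver class explicitly; fails fast if the JAR is missing from the classpath.
        Class.forName("cdata.jdbc.hive.HiveDriver");
        String url = cdataUrl("hive.example.com", 10000);
        System.out.println(url + " -> " + canConnect(url, "hive_user", "hive_password"));
    }
}
```

If canConnect returns false here as well, the problem is likely in the driver, the network path, or the credentials rather than in the KNIME workflow.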