Integrating DataFlow with Apache Hive
New features in Apache Hive 2.0.0 are detailed at
https://bigdata-madesimple.com/10-new-exciting-features-in-apache-hive-2-0-0/. Hive installation instructions and DML operations are documented at
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-InstallationandConfiguration. Note that in Hive 2.x.x releases, the parquet package has been moved inside the org.apache package.
The following cluster types support these Hive versions:
HiveServer2 support was introduced in the Hive 0.11.0 release. HiveServer2 exposes a Thrift-based JDBC/ODBC interface, so tools such as Beeline can connect to the Hive database from any client outside your cluster. In a secured environment, Beeline (the Hive JDBC client) connects through a Knox gateway. HiveServer2 also provides stronger security and supports multiple concurrent Beeline connections. Advantages of HiveServer2 over HiveServer1 include:
• HiveServer2 Thrift API specification
• JDBC/ODBC HiveServer2 drivers
• Concurrent Thrift clients with memory leak fixes and session/configuration information
• Kerberos authentication
• Improved authorization, including GRANT/ROLE support and protection against code-injection vectors
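As an illustration of the JDBC access that HiveServer2 enables, the sketch below builds a standard HiveServer2 JDBC URL and, optionally, opens a connection with plain JDBC. The host name, database, and credentials are hypothetical, and a real connection additionally requires the hive-jdbc driver JAR on the classpath; treat this as a minimal sketch, not a complete client.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class HiveServer2Example {

    // Builds a standard HiveServer2 JDBC URL, e.g. jdbc:hive2://host:10000/default
    public static String buildJdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical host; port 10000 is HiveServer2's default Thrift port.
        String url = buildJdbcUrl("hive-host.example.com", 10000, "default");
        System.out.println(url);

        // Only attempt a real connection when a host is passed in, since that
        // requires a reachable HiveServer2 and the hive-jdbc JAR on the classpath.
        if (args.length > 0) {
            try (Connection conn = DriverManager.getConnection(
                    buildJdbcUrl(args[0], 10000, "default"), "user", "password")) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }
}
```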
To add a new Hive driver in KNIME to connect to a Hive database
1. Open KNIME.
2. Click File > Preferences > KNIME > Databases > New.
3. Add the required JAR file (the Hive JDBC connector JAR).
4. Click OK.
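Before entering the driver class name in KNIME, you can verify outside KNIME that the JAR you added actually contains that class. The helper below simply tests whether a class is loadable on the current classpath; the cdata.jdbc.hive.HiveDriver name comes from the steps that follow, and this check is a convenience sketch, not part of KNIME itself.

```java
public class DriverCheck {

    // Returns true when the named class can be loaded from the current classpath.
    // Run with the JDBC connector JAR on the classpath to confirm that the
    // driver class name you will enter in KNIME is correct.
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = args.length > 0 ? args[0] : "cdata.jdbc.hive.HiveDriver";
        System.out.println(driver + " available: " + isClassAvailable(driver));
    }
}
```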
To test the JDBC connection from KNIME using the cdata.jdbc.hive.HiveDriver
1. Create a new workflow.
2. Add a Database Connector node and right-click it to configure it.
3. Select cdata.jdbc.hive.HiveDriver as the driver.
4. Enter the Database URL:
jdbc:hive:Server=<host_url>;Port=10000;TransportMode=BINARY
5. Enter the username and password.
6. Click OK and execute the workflow to check the connection.
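The same connection test can also be scripted outside KNIME. The sketch below assembles the URL format shown in step 4 and would open the connection with plain JDBC; the host, credentials, and the availability of the CData driver JAR on the classpath are all assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class CDataHiveConnectionTest {

    // Mirrors the URL format from step 4:
    // jdbc:hive:Server=<host_url>;Port=10000;TransportMode=BINARY
    public static String buildUrl(String host, int port) {
        return "jdbc:hive:Server=" + host + ";Port=" + port + ";TransportMode=BINARY";
    }

    public static void main(String[] args) throws Exception {
        String url = buildUrl("hive-host.example.com", 10000); // hypothetical host
        System.out.println(url);

        // Attempt the connection only when a host is supplied: this needs the
        // CData JDBC JAR on the classpath and a reachable HiveServer2.
        if (args.length > 0) {
            try (Connection conn = DriverManager.getConnection(
                    buildUrl(args[0], 10000), "user", "password")) {
                System.out.println("Connection ok: " + !conn.isClosed());
            }
        }
    }
}
```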