Release Notes

Actian DataFlow provides a visual end-to-end solution for data preparation, analytics development, and execution on Hadoop. An embedded dataflow engine delivers auto-scaling and parallelism, processing data natively on Apache Hadoop.

Actian DataFlow offers a robust range of capabilities for everyone from business analysts to data scientists:

• Create workflows to read and pull together data from all your sources

• Use prebuilt data preparation and analytics operators already optimized for parallel execution

• Prepare the data, analyze it, and output it to the visualization tool of your choice

• Build end-to-end analytics workflows in KNIME (Konstanz Information Miner)

What’s New in This Release

This section provides information about the new features in the following DataFlow builds.

For information about resolved issues in this version of DataFlow, see Change Log.

New Features in 6.9.0 Build

The 6.9.0 build includes the following new features:

• Support for multiple SQL statements in Initialize and Finalize properties on various operators

• Added raw binary file writer operator WriteBinary

• Added support for additional object token types in KNIME

• Support for Azure Blob Storage file system and ABFS paths

• Added new functions to support selected Object types

• View job logs on secure Hadoop history server

New Features in 6.8.0 Build

The 6.8.0 build includes the following new features:

• Support for Hive 3.1.2

• Support for basic JSON files

• Added new ReadJSON operator

• Added COPY VWLOAD command for LoadActianVector

• Support for s3 schemes such as s3a and s3n

• Support for virtual channels, scale conversions, and compressed data blocks in the ReadMDF operator

• Added Run Mode for MDF Reader

• Support for JSONReader KNIME node

New Features in 6.7.0 Build

The 6.7.0 build includes the following new features:

• Integrates with Eclipse 4.7 (Oxygen) and supports KNIME version 3.7.

• Support for Hive 1.x, 2.x

• Support for HBase 1.1.2, 2.0.0

• Added new ReadMDF operator

• Added new LoadActianVector operator that includes the functionality of all previous Load Vector operators

• Deprecated operators LoadVectorwise, LoadVectorOnHadoop, and LoadVectorOnHadoopDirect

New Features in 6.6.1-17 Build

The 6.6.1-17 build includes the following new features:

• Support for new data types including Money, Duration, Period, Inet4Address, Inet6Address

• Updated support for latest Amazon Web Services version

• Support for logical HDFS paths when High Availability is enabled

• Additional features added to Load Vector On Hadoop Direct KNIME node

• Support for Apache Hadoop 3.0.1

• Support for MapR 6.0.1

• Updated support for latest versions of MapR 5, HDP, CDH, and Apache Hadoop

New Features in 6.6.0-111 Build

The 6.6.0-111 build includes the following new features.

New Feature	Description
Actian Vector 5.0	Supports Load Actian Vector On Hadoop Direct DataFlow node.
KNIME 3.1.2	Site installation is updated for DataFlow to package with KNIME 3.1.2.

New Features in 6.6.0-108 Build

The 6.6.0-108 build includes the following new features, enhancements, and bug fixes.

DataFlow supports the following versions of auxiliary software:

• MapR 5.1 and later

• Java version 8

• CentOS or RHEL version 7

• Hortonworks Data Platform (HDP) 2.4

• Cloudera CDH 5.7

System Requirements

This release of Actian DataFlow runs on the following platforms:

• Microsoft Windows

• Linux

For more information, see System Requirements in Installing and Configuring DataFlow. This section provides information about minimum hardware performance requirements and the necessary versions of Java, Hadoop, and HBase used to create a complete DataFlow configuration.

Third-party Software

• Actian DataFlow version 6.6 integrates with Eclipse 4.5 (Mars).

• Actian DataFlow versions 6.3.2 and earlier support KNIME version 2.9 and earlier.

• Actian DataFlow versions 6.4 and 6.5 support KNIME version 2.11.3 and earlier.

• Actian DataFlow version 6.6 is packaged with KNIME version 3.1.2.

Upgrading DataFlow and Interfaces

For installation and configuration instructions, see the guide Installing and Configuring DataFlow.

For upgrading and uninstallation instructions, see Upgrading DataFlow and Interfaces and Uninstalling DataFlow and Interfaces.

Known Issues

This section describes the following known issues:

• DataFlow Known Issues

• KNIME Known Issues

• Known Issues During Integration with Kerberos-Enabled Hadoop

DataFlow Known Issues

The following information describes selected known issues in the current release.

Issue	Description	Resolution
—	Cannot support Apache Hadoop 3.0.0—release has known issues and has been deprecated by Apache.	Use Hadoop 3.0.1.
—	Cannot support MapR 6.0.0—release has issues with inconsistent native libraries.	Use MapR 6.0.1.
DR-3707	Cannot read a Parquet file from an S3 location.	This issue will be fixed in a future release.
DR-3015	A workflow that contains an ORC writer fails with an error when the stripe size is set to a value that is not a multiple of 1024.	This issue will be fixed in a future release.
DR-2989	When you run a YARN job on Vector using the SQL Copy load method, the job fails.	Run the job using the Direct Load method.
DR-2925	When you access files in a MapR instance of HDFS using KNIME with DataFlow extensions on Windows, the KNIME process fails. This failure is indicated by an error in a shared library which causes the KNIME JVM to exit unexpectedly.	This issue will be fixed in a future release.
DR-2897	When you configure KNIME for Kerberos-enabled HDFS on a Windows client, KNIME fails to start because the operator libraries are not loaded.	This issue will be fixed in a future release.
DR-2534	Wrapping the DataFlow Java application as a web service has problems with external .jar files in WEB-INF\lib.	Copy the required .jar files to the Tomcat lib folder.
DR-2486	A deadlock may occur when you execute a DataFlow job within YARN. The deadlock is caused by the number of resources for worker containers matching the capacity of the cluster.	Lower the resource requirements for the job to less than the capacity of the cluster.
DR-2484	Attempting to specify multiple .jar files on the dr command line using the -cp option causes a failure and writing of nonsense characters to the terminal.	Place required .jar files in the lib folder of the DataFlow installation.
DR-2482	When viewing the YARN Resource Manager job page, the URL links for running DataFlow jobs do not work correctly. An error is displayed about POST messages not being supported.	Use the DataFlow cluster manager, which displays detailed run-time metrics about DataFlow jobs.
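
Until DR-3015 is fixed, one way to avoid the failure is to round the ORC stripe size up to the next multiple of 1024 before you enter it in the writer configuration. The following is a minimal sketch of that arithmetic. The function name round_up_stripe_size is hypothetical and is not part of the DataFlow API, and the unit of the stripe size is an assumption because the issue description does not state it.

```python
def round_up_stripe_size(requested: int, multiple: int = 1024) -> int:
    """Round a requested stripe size up to the next multiple of `multiple`.

    Hypothetical helper, not a DataFlow function. The unit is whatever the
    ORC writer configuration expects; this sketch only does the arithmetic.
    """
    if requested <= 0:
        raise ValueError("stripe size must be a positive integer")
    # Ceiling division without floating point, then scale back up.
    return -(-requested // multiple) * multiple


if __name__ == "__main__":
    for size in (1024, 1025, 50_000_000):
        print(size, "->", round_up_stripe_size(size))
```

A value that is already a multiple of 1024 is returned unchanged, so the helper can be applied to any candidate setting.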

KNIME Known Issues

The following table describes known issues and their resolution.

Issue	Description	Resolution
N/A	When creating a new DataFlow workflow from KNIME on Linux, KNIME might hang. This is a KNIME issue that occurs when the printing service, Common UNIX Printing System (CUPS), is not running. For information about this issue, see the KNIME forum.	Start the CUPS service using the service cups start command.
DR-3321	DataFlow jobs that are executed on a YARN-enabled Hadoop cluster with HA and Kerberos enabled fail with an error. This occurs when KNIME version 2.11.3 with Java version 1.7.0-60 is used on a Hadoop cluster.	Do any of the following: • Export the workflow as a JSON file from KNIME and execute it using the DR command line script. The script accepts JAVA_HOME, which matches the Hadoop cluster and therefore has the same security libraries. • When the correct Java version (1.7.0-67) is available on the system, manually override the Java that KNIME uses: edit the knime.ini file to set the VM to the correct version, and then restart KNIME. For an example, see Code example. • When Java is not installed on the client system, replace the security folder at KNIME_HOME/jre/lib/security with the version included in the Java instance that is used to run the Hadoop version 2.2 cluster. After replacing the folder, restart KNIME so that its security libraries match the cluster.

Code example

The following is a code example of the resolution for KNIME known issue DR-3321.

-startup
plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.100.v20110505
-vm
/usr/jdk64/jdk1.7.0_67/bin/java
-vmargs
-Dknime.swt.disableEventLoopWorkaround=true
-XX:MaxPermSize=1024m
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass
-Dknime.enable.fastload=true
-Dorg.eclipse.swt.internal.gtk.cairoGraphics=false
-Dorg.eclipse.swt.internal.gtk.useCairo=false
-Xmx2048m
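
If you prefer to script the knime.ini change rather than edit the file by hand, the following minimal sketch shows one way to do it. It assumes the usual Eclipse launcher rules: the -vm option and its path are on separate lines, and both come before -vmargs. The function name set_knime_vm is hypothetical and is not part of KNIME or DataFlow.

```python
def set_knime_vm(ini_text: str, java_path: str) -> str:
    """Return knime.ini content with a single -vm entry placed before -vmargs.

    Hypothetical helper for illustration; it only transforms text and does
    not touch any file on disk.
    """
    cleaned = []
    skip_next = False
    for line in ini_text.splitlines():
        if skip_next:
            # This is the path that followed an existing -vm option; drop it.
            skip_next = False
            continue
        if line.strip() == "-vm":
            # Drop the existing option so the result has exactly one entry.
            skip_next = True
            continue
        cleaned.append(line)

    # Everything after -vmargs is passed to the JVM, so -vm must precede it.
    # If there is no -vmargs line, append the entry at the end.
    index = next(
        (i for i, line in enumerate(cleaned) if line.strip() == "-vmargs"),
        len(cleaned),
    )
    cleaned[index:index] = ["-vm", java_path]
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    sample = "-startup\nplugins/launcher.jar\n-vmargs\n-Xmx2048m\n"
    print(set_knime_vm(sample, "/usr/jdk64/jdk1.7.0_67/bin/java"), end="")
```

To apply it, read knime.ini into a string, pass it through the function, and write the result back after making a backup copy. Restart KNIME for the change to take effect.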

Known Issues During Integration with Kerberos-Enabled Hadoop

You may encounter the following issues with JRE and KNIME during integration with a Kerberos-enabled Hadoop instance. Issue resolutions are provided.

Issue with JRE

Only some JRE instances support the highest level of encryption that Kerberos may require. The level of encryption is specified when Kerberos is installed and configured. If the highest level of encryption is selected for Kerberos, you may need to update the JRE instances used by Hadoop and DataFlow on the Hadoop cluster.

The OpenJDK JRE supports the highest level of encryption as installed. You may need to update the Oracle versions of JRE to support higher levels of encryption.

Resolution

Do any of the following:

• Ensure that the JCE unlimited strength encryption .jar files are installed. To install them, download the files from Oracle and follow the installation instructions.

• Limit the encryption methods used by Kerberos.

Note: If you reconfigure Kerberos to limit the supported encryption methods, you must recreate the principals and keytab files that are required by Hadoop.

Issues with Hadoop Clients

Issue 1

The Hadoop configuration is required on all client systems accessing a Hadoop cluster. When Kerberos is enabled, certain configuration files such as core-site.xml, hdfs-site.xml, and yarn-site.xml in Hadoop are updated. These configuration files are required on the client to ensure that Kerberos is used appropriately and the required Kerberos principals are utilized.

Resolution

You must configure the following:

• KNIME client:

– Ensure that the hadoop.conf.dir environment variable is set and is referencing the Hadoop configuration directory (typically /etc/hadoop/conf). The DataFlow Cluster Manager uses the hadoop.conf.dir environment variable to ensure that the proper configuration is accessed and provided to DataFlow clients.

– You can also edit the knime.ini file and include -Dhadoop.conf.dir=<path to configuration directory>.

• Command line client:

– Create a Hadoop home directory (<HADOOP>). For example: /opt/hadoop.

– Create a <HADOOP>/etc/hadoop directory.

– Copy the Hadoop configuration files from the Hadoop cluster to the local <HADOOP>/etc/hadoop location.

– Set the HADOOP_HOME environment variable to the <HADOOP> directory. For example, in a Bash shell: export HADOOP_HOME=/opt/hadoop.
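
The command line client steps above can be sketched as a short script. This is a minimal illustration, not a supported tool: it assumes the cluster configuration files have already been fetched to a local directory (for example with scp), and the function name prepare_hadoop_client and both directory arguments are hypothetical.

```python
import os
import shutil
from pathlib import Path


def prepare_hadoop_client(fetched_conf: Path, hadoop_home: Path) -> Path:
    """Create <HADOOP>/etc/hadoop, copy the configuration files, set HADOOP_HOME.

    fetched_conf: local directory that already holds the files copied from
                  the cluster (core-site.xml, hdfs-site.xml, yarn-site.xml, ...).
    hadoop_home:  the <HADOOP> directory to create, for example /opt/hadoop.
    """
    conf_dir = hadoop_home / "etc" / "hadoop"
    conf_dir.mkdir(parents=True, exist_ok=True)

    # Copy every regular file so that all Kerberos-related settings come along.
    for source in fetched_conf.iterdir():
        if source.is_file():
            shutil.copy2(source, conf_dir / source.name)

    # This affects only the current process and the child processes it starts.
    # Persist the variable in your shell profile for interactive use.
    os.environ["HADOOP_HOME"] = str(hadoop_home)
    return conf_dir


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        fetched = Path(tmp) / "fetched"
        fetched.mkdir()
        (fetched / "core-site.xml").write_text("<configuration/>")
        print(prepare_hadoop_client(fetched, Path(tmp) / "opt" / "hadoop"))
```

Because the environment variable is set only for the running process, start the DataFlow command line client from the same script, or export HADOOP_HOME separately in the shell that launches it.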

Issue 2

The JRE embedded within KNIME does not support the highest level of encryption used by Kerberos. When you access Kerberos, exceptions occur because the Kerberos client is unable to generate encrypted Kerberos tickets.

Note: The standard JDK or JRE installation from Oracle does not include the encryption levels required for Kerberos.

Resolution

Do any of the following:

• Update the JRE embedded within KNIME to use the unlimited strength JCE .jar files. To upgrade, download the JCE .jar files from Oracle. The JRE embedded within KNIME is located in the directory <KNIME_INSTALL>/jre. Be sure to follow the directions on the Oracle site to install the JCE extension .jar files.

• Limit the encryption algorithms used in Kerberos to the ones supported in the standard JDK or JRE installation.

Note: If you reconfigure Kerberos to limit the encryption methods that are supported, then you must recreate the principals and keytab files that Hadoop requires.