DataFlow Release Summary : Release Notes
 
Release Notes
 
What’s New in This Release
System Requirements
Known Issues
Related Documentation
Actian DataFlow provides a visual end-to-end solution for data preparation, analytics development, and execution on Hadoop. An embedded dataflow engine delivers auto-scaling and parallelism, processing data natively on Apache Hadoop.
Actian DataFlow offers a robust range of capabilities for everyone from business analysts to data scientists:
Create workflows to read and pull together data from all your sources
Use prebuilt data preparation and analytics operators already optimized for parallel execution
Prepare the data, analyze it, and output it to the visualization tool of your choice
Build end-to-end analytics workflows in KNIME (Konstanz Information Miner)
What’s New in This Release
This section provides information about the new features in the following DataFlow builds.
For information about resolved issues in this version of DataFlow, see Change Log.
New Features in 6.9.0 Build
The 6.9.0 build includes the following new features:
Support for multiple SQL statements in Initialize and Finalize properties on various operators
Added the WriteBinary operator, a raw binary file writer
Added additional support for object token types in KNIME
Support for Azure Blob Storage file system and ABFS paths
Added new functions to support selected Object types
Support for viewing job logs on a secure Hadoop history server
New Features in 6.8.0 Build
The 6.8.0 build includes the following new features:
Support for Hive 3.1.2
Support for basic JSON files
Added new ReadJSON operator
Added COPY VWLOAD command for LoadActianVector
Support for S3 schemes such as s3a and s3n
Support for virtual channels, scale conversions, and compressed data blocks in the ReadMDF operator
Added Run Mode for MDF Reader
Support for JSONReader KNIME node
New Features in 6.7.0 Build
The 6.7.0 build includes the following new features:
Integrates with Eclipse 4.7 (Oxygen) and supports KNIME version 3.7.
Support for Hive 1.x and 2.x
Support for HBase 1.1.2 and 2.0.0
Added new ReadMDF operator
Added new LoadActianVector operator that includes the functionality of all previous Load Vector operators
Deprecated operators LoadVectorwise, LoadVectorOnHadoop, and LoadVectorOnHadoopDirect
New Features in 6.6.1-17 Build
The 6.6.1-17 build includes the following new features:
Support for new data types including Money, Duration, Period, Inet4Address, and Inet6Address
Updated support for the latest Amazon Web Services version
Support for logical HDFS paths when High Availability is enabled
Additional features added to the Load Vector On Hadoop Direct KNIME node
Support for Apache Hadoop 3.0.1
Support for MapR 6.0.1
Updated support for latest versions of MapR 5, HDP, CDH, and Apache Hadoop
New Features in 6.6.0-111 Build
The 6.6.0-111 build includes the following new features:
Actian Vector 5.0: Supported by the Load Actian Vector On Hadoop Direct DataFlow node.
KNIME 3.1.2: The site installation is updated to package DataFlow with KNIME 3.1.2.
New Features in 6.6.0-108 Build
The 6.6.0-108 build includes the following new features, enhancements, and bug fixes.
DataFlow supports the following versions of auxiliary software:
MapR 5.1 and later
Java version 8
CentOS or RHEL version 7
Hortonworks Data Platform (HDP) 2.4
Cloudera CDH 5.7
System Requirements
This release of Actian DataFlow runs on the following platforms:
Microsoft Windows
Linux
For more information, see System Requirements in Installing and Configuring DataFlow. That section describes the minimum hardware performance requirements and the versions of Java, Hadoop, and HBase required for a complete DataFlow configuration.
Third-party Software
Actian DataFlow version 6.6 integrates with Eclipse 4.5 (Mars).
Actian DataFlow version 6.3.2 and earlier support KNIME version 2.9 and earlier.
Actian DataFlow versions 6.4 and 6.5 support KNIME version 2.11.3 and earlier.
Actian DataFlow version 6.6 is packaged with KNIME version 3.1.2.
Upgrading DataFlow and Interfaces
For installation and configuration instructions, see the guide Installing and Configuring DataFlow.
For upgrade and uninstallation instructions, see Upgrading DataFlow and Interfaces and Uninstalling DataFlow and Interfaces.
Known Issues
This section describes the following known issues:
DataFlow Known Issues
KNIME Known Issues
Known Issues During Integration with Kerberos-Enabled Hadoop
DataFlow Known Issues
The following information describes selected known issues in the current release.
Issue
Description
Resolution
N/A
Apache Hadoop 3.0.0 is not supported because the release has known issues and has been deprecated by Apache.
Use Hadoop 3.0.1.
N/A
MapR 6.0.0 is not supported because the release has issues with inconsistent native libraries.
Use MapR 6.0.1.
DR-3707
DataFlow cannot read Parquet files from an S3 location.
This issue will be fixed in a future release.
DR-3015
When you run a workflow that contains an ORC writer whose stripe size is set to a value that is not a multiple of 1024, the workflow fails with an error.
This issue will be fixed in a future release.
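Until the fix is available, keeping the stripe size at a multiple of 1024 should avoid the condition described above. The following Java sketch shows one way to round a requested size up to the next multiple of 1024. The class and method names are hypothetical and not part of the DataFlow API, and the stripe size is assumed to be expressed in bytes.

```java
// Hypothetical helper (not a DataFlow API): rounds a requested ORC stripe size
// up to the next multiple of 1024, the condition described in DR-3015.
public class StripeSize {
    private static final long UNIT = 1024L;

    static long roundUpToMultipleOf1024(long requestedBytes) {
        if (requestedBytes <= 0) {
            throw new IllegalArgumentException("stripe size must be positive");
        }
        // Integer division truncates, so adding (UNIT - 1) first rounds up.
        return ((requestedBytes + UNIT - 1) / UNIT) * UNIT;
    }

    public static void main(String[] args) {
        long requested = 100_000_000L; // illustrative value, not a multiple of 1024
        System.out.println(requested + " -> " + roundUpToMultipleOf1024(requested));
        // prints "100000000 -> 100000768"
    }
}
```

Values that are already multiples of 1024 are returned unchanged, so the helper is safe to apply to any configured size.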
DR-2989
When you run a YARN job on Vector using the SQL Copy load method, the job fails.
Run the job using the Direct Load method.
DR-2925
When you access files in a MapR instance of HDFS using KNIME with DataFlow extensions on Windows, the KNIME process fails. The failure is indicated by an error in a shared library, which causes the KNIME JVM to exit unexpectedly.
This issue will be fixed in a future release.
DR-2897
When you configure KNIME for Kerberos-enabled HDFS on a Windows client, KNIME fails to start because the operator libraries are not loaded.
This issue will be fixed in a future release.
DR-2534
When you wrap a DataFlow Java application as a web service, problems occur with external .jar files in WEB-INF\lib.
Copy the required .jar files to the Tomcat lib folder.
DR-2486
A deadlock may occur when you execute a DataFlow job within YARN. The deadlock occurs when the resources requested for worker containers match the capacity of the cluster.
Lower the resource requirements for the job so that they are less than the capacity of the cluster.
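As a rough pre-flight check, the following Java sketch compares the total memory and vcores requested for worker containers against the cluster capacity, and computes the largest worker count whose combined request stays strictly below that capacity. All names and numbers are illustrative assumptions rather than DataFlow settings, and the sketch does not account for any other containers the job may also need.

```java
// Hypothetical pre-flight check for DR-2486 (not a DataFlow API).
public class ContainerBudget {

    /** True if the combined worker request stays strictly below cluster capacity. */
    static boolean fitsBelowCapacity(int workers, long memPerWorkerMb, int vcoresPerWorker,
                                     long clusterMemMb, int clusterVcores) {
        long totalMem = workers * memPerWorkerMb;
        long totalVcores = (long) workers * vcoresPerWorker;
        // Strict "<": a request that exactly matches capacity is the deadlock case.
        return totalMem < clusterMemMb && totalVcores < clusterVcores;
    }

    /** Largest worker count whose combined request stays strictly below capacity. */
    static int maxWorkersBelowCapacity(long memPerWorkerMb, int vcoresPerWorker,
                                       long clusterMemMb, int clusterVcores) {
        // n * per-worker < capacity  <=>  n <= (capacity - 1) / per-worker
        long byMem = (clusterMemMb - 1) / memPerWorkerMb;
        long byVcores = (clusterVcores - 1) / vcoresPerWorker;
        return (int) Math.min(byMem, byVcores);
    }

    public static void main(String[] args) {
        long clusterMemMb = 64L * 1024;  // illustrative: 64 GB of YARN memory
        int clusterVcores = 32;          // illustrative: 32 vcores
        long memPerWorkerMb = 8L * 1024; // illustrative: 8 GB per worker container
        int vcoresPerWorker = 4;

        // 8 workers request exactly 64 GB and 32 vcores, which equals capacity.
        System.out.println("8 workers fit: "
                + fitsBelowCapacity(8, memPerWorkerMb, vcoresPerWorker, clusterMemMb, clusterVcores));
        System.out.println("max workers: "
                + maxWorkersBelowCapacity(memPerWorkerMb, vcoresPerWorker, clusterMemMb, clusterVcores));
    }
}
```

With these example values the program reports that 8 workers do not fit and that at most 7 stay below capacity.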
DR-2484
Specifying multiple .jar files on the dr command line with the -cp option causes a failure and writes unreadable characters to the terminal.
Place required .jar files in the lib folder of the DataFlow installation.
DR-2482
When you view the YARN Resource Manager job page, the URL links for running DataFlow jobs do not work correctly, and an error states that POST messages are not supported.
Use the DataFlow Cluster Manager, which displays detailed run-time metrics about DataFlow jobs.
KNIME Known Issues
The following table describes known issues and their resolutions.
Issue
Description
Resolution
N/A
When you create a new DataFlow workflow from KNIME on Linux, KNIME might hang. This is a KNIME issue that occurs when the Common UNIX Printing System (CUPS) printing service is not running. For information about this issue, see the KNIME forum.
Start the CUPS service using the service cups start command.
DR-3321
DataFlow jobs that are executed on a YARN-enabled Hadoop cluster with HA and Kerberos enabled fail with an error. This occurs when KNIME version 2.11.3 with Java version 1.7.0_60 is used with the Hadoop cluster.
Do any of the following:
Export the workflow from KNIME as a JSON file and run it with the dr command line script. The script uses JAVA_HOME, so it can run on the same Java installation as the Hadoop cluster and therefore with the same security libraries.
If the correct Java version (1.7.0_67) is available on the system, manually override the Java that KNIME uses. Edit the knime.ini file to set the VM to the correct version, and then restart KNIME. For an example, see Code example.
If Java is not installed on the client system, replace the security folder at KNIME_HOME/jre/lib/security with the security folder from the Java installation that runs the Hadoop version 2.2 cluster. Then restart KNIME so that its security libraries match those of the cluster.
Code example
The following knime.ini excerpt shows the resolution for KNIME known issue DR-3321. The -vm option and its path must be on separate lines and must appear before -vmargs.
-startup
plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.100.v20110505
-vm
/usr/jdk64/jdk1.7.0_67/bin/java
-vmargs
-Dknime.swt.disableEventLoopWorkaround=true
-XX:MaxPermSize=1024m
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass
-Dknime.enable.fastload=true
-Dorg.eclipse.swt.internal.gtk.cairoGraphics=false
-Dorg.eclipse.swt.internal.gtk.useCairo=false
-Xmx2048m
Known Issues During Integration with Kerberos-Enabled Hadoop
You may encounter the following issues with JRE and KNIME during integration with a Kerberos-enabled Hadoop instance. Issue resolutions are provided.
Issue with JRE
Not all JRE installations support the highest level of encryption that Kerberos may require. The encryption level is specified when Kerberos is installed and configured. If the highest level of encryption is selected for Kerberos, you may need to update the JRE instances used by Hadoop and DataFlow on the Hadoop cluster.
The OpenJDK JRE supports the highest level of encryption as installed. You may need to update the Oracle versions of JRE to support higher levels of encryption.
Resolution
Do any of the following:
Ensure that the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy .jar files are installed. To install them, download the JCE Unlimited Strength Jurisdiction Policy Files from Oracle and follow the installation instructions.
Limit the encryption methods used by Kerberos.
If you reconfigure Kerberos to limit the supported encryption methods, you must recreate the principals and keytab files that are required by Hadoop.
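To find out whether a given JRE already allows unlimited-strength encryption, you can run a small Java class with the same java binary that Hadoop or DataFlow uses. This sketch relies on the standard javax.crypto.Cipher.getMaxAllowedKeyLength method, which reports 128 for AES under the limited policy and Integer.MAX_VALUE when the unlimited policy is in effect. Oracle Java 8 Update 161 and later enable the unlimited policy by default, so the check mainly matters for older installations.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Reports whether the JRE running this class permits unlimited-strength AES.
public class JceCheck {

    /** Maximum AES key length (bits) allowed by the active JCE policy. */
    static int maxAesKeyLength() {
        try {
            return Cipher.getMaxAllowedKeyLength("AES");
        } catch (NoSuchAlgorithmException e) {
            // Every compliant JRE ships AES, so this is not expected to happen.
            throw new IllegalStateException("AES is not available", e);
        }
    }

    /** 256-bit AES is the strongest level Kerberos typically requests. */
    static boolean hasUnlimitedStrengthAes() {
        return maxAesKeyLength() >= 256;
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("max AES key length = " + maxAesKeyLength());
        System.out.println(hasUnlimitedStrengthAes()
                ? "Unlimited-strength JCE policy is active"
                : "Limited policy: 256-bit AES encryption is not available");
    }
}
```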
Issues with Hadoop Clients
Issue 1
The Hadoop configuration is required on all client systems accessing a Hadoop cluster. When Kerberos is enabled, certain configuration files such as core-site.xml, hdfs-site.xml, and yarn-site.xml in Hadoop are updated. These configuration files are required on the client to ensure that Kerberos is used appropriately and the required Kerberos principals are utilized.
Resolution
You must configure the following:
KNIME client:
Ensure that the hadoop.conf.dir environment variable is set and references the Hadoop configuration directory (typically /etc/hadoop/conf). The DataFlow Cluster Manager uses hadoop.conf.dir to ensure that the proper configuration is accessed and provided to DataFlow clients.
You can also edit the knime.ini file and add -Dhadoop.conf.dir=<path to configuration directory> after the -vmargs line.
Command line client:
Create a Hadoop home directory (<HADOOP>). For example: /opt/hadoop.
Create a <HADOOP>/etc/hadoop directory.
Copy the Hadoop configuration files from the Hadoop cluster to the local <HADOOP>/etc/hadoop location.
Set the HADOOP_HOME environment variable to the <HADOOP> directory. For example, on Linux: export HADOOP_HOME=/opt/hadoop.
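Both client setups come down to making core-site.xml, hdfs-site.xml, and yarn-site.xml visible to the client. The following Java sketch is a hypothetical diagnostic (not part of DataFlow) that resolves a configuration directory and lists which of those files are missing. The lookup order, the hadoop.conf.dir system property first and then HADOOP_HOME/etc/hadoop, is an assumption made for illustration and is not documented DataFlow behavior.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: checks that the Kerberos-related Hadoop
// configuration files are present in the client's configuration directory.
public class HadoopConfCheck {
    static final String[] REQUIRED = {"core-site.xml", "hdfs-site.xml", "yarn-site.xml"};

    /** Assumed lookup order: -Dhadoop.conf.dir, then $HADOOP_HOME/etc/hadoop. */
    static Path resolveConfDir() {
        String prop = System.getProperty("hadoop.conf.dir");
        if (prop != null && !prop.isEmpty()) {
            return Paths.get(prop);
        }
        String home = System.getenv("HADOOP_HOME");
        if (home != null && !home.isEmpty()) {
            return Paths.get(home, "etc", "hadoop");
        }
        return null; // neither setting is available
    }

    /** Returns the names of required files that are not regular files in confDir. */
    static List<String> missingFiles(Path confDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(confDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Path confDir = resolveConfDir();
        if (confDir == null) {
            System.out.println("Neither hadoop.conf.dir nor HADOOP_HOME is set");
            return;
        }
        List<String> missing = missingFiles(confDir);
        System.out.println(missing.isEmpty()
                ? "All required files found in " + confDir
                : "Missing from " + confDir + ": " + missing);
    }
}
```

For a KNIME client, the same -Dhadoop.conf.dir value that is set in knime.ini can be passed to this class on the java command line.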
Issue 2
The JRE embedded within KNIME does not support the highest level of encryption used by Kerberos. When you access Kerberos, exceptions occur because the Kerberos client is unable to generate encrypted Kerberos tickets.
Note: The standard JDK or JRE installation from Oracle does not include the encryption levels required for Kerberos.
Resolution
Do any of the following:
Update the JRE embedded within KNIME to use the unlimited strength JCE .jar files. To upgrade, download the JCE .jar files from Oracle. The JRE embedded within KNIME is located in the directory <KNIME_INSTALL>/jre. Be sure to follow the directions on the Oracle site to install the JCE extension .jar files.
Limit the encryption algorithms used in Kerberos to the ones supported in the standard JDK or JRE installation.
Note: If you reconfigure Kerberos to limit the encryption methods that are supported, then you must recreate the principals and keytab files that Hadoop requires.
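Before limiting the Kerberos encryption algorithms, it can help to know which AES-based encryption types the embedded KNIME JRE can actually handle. The following Java sketch maps the maximum AES key length allowed by a JRE to the MIT Kerberos names aes256-cts-hmac-sha1-96 and aes128-cts-hmac-sha1-96. It assumes the embedded JRE follows the usual layout, so you would run it with <KNIME_INSTALL>/jre/bin/java, and it deliberately ignores all non-AES encryption types.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Cipher;

// Lists which AES-based Kerberos encryption types this JRE can use, based on
// the maximum AES key length that its JCE policy allows.
public class KerberosEnctypeCheck {

    /** AES enctypes usable with the given maximum AES key length, in bits. */
    static List<String> usableAesEnctypes(int maxAesKeyBits) {
        List<String> usable = new ArrayList<>();
        if (maxAesKeyBits >= 256) {
            usable.add("aes256-cts-hmac-sha1-96"); // needs a 256-bit AES key
        }
        if (maxAesKeyBits >= 128) {
            usable.add("aes128-cts-hmac-sha1-96"); // needs a 128-bit AES key
        }
        return usable;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        int max = Cipher.getMaxAllowedKeyLength("AES");
        System.out.println("Usable AES enctypes: " + usableAesEnctypes(max));
    }
}
```

If only aes128-cts-hmac-sha1-96 is listed, Kerberos would have to be limited to that type (or to other non-AES types the JRE supports) for this JRE to obtain tickets.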
Related Documentation
Actian DataFlow documentation is provided in several locations.
Searchable online help on docs.actian.com. This site is updated as needed between releases, so it always has the latest information.
PDF download from the Actian Electronic Software Distribution website. On this site, see the Documentation link at lower left. PDF files are posted at release time for users who prefer this format.
In the KNIME user interface, detailed information for a selected node is displayed in the Node Description tab. As with PDF, this content is part of the release package.