Release Notes
Actian DataFlow provides a visual end-to-end solution for data preparation, analytics development, and execution on Hadoop. An embedded dataflow engine delivers auto-scaling and parallelism, processing data natively on Apache Hadoop.
Actian DataFlow offers a robust range of capabilities for everyone from business analysts to data scientists:
• Create workflows to read and pull together data from all your sources
• Use prebuilt data preparation and analytics operators already optimized for parallel execution
• Prepare the data, analyze it, and output it to the visualization tool of your choice
• Build end-to-end analytics workflows in KNIME (Konstanz Information Miner)
What’s New in This Release
This section provides information about the new features in the following DataFlow builds.
For information about resolved issues in this version of DataFlow, see Change Log.
New Features in 8.0.1 Build
The 8.0.1 build includes the following new features:
• Support for setting a custom endpoint for S3 remote file systems
• Updated support for the Google Cloud Storage file system
New Features in 8.0.0 Build
The 8.0.0 build includes the following new features:
• Integrates with Eclipse 4.19.0 (2021-03) and supports KNIME version 4.4.0
• Support for Java 11
• Additional support for UUIDs
• Updated DataFlow plugins for Eclipse IDE and KNIME SDK
New Features in 7.0.2 Build
The 7.0.2 build includes the following new features:
• Integrates with Eclipse 4.15.0 and supports KNIME version 4.3.2
• Support for Google Cloud Storage file system
New Features in 7.0.1 Build
The 7.0.1 build includes the following new features:
• Support for reading ORC and Parquet files from Azure ABFS
• Support for writing ORC files into AWS S3A and Azure ABFS
New Features in 7.0.0 Build
The 7.0.0 build includes the following new features:
• Integrates with Eclipse 4.10.0 and supports KNIME version 4.1.3
• Support for new versions of Hadoop
• Added new Binary Writer node
• Supports writing to Avalanche on Azure
• Support for reading ORC and Parquet files from S3
New Features in 6.9.0 Build
The 6.9.0 build includes the following new features:
• Support for multiple SQL statements in Initialize and Finalize properties on various operators
• Added raw binary file writer operator WriteBinary
• Added additional support for object token types in KNIME
• Support for Azure Blob Storage file system and ABFS paths
• Added new functions to support selected Object types
• View job logs on secure Hadoop history server
New Features in 6.8.0 Build
The 6.8.0 build includes the following new features:
• Support for Hive 3.1.2
• Support for basic JSON files
• Added new ReadJSON operator
• Added COPY VWLOAD command for LoadActianVector
• Support for s3 schemes such as s3a and s3n
• Support for virtual channel, scale conversion, and compressed data block features in the ReadMDF operator
• Added Run Mode for MDF Reader
• Support for JSONReader KNIME node
New Features in 6.7.0 Build
The 6.7.0 build includes the following new features:
• Integrates with Eclipse 4.7 (Oxygen) and supports KNIME version 3.7.
• Support for Hive 1.x, 2.x
• Support for HBase 1.1.2, 2.0.0
• Added new ReadMDF operator
• Added new LoadActianVector operator that includes the functionality of all previous Load Vector operators
• Deprecated operators LoadVectorwise, LoadVectorOnHadoop, and LoadVectorOnHadoopDirect
New Features in 6.6.1-17 Build
The 6.6.1-17 build includes the following new features:
• Support for new data types including Money, Duration, Period, Inet4Address, Inet6Address
• Updated support for latest Amazon Web Services version
• Support for logical HDFS paths when High Availability is enabled
• Additional features added to Load Vector On Hadoop Direct KNIME node
• Support for Apache Hadoop 3.0.1
• Support for MapR 6.0.1
• Updated support for latest versions of MapR 5, HDP, CDH, and Apache Hadoop
New Features in 6.6.0-111 Build
The 6.6.0-111 build includes the following new features.
New Features in 6.6.0-108 Build
The 6.6.0-108 build includes the following new features, enhancements, and bug fixes.
DataFlow supports the following versions of auxiliary software:
• MapR 5.1 and later
• Java version 8
• CentOS or RHEL version 7
• Hortonworks Data Platform (HDP) 2.4
• Cloudera CDH 5.7
System Requirements
This release of Actian DataFlow runs on the following platforms:
• Microsoft Windows
• Linux
For more information, see System Requirements in Installing and Configuring DataFlow. That section describes the minimum hardware and performance requirements and the versions of Java, Hadoop, and HBase needed for a complete DataFlow configuration.
Third-party Software
• Actian DataFlow version 8.0.0 integrates with Eclipse 4.19 (2021-03)
• Actian DataFlow version 8.0.0 is packaged with KNIME version 4.4.0
• Actian DataFlow version 7.0.2 integrates with Eclipse 4.15 (2020-03)
• Actian DataFlow version 7.0.2 is packaged with KNIME version 4.3.2
• Actian DataFlow version 7.0 integrates with Eclipse 4.10 (2018-12)
• Actian DataFlow version 7.0 is packaged with KNIME version 4.1.3
• Actian DataFlow version 6.7 is packaged with KNIME version 3.7.1
• Actian DataFlow version 6.6 integrates with Eclipse 4.5 (Mars)
• Actian DataFlow version 6.6 is packaged with KNIME version 3.1.2
• Actian DataFlow versions 6.4 and 6.5 support KNIME version 2.11.3 and earlier
• Actian DataFlow version 6.3.2 and earlier supports KNIME version 2.9 and earlier
Upgrading DataFlow and Interfaces
For installation and configuration instructions, see the guide Installing and Configuring DataFlow.
Known Issues
This section describes the following known issues:
DataFlow Known Issues
The following information describes selected known issues in the current release.
KNIME Known Issues
The following table describes known issues and their resolution.
Code example
The following is a code example of the resolution for KNIME known issue DR-3321.
-startup
plugins/org.eclipse.equinox.launcher_1.2.0.v20110502.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.100.v20110505
-vm
/usr/jdk64/jdk1.7.0_67/bin/java
-vmargs
-Dknime.swt.disableEventLoopWorkaround=true
-XX:MaxPermSize=1024m
-server
-Dsun.java2d.d3d=false
-Dosgi.classloader.lock=classname
-XX:+UnlockDiagnosticVMOptions
-XX:+UnsyncloadClass
-Dknime.enable.fastload=true
-Dorg.eclipse.swt.internal.gtk.cairoGraphics=false
-Dorg.eclipse.swt.internal.gtk.useCairo=false
-Xmx2048m
Known Issues During Integration with Kerberos-Enabled Hadoop
You may encounter the following issues with JRE and KNIME during integration with a Kerberos-enabled Hadoop instance. Issue resolutions are provided.
Issue with JRE
Not all JRE instances support the highest level of encryption that Kerberos may require. The encryption level is specified when Kerberos is installed and configured. If the highest level of encryption is selected for Kerberos, you may need to update the JRE instances used by Hadoop and DataFlow on the Hadoop cluster.
The OpenJDK JRE supports the highest level of encryption as installed. You may need to update Oracle versions of the JRE to support higher levels of encryption.
Resolution
Do any of the following:
• Ensure that the JCE unlimited strength encryption .jar files are installed. To upgrade the JCE unlimited strength encryption, download it from Oracle and follow the installation instructions.
• Limit the encryption methods used by Kerberos.
Note: If you reconfigure Kerberos to limit the supported encryption methods, you must recreate the principals and keytab files that are required by Hadoop.
Issues with Hadoop Clients
Issue 1
The Hadoop configuration is required on all client systems accessing a Hadoop cluster. When Kerberos is enabled, certain configuration files such as core-site.xml, hdfs-site.xml, and yarn-site.xml in Hadoop are updated. These configuration files are required on the client to ensure that Kerberos is used appropriately and the required Kerberos principals are utilized.
Resolution
You must configure the following:
• KNIME client:
– Ensure that the hadoop.conf.dir environment variable is set and is referencing the Hadoop configuration directory (typically /etc/hadoop/conf). The DataFlow Cluster Manager uses the hadoop.conf.dir environment variable to ensure that the proper configuration is accessed and provided to DataFlow clients.
– You can also edit the KNIME.ini file and include -Dhadoop.conf.dir=<path to configuration directory>.
• Command line client:
– Create a Hadoop home directory (<HADOOP>). For example: /opt/hadoop.
– Create a <HADOOP>/etc/hadoop directory.
– Copy the Hadoop configuration files from the Hadoop cluster to the local <HADOOP>/etc/hadoop location.
– Set the HADOOP_HOME environment variable to the <HADOOP> directory. For example: set HADOOP_HOME=/opt/hadoop.
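The command line client steps above can be sketched as shell commands. The guide's example Hadoop home is /opt/hadoop; this sketch uses a user-writable path instead so it can run without root, and the cluster host name is a placeholder.

```shell
# Command-line client setup sketched from the steps above.
# <HADOOP> is the local Hadoop home directory; the guide's example is
# /opt/hadoop, but a user-writable path is used here.
HADOOP="$HOME/hadoop"

# Create the Hadoop home and its configuration directory (<HADOOP>/etc/hadoop).
mkdir -p "$HADOOP/etc/hadoop"

# Copy the cluster's configuration files to the client. The source host
# and path below are placeholders -- fetch the files however your site allows.
# scp hadoop-node:/etc/hadoop/conf/core-site.xml \
#     hadoop-node:/etc/hadoop/conf/hdfs-site.xml \
#     hadoop-node:/etc/hadoop/conf/yarn-site.xml \
#     "$HADOOP/etc/hadoop/"

# Point the command line client at the Hadoop home.
export HADOOP_HOME="$HADOOP"
echo "HADOOP_HOME=$HADOOP_HOME"
```

For the KNIME client, the analogous setting is the hadoop.conf.dir variable described above, or the -Dhadoop.conf.dir=&lt;path to configuration directory&gt; line in KNIME.ini.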
Issue 2
The JRE embedded within KNIME does not support the highest level of encryption used by Kerberos. When you access Kerberos, exceptions occur because the Kerberos client is unable to generate encrypted Kerberos tickets.
Note: The standard JDK or JRE installation from Oracle does not include the encryption levels required for Kerberos.
Resolution
Do any of the following:
• Update the JRE embedded within KNIME to use the unlimited strength JCE .jar files. To upgrade, download the JCE .jar files from Oracle. The JRE embedded within KNIME is located in the <KNIME_INSTALL>/jre directory. Be sure to follow the directions on the Oracle site to install the JCE extension .jar files.
• Limit the encryption algorithms used in Kerberos to the ones supported in the standard JDK or JRE installation.
Note: If you reconfigure Kerberos to limit the encryption methods that are supported, then you must recreate the principals and keytab files that Hadoop requires.
Related Documentation
Actian DataFlow documentation is provided in several locations.
• In the KNIME user interface, detailed information for a selected node is displayed in the Node Description tab. Like the PDF documentation, this content is part of the release package.