Hadoop Versions and Distributions
Hadoop is an ecosystem of software that provides various services for the distributed processing of data. One of its core services is HDFS, a scalable and fault-tolerant distributed file system. If you want to run DataFlow in a distributed cluster, we recommend using a distributed file system such as HDFS.
The following distributions and versions of Hadoop are supported for use with DataFlow:
Apache Hadoop 2.2 and 3.1
Hortonworks distribution (HDP) versions 2.3 to 3.1.1
Cloudera’s distribution (CDH) version 5.x up to version 6.3
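A deployment script might encode the support list above as a small lookup table so it can check a cluster's Hadoop version before running DataFlow. The following is a minimal, hypothetical Python sketch and is not part of DataFlow. The distribution keys ("apache", "hdp", "cdh") and the helper names are illustrative. Two readings of the list are assumptions: "Apache Hadoop 2.2 and 3.1" is taken to mean only the 2.2.x and 3.1.x release lines, and "CDH 5.x up to 6.3" is taken to include every 6.3.x release. The helper accepts only plain dotted numeric versions such as "3.1.1".

```python
# Hypothetical helper, not a DataFlow API: checks a version against the
# Hadoop support list. ASSUMPTIONS: Apache "2.2 and 3.1" means the 2.2.x and
# 3.1.x lines only; CDH "5.x up to 6.3" includes all 6.3.x releases.

def parse_version(text):
    """Turn a dotted version such as '3.1' into a 3-tuple, e.g. (3, 1, 0)."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)  # pad missing minor/patch components with zeros
    return tuple(parts[:3])

# (distribution, inclusive lower bound, exclusive upper bound)
SUPPORTED_RANGES = [
    ("apache", "2.2.0", "2.3.0"),  # Apache Hadoop 2.2.x
    ("apache", "3.1.0", "3.2.0"),  # Apache Hadoop 3.1.x
    ("hdp", "2.3.0", "3.1.2"),     # HDP 2.3 through 3.1.1
    ("cdh", "5.0.0", "6.4.0"),     # CDH 5.x through 6.3.x
]

def is_supported(distribution, version):
    """Return True if the version falls inside any range for the distribution."""
    v = parse_version(version)
    for dist, low, high in SUPPORTED_RANGES:
        if dist == distribution.lower() and parse_version(low) <= v < parse_version(high):
            return True
    return False
```

Using an exclusive upper bound keeps the comparisons simple: Python compares tuples element by element, so "up to 3.1.1" becomes "below 3.1.2" without special handling for patch numbers. For example, `is_supported("HDP", "3.1.1")` is true, while `is_supported("cdh", "6.4")` is false.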
HBase Version
DataFlow provides both a reader and a writer for accessing HBase, a scalable database built on Hadoop. The HBase support in DataFlow works with:
Apache HBase distributed with CDH version 5.x and later
Hortonworks HBase distributed with HDP version 2.0 and later
Note: For more information about supported Hadoop and HBase distributions, see Hadoop Module Configurations.
Hive Version
DataFlow provides the following readers and writers for Hive: the ORCReader, ORCWriter, and ParquetReader operators.
Last modified date: 01/06/2023