Hadoop Versions and Distributions
Hadoop is an ecosystem of software providing various services related to distributed processing of data. One of these core services is HDFS, which is a scalable and fault-tolerant distributed file system. If you want to run DataFlow in a distributed cluster, we recommend that you use a distributed file system such as HDFS.
The following distributions and versions of Hadoop are supported for use with DataFlow:
• Apache Hadoop 2.2 and 3.1
• Hortonworks distribution (HDP) version 2.3 to 3.1.1
• Cloudera’s distribution (CDH) version 5.x up to version 6.3
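The supported ranges above can be captured in a small lookup for a pre-deployment check. The following Python sketch is illustrative only: the names parse_version, SUPPORTED_RANGES, SUPPORTED_APACHE_LINES, and is_supported are hypothetical and are not part of DataFlow. It also assumes that patch releases within a listed release line (for example, 6.3.4 within "up to version 6.3") count as supported, which the list above does not state explicitly.

```python
# Hypothetical helper: the ranges are transcribed from the list above.
# None of these names belong to DataFlow or Hadoop.

def parse_version(text):
    """Turn '3.1.1' into (3, 1, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

# Inclusive (minimum, maximum) ranges for the vendor distributions.
SUPPORTED_RANGES = {
    "HDP": ((2, 3), (3, 1, 1)),
    "CDH": ((5,), (6, 3)),
}

# Apache Hadoop is listed as two specific release lines, not a range.
SUPPORTED_APACHE_LINES = {(2, 2), (3, 1)}

def is_supported(distribution, version):
    """Return True if the version falls inside the listed range."""
    parsed = parse_version(version)
    if distribution == "Apache":
        # Assumption: any patch release of a listed line is accepted.
        return parsed[:2] in SUPPORTED_APACHE_LINES
    low, high = SUPPORTED_RANGES[distribution]
    # Compare only as many components as the upper bound specifies,
    # so a patch release of the highest listed line still matches.
    return low <= parsed and parsed[:len(high)] <= high
```

For example, is_supported("HDP", "3.1.1") accepts the highest listed HDP release, while is_supported("CDH", "7.1") rejects a release beyond the listed range. Always confirm against the list above before relying on such a check.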
HBase Version
DataFlow provides both a reader and writer for accessing HBase, a scalable database built using Hadoop. The HBase support in DataFlow works with:
• Apache HBase distributed with CDH version 5.x and later
• Hortonworks HBase distributed with HDP version 2.0 and later
Hive Version
DataFlow provides the following reader and writer operators for Hive: ORCReader, ORCWriter, and ParquetReader.
Last modified date: 01/06/2023