Hadoop Versions and Distributions
Hadoop is an ecosystem of software that provides services for distributed data processing. One of its core services is HDFS, a scalable, fault-tolerant distributed file system. If you run DataFlow on a distributed cluster, we recommend using a distributed file system such as HDFS.
The following distributions and versions of Hadoop are supported for use with DataFlow:
Apache Hadoop 3.1
Hortonworks distribution (HDP) version 3.1.1
Note: For more information about supported Hadoop distributions, see Hadoop Module Configurations.
Hive Version
The following readers and writers for Hive are supported for use with DataFlow:
ORCReader
ORCWriter
ParquetReader
Last modified date: 01/06/2023