Hadoop Versions and Distributions
Hadoop is an ecosystem of software that provides services for distributed data processing. One of its core services is HDFS (Hadoop Distributed File System), a scalable, fault-tolerant distributed file system. If you run DataFlow on a distributed cluster, we recommend that you use a distributed file system such as HDFS.
The following distributions and versions of Hadoop are supported for use with DataFlow:
• Apache Hadoop 3.1
• Hortonworks distribution (HDP) version 3.1.1
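To check a cluster node against the list above, one option is to inspect the output of the standard `hadoop version` command, whose first line has the form "Hadoop <version>". The following is a minimal sketch, not part of DataFlow: the names SUPPORTED_HADOOP, parse_hadoop_version, is_supported, and read_hadoop_version are hypothetical, and the check assumes that matching on the major and minor version (3.1) is sufficient.

```python
import re
import subprocess

# Assumption: the supported release line is taken from the list above (3.1).
SUPPORTED_HADOOP = (3, 1)


def parse_hadoop_version(output):
    """Extract (major, minor) from text containing a line such as 'Hadoop 3.1.1'."""
    match = re.search(r"^Hadoop\s+(\d+)\.(\d+)", output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Hadoop <version>' line found")
    return int(match.group(1)), int(match.group(2))


def is_supported(output):
    """Return True if the reported major.minor version matches SUPPORTED_HADOOP."""
    return parse_hadoop_version(output) == SUPPORTED_HADOOP


def read_hadoop_version():
    """Run 'hadoop version' and return its text output (requires hadoop on PATH)."""
    result = subprocess.run(
        ["hadoop", "version"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a node with the Hadoop client installed, is_supported(read_hadoop_version()) reports whether the installed release falls in the 3.1 line. Vendor distributions may append their own build suffix to the version string; the sketch reads only the leading major and minor numbers.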
Hive Version
The following Hive readers and writers are supported for use with DataFlow:
• ORCReader
• ORCWriter
• ParquetReader
Last modified date: 06/14/2024