Supported Compression Formats

DataFlow ships with several compression formats available by default. You can use any of them to read or write files, either by declaring the format explicitly or by letting DataFlow detect it from the file suffix. To obtain a specific CompressionFormat implementation, pass the appropriate format identifier to the CompressionFormats factory class.

Obtaining a CompressionFormat

CompressionFormat format = CompressionFormats.lookupFormat("gzip");
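Suffix-based detection can be pictured as a lookup from file suffix to format identifier. The following is a minimal sketch of that idea, not DataFlow's implementation: the SuffixDetector class and its detect method are hypothetical names introduced for illustration, and only the suffix-to-identifier pairs come from the tables on this page. The sketch matches suffixes exactly as listed; whether DataFlow ignores case is not stated here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of the DataFlow API.
class SuffixDetector {
    // Suffix-to-identifier pairs copied from the format tables on this page.
    private static final Map<String, String> SUFFIX_TO_FORMAT = new LinkedHashMap<>();
    static {
        SUFFIX_TO_FORMAT.put(".gz", "gzip");
        SUFFIX_TO_FORMAT.put(".z", "gzip");
        SUFFIX_TO_FORMAT.put(".bz2", "bzip2");
        SUFFIX_TO_FORMAT.put(".bz", "bzip2");
        SUFFIX_TO_FORMAT.put(".snz", "snappy");
    }

    // Returns the format identifier for a file name,
    // or an empty Optional if the suffix is not recognized.
    static Optional<String> detect(String fileName) {
        for (Map.Entry<String, String> entry : SUFFIX_TO_FORMAT.entrySet()) {
            if (fileName.endsWith(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

The identifier returned by such a lookup is the kind of string that would then be passed to CompressionFormats.lookupFormat.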

When reading files, DataFlow segments the data into “splits.” Each split is read, parsed, and processed independently. Splitting files this way distributes the I/O and improves the performance of distributed applications. Only some compression formats support splitting compressed data into segments for distributed reading. The others require the complete file to be processed for decompression to succeed.
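The effect of split support can be illustrated with a small byte-range planner. This is a sketch under stated assumptions, not DataFlow's actual split logic: the SplitPlanner class, the fixed split size, and the (offset, length) representation are all hypothetical. It only shows that a splittable file yields several ranges that can be read in parallel, while a non-splittable file yields a single range covering the whole file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for illustration only; not DataFlow's split logic.
class SplitPlanner {
    // Returns {offset, length} pairs that together cover fileLength bytes.
    static List<long[]> plan(long fileLength, long splitSize, boolean splittable) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        if (fileLength <= 0) {
            return splits; // nothing to read
        }
        if (!splittable) {
            // The whole file must be decompressed as a single unit.
            splits.add(new long[] {0L, fileLength});
            return splits;
        }
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }
}
```

For a 250-byte file and a 100-byte split size, the planner produces three ranges when the format is splittable and one 250-byte range when it is not.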

The following sections describe each compression format and state whether it supports distributed read operations (splits).

gzip

gzip is a well-known, freely available compression format. The JDK provides direct support for gzip.

Implementation Class	com.pervasive.dataflow.io.compression.GZipCompression
Format Identifier	gzip
Recognized suffixes	.gz, .z
Split Support	No
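The JDK's gzip support lives in the java.util.zip package. The following sketch uses only those standard classes, independently of DataFlow, to compress and decompress a byte array; the GzipRoundTrip class name and its methods are hypothetical names chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative helper built on the JDK only; not part of the DataFlow API.
class GzipRoundTrip {
    // Compresses data into the gzip format.
    static byte[] compress(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the stream above wrote the gzip trailer, so the buffer is complete.
        return buffer.toByteArray();
    }

    // Decompresses gzip data by reading the stream from its beginning to its end.
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Note that decompress consumes the stream sequentially from the first byte, which is consistent with gzip files being read as a whole rather than in splits.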

bzip2

bzip2 is a well-known, freely available compression format. It typically achieves a higher degree of compression than other formats, but it is also slower at both compression and decompression.

Implementation Class	com.pervasive.dataflow.io.compression.BZipCompression
Format Identifier	bzip2
Recognized suffixes	.bz2, .bz
Split Support	Yes

snappy

Snappy is an open-source compression format designed to achieve a modest level of compression at high speed.

Implementation Class	com.pervasive.dataflow.io.compression.SnappyCompression
Format Identifier	snappy
Recognized suffixes	.snz
Split Support	No

Last modified date: 01/03/2025