Overview
Actian DataFlow is an end-to-end framework for data preparation, analytics development, and execution. You can use it as a standalone product or as part of a larger analytics platform.
Benefits of DataFlow include:
• Providing data preparation, profiling, de-duplication, enhancement, aggregation, and Extract, Transform, and Load (ETL) capabilities in a visual drag, drop, and configure interface (KNIME).
• Offering built-in connections to flat files, spreadsheets, SQL databases, NoSQL databases, the Hadoop Distributed File System (HDFS), Amazon S3, local file systems, Azure Blob Storage, and HBase, plus easy extensibility to the other data sources in your enterprise.
• Designing workflows once and scaling them from a single workstation to a high-powered server or a Hadoop cluster. The design environment includes an API (Java SDK), DataFlow Scripting (Java-based scripting), and KNIME; a minimal SDK sketch follows this list.
• Parallelizing and optimizing data processing jobs automatically with the patented engine, which uses the available processing power of machines or clusters of any size to full capacity and delivers high processing speed without parallel coding. The engine also processes data natively on Hadoop.
• Providing operations for manipulating large volumes of data: profile, group, join, lookup, filter, fill in missing values, sort, de-duplicate (including fuzzy matches), extract by time range, find substrings, and so on.
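For example, a minimal job written against the DataFlow Java SDK composes operators into a logical graph and then runs it. The following is an illustrative sketch only, modeled on publicly documented DataFlow (formerly Pervasive DataRush) examples; the class and method names (LogicalGraph, LogicalGraphFactory, ReadDelimitedText, WriteDelimitedText, WriteMode) and the file paths are assumptions that should be verified against your installed SDK version.

// Illustrative sketch of a DataFlow (DataRush-style) Java SDK job.
// Names and paths below are assumptions based on publicly documented
// examples; verify them against your SDK version before use.
import com.pervasive.datarush.graphs.LogicalGraph;
import com.pervasive.datarush.graphs.LogicalGraphFactory;
import com.pervasive.datarush.io.WriteMode;
import com.pervasive.datarush.operators.io.textfile.ReadDelimitedText;
import com.pervasive.datarush.operators.io.textfile.WriteDelimitedText;

public class CopyDelimitedJob {
    public static void main(String[] args) {
        // Compose a logical graph of operators; the engine decides
        // how to parallelize it at execution time.
        LogicalGraph graph = LogicalGraphFactory.newLogicalGraph("CopyDelimited");

        // Read a comma-delimited file with a header row
        // ("data/input.csv" is a placeholder path).
        ReadDelimitedText reader = graph.add(new ReadDelimitedText("data/input.csv"));
        reader.setFieldSeparator(",");
        reader.setHeader(true);

        // Write the records back out, overwriting any previous results.
        WriteDelimitedText writer =
            graph.add(new WriteDelimitedText("results/output.csv", WriteMode.OVERWRITE));
        writer.setFieldSeparator(",");
        writer.setHeader(true);

        // Connect the reader's output to the writer's input, then execute.
        graph.connect(reader.getOutput(), writer.getInput());
        graph.run();
    }
}

Because the graph describes only the logical flow of records, the same program can run on a single workstation or, given the appropriate cluster configuration, be distributed across a Hadoop cluster without changes to the operator logic.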
Before going into depth on the working concepts of the Actian DataFlow product, this section introduces the general foundation on which it is built.