Generating Data
Generating Additional Data Fields
DataFlow provides a set of operators that can be used to generate data tokens for use within a DataFlow application.
Covered Operations
Using the GenerateConstant Operator to Generate Constants
Using the GenerateRandom Operator to Generate Random Data
Using the GenerateRepeatingCycle Operator to Generate Repeating Cycles
Using the GenerateArithmeticSequence Operator to Generate Sequences
Using the GenerateConstant Operator to Generate Constants
The GenerateConstant operator can be used to generate copies of a constant value.
Code Example
This code fragment shows how to initialize a GenerateConstant operator for your graph. The output port will provide 100 records with a single field called "afield" that contains an integer value of 1.
Using the GenerateConstant operator in Java
ScalarTokenType type = TokenTypeConstant.INT;
ScalarToken value = TokenUtils.parse(type, "1");
RecordToken rconst = new RecordToken(record(field(type, "afield")), value);
GenerateConstant constants = graph.add(new GenerateConstant(rconst, 100));
Using the GenerateConstant operator in RushScript
var token = RecordToken(IntToken(1));
var data = dr.generateConstant({rowCount:100, constant:token});
Properties
The GenerateConstant operator has the following properties.
| Name | Type | Description |
|---|---|---|
| constant | | The value to generate. |
| rowCount | long | The number of values to generate. |
Ports
The GenerateConstant operator provides a single output port.
| Name | Type | Get Method | Description |
|---|---|---|---|
| output | | getOutput() | The generated data. |
Using the GenerateRandom Operator to Generate Random Data
The GenerateRandom operator can be used to generate random data. All field types except generic and object are supported.
For each field type, the generated data does not generally cover the full range supported by DataFlow, but it does cover a range that any operator claiming to support that type should be able to handle.
boolean: Either true or false.
binary: Between 1 and 2048 random bytes, with a uniform distribution of the number of bytes.
char: ASCII characters 32-126 ("Valid Unicode" is not well defined).
date: The range of days representable by a Date, ±2^63 milliseconds from 1970-01-01, or about ±292 million years.
double: The range of Double, excluding NaN and ±Inf. (NaN values can be generated by setting nullProbability > 0.0.)
float: The range of Float, excluding NaN and ±Inf. (NaN values can be generated by setting nullProbability > 0.0.)
int: The full range of Integer.
long: The full range of a Java Long.
numeric: The integer part is formed from 1 to 100 binary digits, corresponding to up to 31 decimal digits (2^100 - 1 = 1,267,650,600,228,229,401,496,703,205,375). This value is then divided by 10^scale, where the scale is 0 to 29, and made negative with a probability of 50%.
string: Zero or more random ASCII characters (see also the char data type). The string length is unbounded, but its probability decays exponentially: a string of length n occurs with probability 0.1 × 0.9^n, so 10% are empty strings, 9% are 1 character long, 8.1% are 2 characters long, 7.29% are 3 characters long, and so on.
timestamp: The range of seconds is the range representable by a Timestamp, ±2^63 milliseconds from 1970-01-01, or about ±292 million years. The nanoseconds range is 0 to 999999999. The time zone offset is a whole number of minutes in the range -12:00 to +12:59 (the extra hour is for daylight saving time).
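The string and numeric rules above can be illustrated with a small standalone sketch. It uses only the Java standard library and is not the GenerateRandom implementation; the class and method names are hypothetical, and the uniform choice of bit count and scale is an assumption, since the text gives only the ranges.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Random;

// Illustrative only: mimics the documented string and numeric rules,
// not the actual GenerateRandom implementation.
class RandomValueSketch {

    // Draws a length n with probability 0.1 * 0.9^n
    // (10% empty, 9% of length 1, 8.1% of length 2, ...).
    static int randomStringLength(Random rng) {
        int length = 0;
        while (rng.nextDouble() < 0.9) { // keep growing with probability 0.9
            length++;
        }
        return length;
    }

    // Builds a string of printable ASCII characters (codes 32 to 126).
    static String randomString(Random rng) {
        int length = randomStringLength(rng);
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) (32 + rng.nextInt(95))); // 95 codes: 32..126
        }
        return sb.toString();
    }

    // Builds a numeric from 1 to 100 random bits, divides it by 10^scale
    // (scale 0 to 29), and negates it half of the time.
    // Assumption: bit count and scale are chosen uniformly.
    static BigDecimal randomNumeric(Random rng) {
        int bits = 1 + rng.nextInt(100);                    // 1..100 binary digits
        BigInteger unscaled = new BigInteger(bits, rng);    // 0 .. 2^bits - 1
        int scale = rng.nextInt(30);                        // 0..29
        BigDecimal value = new BigDecimal(unscaled, scale); // unscaled / 10^scale
        return rng.nextBoolean() ? value.negate() : value;
    }
}
```

Calling `RandomValueSketch.randomString(new Random(7L))` repeatedly yields mostly short strings, with roughly one in ten empty, which is the kind of input a downstream operator should expect from a string field.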
Code Example
This code fragment shows how to initialize a GenerateRandom operator for your graph. The output port will provide 100 records with two fields called "name" and "age", which will be randomly filled with String and int values respectively.
Using the GenerateRandom operator in Java
RecordTokenType type = record(STRING("name"), INT("age"));
GenerateRandom randoms = graph.add(new GenerateRandom(type, 100));
The following example demonstrates using the operator in RushScript.
Using the GenerateRandom operator in RushScript
var type = dr.schema().STRING('name').INT('age');
var data = dr.generateRandom({rowCount:100, outputType:type});
Properties
The GenerateRandom operator has the following properties.
| Name | Type | Description |
|---|---|---|
| nullProbability | double | The probability that any given generated token will be null valued. Must be between 0.0 and 1.0. |
| outputType | | The data type of the generated values. |
| rowCount | long | The number of values to generate. |
| seed | long | The seed for the random number generator. |
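As a rough illustration of how a seed and a null probability typically interact, the following standalone sketch uses `java.util.Random` rather than the DataFlow operator. The class name and the `generate` method are hypothetical, and the assumption that a fixed seed reproduces the same rows is a property of `java.util.Random`, not something stated here about GenerateRandom.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: a seeded generator of nullable int rows.
class SeededRowsSketch {

    // Returns rowCount values; each one is null with probability nullProbability.
    static List<Integer> generate(long seed, double nullProbability, int rowCount) {
        if (nullProbability < 0.0 || nullProbability > 1.0) {
            throw new IllegalArgumentException("nullProbability must be between 0.0 and 1.0");
        }
        Random rng = new Random(seed); // same seed, same sequence of draws
        List<Integer> rows = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            if (rng.nextDouble() < nullProbability) {
                rows.add(null);          // null-valued token
            } else {
                rows.add(rng.nextInt()); // full range of int
            }
        }
        return rows;
    }
}
```

In this sketch, two calls with the same arguments return equal lists, which is what makes a fixed seed useful for repeatable tests.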
Ports
The GenerateRandom operator provides a single output port.
| Name | Type | Get Method | Description |
|---|---|---|---|
| output | | getOutput() | The generated data. |
Using the GenerateRepeatingCycle Operator to Generate Repeating Cycles
The GenerateRepeatingCycle operator is used to generate a cycle of repeating values. First all the rows of the input are copied to the output, then all the rows of the input are copied to the output again, and so on. Output stops when the desired row count is reached.
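The cycling behavior described above can be sketched on a plain in-memory list. This is only an illustration of the semantics, not the operator's implementation; the class and method names are hypothetical, and returning an empty result for empty input is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: repeats the input rows in order until rowCount rows exist.
class RepeatingCycleSketch {

    static <T> List<T> cycle(List<T> input, long rowCount) {
        List<T> output = new ArrayList<>();
        if (input.isEmpty()) {
            return output; // assumption: nothing to repeat yields no output
        }
        for (long i = 0; i < rowCount; i++) {
            // Row i of the output is row (i mod input size) of the input.
            output.add(input.get((int) (i % input.size())));
        }
        return output;
    }
}
```

For example, cycling the three rows 1, 2, 3 with a row count of 7 gives 1, 2, 3, 1, 2, 3, 1: the last pass through the input is cut short once the row count is reached.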
Code Example
This code fragment shows how to initialize a GenerateRepeatingCycle operator for your graph. A GenerateRandom operator produces 10 random integers, which the GenerateRepeatingCycle operator then repeats on its output until 100 records have been generated.
Using the GenerateRepeatingCycle operator in Java
GenerateRandom rand = graph.add(new GenerateRandom(record(INT("number")), 10));
GenerateRepeatingCycle repeat = graph.add(new GenerateRepeatingCycle(100));
graph.connect(rand.getOutput(), repeat.getInput());
Using the GenerateRepeatingCycle operator in RushScript
var repeatData = dr.generateRepeatingCycle(data, {rowCount:100});
Properties
The GenerateRepeatingCycle operator has one property.
| Name | Type | Description |
|---|---|---|
| rowCount | long | The number of values to generate. |
Ports
The GenerateRepeatingCycle operator provides a single input port.
| Name | Type | Get Method | Description |
|---|---|---|---|
| input | | getInput() | The flow of input data that should be repeated cyclically. |
The GenerateRepeatingCycle operator provides a single output port.
| Name | Type | Get Method | Description |
|---|---|---|---|
| output | | getOutput() | The generated data. |
Using the GenerateArithmeticSequence Operator to Generate Sequences
The GenerateArithmeticSequence operator is used to generate a sequence of numerical values, with a constant difference between consecutive values.
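The definition above amounts to value(i) = start + i × step for i = 0, 1, 2, and so on. A minimal standalone sketch, assuming BigDecimal arithmetic to mirror the types of the start and step properties (the class and method names are hypothetical, and this is not the operator's implementation):

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: builds an arithmetic sequence with exact decimal steps.
class ArithmeticSequenceSketch {

    static List<BigDecimal> sequence(BigDecimal start, BigDecimal step, int rowCount) {
        List<BigDecimal> values = new ArrayList<>(rowCount);
        BigDecimal current = start;
        for (int i = 0; i < rowCount; i++) {
            values.add(current);         // value(i) = start + i * step
            current = current.add(step); // exact addition, no floating-point drift
        }
        return values;
    }
}
```

With a start of 1 and a step of 1, the first five values are 1, 2, 3, 4, 5. Using BigDecimal keeps fractional steps such as 0.25 exact, which a double accumulator would not guarantee.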
Code Example
This code fragment shows how to initialize a GenerateArithmeticSequence operator for your graph. The output port will provide 100 records with a single field of type long with the value starting at 1 and incrementing by 1 on each successive record.
Using the GenerateArithmeticSequence operator in Java
GenerateArithmeticSequence seq = graph.add(new GenerateArithmeticSequence(100));
seq.setStartValue(1);
seq.setStepSize(1);
Using the GenerateArithmeticSequence operator in RushScript
var data = dr.generateArithmeticSequence({rowCount:100, startValue:1, stepSize:1});
Properties
The GenerateArithmeticSequence operator has the following properties.
| Name | Type | Description |
|---|---|---|
| outputType | | The output type of the generated sequence. Must be a numerical type. By default this is a record of a single field named "field" of type long. |
| rowCount | long | The number of values to generate. |
| startValue | BigDecimal | The value of the first token in the sequence. |
| stepSize | BigDecimal | The difference between consecutive tokens in the generated sequence. |
Ports
The GenerateArithmeticSequence operator provides a single output port.
| Name | Type | Get Method | Description |
|---|---|---|---|
| output | | getOutput() | The generated data. |