Accessing Record Data Directly
At times you may want to directly access data processed within a graph. The RushScript environment supplies functions that allow injecting collectors into a graph. As the graph is executing, the injected collectors gather the data into memory. After the graph execution is complete, the collectors can be used to access the gathered data.
Important!  Collectors injected into a graph gather data into memory, so be careful where you inject them. Using them to gather large amounts of data may lead to out-of-memory problems.
The following example injects a collector at the tail end of a Group operator (see Using the Group Operator to Compute Aggregations) that performs data aggregation. The aggregation will produce only one output row since no key fields are specified.
A collector returns a native JavaScript array of Java Map objects, one per output row. Each row can be accessed using array index notation, starting at index 0. Each field within a row is accessed by passing the field name to the get() method. Field values are converted into JavaScript types where applicable. To access multiple rows of data, iterate through the rows of the JavaScript array.
Using a data collector
// Import System to use System.out.println
importClass(java.lang.System);

// Define ratings schema
var ratingschema = dr.schema()
    .nullable(true)
    .trimmed(true)
    .INT("userID")
    .INT("movieID")
    .DOUBLE("rating")
    .INT("timestamp");

// Read the ratings
var ratings = dr.readDelimitedText({source:'data/ratings.txt', schema:ratingschema, fieldSeparator:"::", header:true});

// Count and average the ratings
var aggs =
    'count(rating) as "count", ' +
    'min(rating) as "min", ' +
    'max(rating) as "max", ' +
    'avg(rating) as "avg", ' +
    'stddev(rating) as "stddev"';

// No key fields specified, so the aggregation yields a single row
var results = dr.group(ratings, {aggregations:aggs});

// Plug in a collector for the aggregation results
var collector = results.collectResults();

// Execute the graph. Accessing the collector before executing the graph causes an error.
dr.execute("group-ratings");

// Access the data gathered by the collector.
// Validate the row count, then print the aggregated values
var data = collector.getData();
if (data.length != 1) {
    throw "Expected only 1 result data row but found " + data.length;
}

// Access the "count" field of the first (and only) data row
var count = data[0].get("count");
System.out.println("count = " + count);

// Access the "min" field of the first (and only) data row
var min = data[0].get("min");
System.out.println("min = " + min);

// Access the "max" field of the first (and only) data row
var max = data[0].get("max");
System.out.println("max = " + max);

// Access the "avg" field of the first (and only) data row
var avg = data[0].get("avg");
System.out.println("avg = " + avg);
Here is the output from executing the script. It consists of the lines written by the println() statements in the script.
count = 1000
min = 1.0
max = 5.0
avg = 3.777
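When a collector gathers more than one row (for example, when an aggregation has key fields), the rows can be walked with an ordinary loop. The following is a minimal sketch of that iteration, not part of the RushScript API: fieldValues is a hypothetical helper, and sampleRows is stand-in data built from JavaScript Map objects so the snippet runs on its own. In a RushScript script the rows would instead come from collector.getData() after dr.execute() has completed, and output would typically go through System.out.println rather than console.log.

```javascript
// Hypothetical helper (not provided by RushScript): pull one field out of
// every collected row. `rows` is assumed to have the shape described above:
// an array whose elements expose get(fieldName).
function fieldValues(rows, fieldName) {
    var values = [];
    for (var i = 0; i < rows.length; i++) {
        values.push(rows[i].get(fieldName));
    }
    return values;
}

// Stand-in data for illustration only. In a real script this would be:
//   var rows = collector.getData();
var sampleRows = [
    new Map([["movieID", 10], ["count", 3]]),
    new Map([["movieID", 20], ["count", 5]])
];

// Walk the rows and print one field from each
var counts = fieldValues(sampleRows, "count");
for (var j = 0; j < counts.length; j++) {
    console.log("row " + j + " count = " + counts[j]);
}
```

Because every collected row is held in memory, this pattern is best kept for small result sets such as aggregation output.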