Using the dr.include() Function
At times you may want to separate commonly used functionality from a script to allow it to be reused by multiple scripts. You can do this within RushScript by isolating the common code into a single source file. The common source file can then be included into the source files where you want to use the common functionality.
The files to include are interpreted as relative paths. These relative paths are resolved using the following search paths, in order:
1. Path of the RushScript file currently being evaluated
2. List of include directories specified at run time
3. Class path
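The search order above can be modeled in a few lines of plain JavaScript. This is only an illustrative sketch, not RushScript's actual implementation: resolveInclude, joinPath, and the exists callback are hypothetical names, class path entries are treated as ordinary directories, and only relative, forward-slash paths are handled.

```javascript
// Illustrative model only; NOT the RushScript implementation.
// All function and parameter names here are hypothetical.

// Join a directory and a relative path, collapsing "." and ".." segments.
// Handles relative, forward-slash paths only.
function joinPath(dir, rel) {
  var parts = (dir + '/' + rel).split('/');
  var out = [];
  for (var i = 0; i < parts.length; i++) {
    var p = parts[i];
    if (p === '' || p === '.') {
      continue;
    }
    if (p === '..' && out.length > 0 && out[out.length - 1] !== '..') {
      out.pop();
      continue;
    }
    out.push(p);
  }
  return out.join('/');
}

// Try each search location in the documented order and return the first hit:
// 1. directory of the current script, 2. include directories, 3. class path.
function resolveInclude(relPath, currentScriptDir, includeDirs, classPathDirs, exists) {
  var searchDirs = [currentScriptDir].concat(includeDirs, classPathDirs);
  for (var i = 0; i < searchDirs.length; i++) {
    var candidate = joinPath(searchDirs[i], relPath);
    if (exists(candidate)) {
      return candidate;
    }
  }
  return null; // not found in any search location
}

// A pretend file system (made-up paths) used to exercise the lookup.
var files = { 'app/common/tpch-functions.js': true, 'shared/util.js': true };
function exists(p) {
  return files[p] === true;
}

// Found relative to the directory of the current script.
var a = resolveInclude('../common/tpch-functions.js', 'app/queries', ['shared'], [], exists);
// Not found next to the script, so the include directory is tried next.
var b = resolveInclude('util.js', 'app/queries', ['shared'], [], exists);
```

In this model the first matching location wins, so a file next to the current script would shadow a file of the same name in an include directory.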
To demonstrate the usage of the dr.include() function, a sample application will be built that uses TPC-H data. TPC-H is a standard benchmark defined by the Transaction Processing Performance Council (TPC). The application will use the lineitem data defined by TPC-H. A set of common scripts will contain the schema definition, a function for building a reader of the lineitem data, and a function for writing results. Assuming multiple scripts will be written to implement the TPC-H queries, this common set of functions will be useful to reduce redundant code.
The layout of the files for this application is as shown below:
• queries
– tpch-q1.js
• common
– tpch-functions.js
– tpch-schemas.js
The following code blocks contain the contents of the tpch-q1.js file. It implements Query 1 defined by the TPC-H standard. This query reads the lineitem data and performs a series of filters, calculations, and aggregations on it. The tpch-functions.js file is included to access the functions for reading the lineitem data and dumping the results of the operation. The functions and variables defined in tpch-functions.js (and its includes) are used by this script to build the Query 1 application. Note that the include is relative to the path of the tpch-q1.js file.
tpch-q1.js
// Include the common TPC-H functions relative to this file,
// from a directory called common (up one directory level).
dr.include('../common/tpch-functions.js');
// Invoke function to read the lineitem data, limiting the desired fields
var lineitems = readLineItems('data/lineitem.tbl', ['l_returnflag', 'l_linestatus', 'l_quantity', 'l_extendedprice', 'l_discount', 'l_tax', 'l_shipdate']);
// Filter by ship date
lineitems = dr.filterRows(lineitems, {predicate : 'l_shipdate <= STR_TO_DATE("1998-09-02")'}).output;
// Drop unneeded fields
lineitems = dr.removeFields(lineitems, {fieldNames:['l_shipdate']});
// Build calculations for discounted price and total price
var discountedPrice = Arithmetic.mult("l_extendedprice", Arithmetic.sub(1, "l_discount"));
var totalPrice = Arithmetic.mult(discountedPrice, Arithmetic.add(1, "l_tax"));
var deriveDiscountedPrice = FieldDerivation.derive("discountedPrice", discountedPrice);
var deriveTotalPrice = FieldDerivation.derive("totalPrice", totalPrice);
lineitems = dr.deriveFields(lineitems, {derivedFields:[deriveDiscountedPrice, deriveTotalPrice]});
// Drop unneeded fields
lineitems = dr.removeFields(lineitems, {fieldNames:['l_tax']});
// Define the aggregations wanted
var aggs = 'sum(l_quantity) as sum_qty, ' +
'sum(l_extendedprice) as sum_base_price, ' +
'sum(discountedPrice) as sum_disc_price, ' +
'sum(totalPrice) as sum_charge, ' +
'avg(l_quantity) as avg_qty, ' +
'avg(l_extendedprice) as avg_price, ' +
'avg(l_discount) as avg_disc, ' +
'count(l_discount) as count_order';
// Use the aggregations keyed by return flag and line status
var group = dr.group(lineitems, {keys:['l_returnflag','l_linestatus'],aggregations:aggs});
// Sort results of aggregation
var results = dr.sort(group, {sortKeys:['l_returnflag','l_linestatus']});
// Dump results of aggregation
dumpResults(results, 'results/query1.output');
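As a quick check of the two derived fields built above, the same arithmetic can be worked through in plain JavaScript for a single row. This sketch does not use the RushScript API, and the sample row values are made up for illustration.

```javascript
// Plain JavaScript illustration of the derived fields; not RushScript API.
// The sample row values are invented for demonstration.
var row = { l_extendedprice: 1000.0, l_discount: 0.05, l_tax: 0.08 };

// discountedPrice = l_extendedprice * (1 - l_discount)
var discountedPrice = row.l_extendedprice * (1 - row.l_discount);

// totalPrice = discountedPrice * (1 + l_tax)
var totalPrice = discountedPrice * (1 + row.l_tax);

// discountedPrice is about 950 and totalPrice is about 1026
// (subject to floating-point rounding).
```

Because totalPrice is derived from discountedPrice, both l_discount and l_tax must still be present when the fields are derived, which is why the script removes l_tax only after the dr.deriveFields() call.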
The contents of the tpch-functions.js file are shown below. Note that this file includes the tpch-schemas.js file. The include is relative to the location of the tpch-functions.js file; both files are in the same directory.
tpch-functions.js
// Include the tpch-schemas.js file relative to this file
dr.include('tpch-schemas.js');
// Create a reader for the lineitem data using only the desired fields.
// Return the result set.
function readLineItems(filePath, fields) {
    return dr.readDelimitedText({
        source:filePath,
        schema:lineitemschema,
        fieldSeparator:"|",
        extraFieldAction:ParseErrorAction.IGNORE,
        selectedFields:fields});
}
// Create a writer for results
function dumpResults(data, file) {
    dr.writeDelimitedText(data, {target:file, header:true, mode:WriteMode.OVERWRITE, fieldDelimiter:"", writeSingleSink:true});
}
The contents of the tpch-schemas.js file are shown below.
tpch-schemas.js
// Define the TPC-H lineitem schema
var lineitemschema = dr.schema()
    .INT("l_orderkey")
    .INT("l_partkey")
    .INT("l_suppkey")
    .INT("l_linenumber")
    .DOUBLE("l_quantity")
    .DOUBLE("l_extendedprice")
    .DOUBLE("l_discount")
    .DOUBLE("l_tax")
    .STRING("l_returnflag")
    .STRING("l_linestatus")
    .DATE("l_shipdate")
    .DATE("l_commitdate")
    .DATE("l_receiptdate")
    .STRING("l_shipinstruct")
    .STRING("l_shipmode")
    .STRING("l_comment");
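Taken together, the three files form an include chain in which each include is resolved against the directory of the file that contains it. The following plain JavaScript sketch models that chain with an in-memory table instead of real files. It is a simplified illustration under stated assumptions, not RushScript's implementation: sources, dirOf, normalize, and loadOrder are hypothetical names, and each file is assumed to issue its includes before the rest of its body, as the three scripts in this section do.

```javascript
// Hypothetical in-memory model of nested includes; NOT RushScript's implementation.
// Each entry maps a script to the include paths it requests.
var sources = {
  'queries/tpch-q1.js': ['../common/tpch-functions.js'],
  'common/tpch-functions.js': ['tpch-schemas.js'],
  'common/tpch-schemas.js': []
};

// Directory portion of a forward-slash path.
function dirOf(path) {
  var i = path.lastIndexOf('/');
  return i < 0 ? '' : path.substring(0, i);
}

// Collapse "." and ".." segments in a relative, forward-slash path.
function normalize(path) {
  var out = [];
  var parts = path.split('/');
  for (var i = 0; i < parts.length; i++) {
    var p = parts[i];
    if (p === '' || p === '.') {
      continue;
    }
    if (p === '..' && out.length > 0 && out[out.length - 1] !== '..') {
      out.pop();
      continue;
    }
    out.push(p);
  }
  return out.join('/');
}

// Record the order in which script bodies finish evaluating.
// Includes are resolved relative to the including file's directory.
function loadOrder(file, order) {
  order = order || [];
  var includes = sources[file];
  for (var i = 0; i < includes.length; i++) {
    loadOrder(normalize(dirOf(file) + '/' + includes[i]), order);
  }
  order.push(file);
  return order;
}

var order = loadOrder('queries/tpch-q1.js');
// order is:
// ['common/tpch-schemas.js', 'common/tpch-functions.js', 'queries/tpch-q1.js']
```

In this model the schema definition is available before the functions that reference it, and both are available before the body of the query script runs.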