Creating Join Keys
The Join operator (see Using the Join Operator to Do Standard Relational Joins) relationally joins two input data sets to produce a single output. The join is performed on key fields, which must be specified by name. Two sets of keys are passed to Join: one for the left input and one for the right input. When the key field names differ between the two inputs, specifying them directly is awkward in RushScript.
The dr variable provides a convenience function, dr.makeJoinKeys(), for creating join keys. It takes two parameters: an array of the left-side key field names and an array of the right-side key field names. The following code example uses it to build the join keys that are passed to the Join operator. Note that array notation is required even when, as in this example, each array contains only one element.
Example using the dr.makeJoinKeys function
// Join two data sources and write the results
var movieschema = dr.schema()
    .nullable(true)
    .trimmed(true)
    .INT("m_movieID")
    .STRING("m_movieName")
    .STRING("m_genre");

var ratingschema = dr.schema()
    .nullable(true)
    .trimmed(true)
    .INT("r_userID")
    .INT("r_movieID")
    .DOUBLE("r_rating")
    .INT("r_timestamp");

// Read the movies
var movies = dr.readDelimitedText({source:'data/movies.txt', schema:movieschema, fieldSeparator:"::", recordSeparator:"\n"});

// Read the ratings
var ratings = dr.readDelimitedText({source:'data/ratings.txt', schema:ratingschema, fieldSeparator:"::", recordSeparator:"\n", header:true});

// Define the key fields to use for the join
var keys = dr.makeJoinKeys(['m_movieID'], ['r_movieID']);

// The two datasets are passed to the join operator as the first two
// parameters. Order matters: movies is the left side, ratings the right.
var results = dr.join(movies, ratings, {joinMode:JoinMode.INNER, joinKeys:keys, mergeLeftAndRightKeys:true});

// Sort the results
results = dr.sort(results, {sortKeys:['m_movieID','r_userID']});

// Write the results to a delimited text file.
dr.writeDelimitedText(results, {target:"results/movieratings-join.txt", mode:WriteMode.OVERWRITE, header:true, fieldDelimiter:"", writeSingleSink:true});

// Execute the graph
dr.execute("join test");
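
The pairing of left and right key names described above can be pictured with a small stand-alone sketch in plain JavaScript. It does not use DataFlow and is not how dr.makeJoinKeys() is implemented. The function name pairJoinKeys, the { left, right } result shape, and the equal-length check are all assumptions made for illustration only.

```javascript
// Hypothetical stand-in for illustration only. This is NOT the DataFlow
// implementation of dr.makeJoinKeys(); the return shape is an assumption.
function pairJoinKeys(leftKeys, rightKeys) {
    // Array notation is required even for a single key field.
    if (!Array.isArray(leftKeys) || !Array.isArray(rightKeys)) {
        throw new TypeError("both arguments must be arrays of field names");
    }
    // Assumption: keys are matched by position, so the counts must agree.
    if (leftKeys.length !== rightKeys.length) {
        throw new RangeError("left and right key arrays must have the same length");
    }
    // Pair the i-th left key with the i-th right key.
    return leftKeys.map(function (left, i) {
        return { left: left, right: rightKeys[i] };
    });
}

var pairs = pairJoinKeys(['m_movieID'], ['r_movieID']);
console.log(JSON.stringify(pairs));
// prints [{"left":"m_movieID","right":"r_movieID"}]
```

In this sketch, passing a bare string such as 'm_movieID' instead of ['m_movieID'] raises a TypeError, which mirrors the requirement that array notation be used even for a single key.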