Joining Data
The following script reads two data sources, joins them, and then computes summary statistics on the joined data, producing a statistics model. A few noteworthy items:
• The dr.schema() function creates the schemas used by the readers. A schema can also be loaded from a local file using the schema's load() function.
• The dr.makeJoinKeys() function creates the join keys. It is needed here because the key fields have different names on the left-hand side (r_movieID) and the right-hand side (m_movieID).
• The join mode is set to LEFT_OUTER to perform a left outer join. Note that the join mode is given as a string; it could also have been specified as JoinMode.LEFT_OUTER.
• Summary statistics are computed on the joined data. The resulting statistics model is output in PMML format and persisted to a local file.
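The effect of a left outer join on differently named keys can be illustrated without DataRush. The following plain JavaScript sketch (not the RushScript API) joins in-memory records on r_movieID (left) and m_movieID (right). Every left-hand record is kept; when no right-hand record matches, the right-hand fields are filled with null. The function leftOuterJoin and its keepOneKey flag are hypothetical; keepOneKey only approximates mergeLeftAndRightKeys, under the assumption that merging keeps a single key column named after the left-hand key field.

```javascript
// Plain JavaScript sketch of a left outer join; this is NOT the DataRush API.
// leftOuterJoin and keepOneKey are hypothetical names used for illustration.
function leftOuterJoin(left, right, leftKey, rightKey, keepOneKey) {
  // Index the right-hand records by their key value.
  const index = new Map();
  for (const r of right) {
    const k = r[rightKey];
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(r);
  }
  // Collect right-hand field names so unmatched rows can be null-filled.
  const rightFields = new Set();
  for (const r of right) for (const f of Object.keys(r)) rightFields.add(f);
  // Assumption: "merging" keys means only the left-hand key column is kept.
  if (keepOneKey) rightFields.delete(rightKey);

  const out = [];
  for (const l of left) {
    const matches = index.get(l[leftKey]);
    if (matches) {
      // One output row per matching right-hand record.
      for (const r of matches) {
        const row = { ...l };
        for (const f of rightFields) row[f] = r[f];
        out.push(row);
      }
    } else {
      // No match: keep the left record and fill right-hand fields with null.
      const row = { ...l };
      for (const f of rightFields) row[f] = null;
      out.push(row);
    }
  }
  return out;
}

const ratings = [
  { r_userID: 1, r_movieID: 10, r_rating: 4.0 },
  { r_userID: 2, r_movieID: 99, r_rating: 3.5 }, // movie 99 has no definition
];
const movies = [{ m_movieID: 10, m_movieName: "Example", m_genre: "Drama" }];

const joined = leftOuterJoin(ratings, movies, "r_movieID", "m_movieID", true);
console.log(joined);
```

The rating for movie 99 survives the join with null movie fields, which is why the script uses a left outer join "in case any movie definitions are missing". The Map index is only a convenient way to write the sketch; it says nothing about how DataRush executes joins.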
RushScript Example: Joining Data
// Define rating schema
var ratingschema = dr.schema()
    .nullable(true)
    .trimmed(true)
    .INT('r_userID')
    .INT('r_movieID')
    .DOUBLE('r_rating')
    .INT('r_timestamp');
// Define movie schema
var movieschema = dr.schema()
    .nullable(true)
    .trimmed(true)
    .INT('m_movieID')
    .STRING('m_movieName')
    .STRING('m_genre');
// Read ratings
var ratings = dr.readDelimitedText({source:'input/ratings.txt', schema:ratingschema, fieldSeparator:"::", header:true});
// Read movies
var movies = dr.readDelimitedText({source:'input/movies.txt', schema:movieschema, fieldSeparator:"::", header:true});
// Create keys for join
var keys = dr.makeJoinKeys(['r_movieID'], ['m_movieID']);
// Use left outer join in case any movie definitions are missing
var results = dr.join(ratings, movies, {joinKeys:keys, joinMode:'LEFT_OUTER', mergeLeftAndRightKeys:true});
// Run summary statistics on the combined data
var model = dr.summaryStatistics(results, {detailLevel:DetailLevel.MULTI_PASS});
// Store the stats model in PMML form
dr.writePMML(model, {targetPathName:'output/summary-pmml.xml'});
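To make the summary step more concrete, the following plain JavaScript sketch (again not the DataRush API) computes a few basic statistics for one numeric field of joined records: the number of non-null values, the number of nulls, the minimum, the maximum, and the mean. dr.summaryStatistics builds its own, richer model and writes it as PMML; the helper name summarize and the chosen statistics are illustrative assumptions, not a description of that operator's output.

```javascript
// Plain JavaScript sketch of per-field summary statistics; NOT the DataRush API.
// summarize is a hypothetical helper; it expects a numeric field.
function summarize(rows, field) {
  let count = 0;
  let nullCount = 0;
  let sum = 0;
  let min = Infinity;
  let max = -Infinity;
  for (const row of rows) {
    const v = row[field];
    if (v === null || v === undefined) {
      // Nullable schemas and left outer joins can both produce nulls.
      nullCount++;
      continue;
    }
    count++;
    sum += v;
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return {
    field,
    count,
    nullCount,
    min: count > 0 ? min : null,
    max: count > 0 ? max : null,
    mean: count > 0 ? sum / count : null,
  };
}

const joinedRows = [
  { r_movieID: 10, r_rating: 4.0, m_genre: "Drama" },
  { r_movieID: 99, r_rating: 3.0, m_genre: null },
  { r_movieID: 10, r_rating: null, m_genre: "Drama" },
];

console.log(summarize(joinedRows, "r_rating"));
```

Nulls are counted separately rather than treated as zero, so they do not distort the minimum or the mean.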