Function Name | Operator Link |
---|---|
deleteFromJDBC | |
forceRecordStaging | |
loadMatrix | |
parseTextFields | |
readARFF | |
readAvro | |
readDelimitedText | |
readFixedText | |
readFromJDBC | |
readLog | |
read-- | |
readPMML | |
readSource | |
readStagingDataset | |
updateInJDBC | |
writeARFF | |
writeAvro | |
writeDelimitedText | |
writeFixedText | |
write-- | |
writePMML | |
writeSink | |
writeStagingDataset | |
writeToJDBC | |
Function Name | Operator Link |
---|---|
gatherHint | |
partitionHint | |
Function Name | Operator Link |
---|---|
crossJoin | |
filterExistingRows | |
filterRows | |
group | |
join | |
limitRows | |
processByGroup | |
sampleRandomRows | |
sort | |
unionAll | |
Function Name | Operator Link |
---|---|
columnsToRows | |
deriveFields | |
discoverEnums | |
mergeFields | |
remapFields | |
removeFields | |
retainFields | |
rowsToColumns | |
selectFields | |
splitFields | |
Function Name | Operator Link |
---|---|
removeDuplicates | |
replaceMissingValues | |
Function Name | Operator Link |
---|---|
analyzeDuplicates | |
analyzeLinks | |
clusterDuplicates | |
clusterLinks | |
discoverDuplicates | |
discoverLinks | |
Function Name | Operator Link |
---|---|
convertARMModel | |
fpGrowth | |
frequentItems | |
Function Name | Operator Link |
---|---|
kmeans | |
Function Name | Operator Link |
---|---|
decisionTreeLearner | |
decisionTreePredictor | |
decisionTreePruner | |
knnClassifier | |
naiveBayesLearner | |
naiveBayesPredictor | |
svmLearner | |
svmPredictor | |
Function Name | Operator Link |
---|---|
dataQualityAnalyzer | |
discoverDomain | |
distinctValues | |
linearRegressionLearner | |
logisticRegressionLearner | |
logisticRegressionPredictor | |
normalizeValues | |
rank | |
regressionPredictor | |
runRScript | |
runScript | |
sumOfSquares | |
summaryStatistics | |
Function Name | Operator Link |
---|---|
calculateNGramFrequency | |
calculateWordFrequency | |
convertTextCase | |
countTokens | |
dictionaryFilter | |
expandTextTokens | |
filterText | |
generateBagOfWords | |
textFrequencyFilter | |
textStemmer | |
textTokenizer | |
Function Name | Operator Link |
---|---|
assertEqual | |
assertEqualHash | |
assertEqualTypes | |
assertPredicate | |
assertRowCount | |
assertSorted | |
Function Name | Operator Link |
---|---|
collectRecords | |
getModel | |
emitRecords | |
putModel | |
logRows | |
Function Name | Operator Link |
---|---|
generateArithmeticSequence | |
generateConstant | |
generateRandom | |
generateRepeatingCycle | |
Function Name | Input Parameters | Returns | Description |
---|---|---|---|
applicationName | String: application name | | Sets the application name. This will be used as the application name for any DataFlow graphs created. |
batchSize | int: batch size (optional) | Previous batch size | Sets the ports.batchSize engine configuration to the specified value. Returns the previous value of the ports.batchSize setting. See Engine Configuration Settings for more information. |
cluster | • String: cluster host name • int: port number | Cluster specifier object | Sets the cluster specification on the current engine configuration. The next execution invocation will execute on the defined cluster if it exists. See Engine Configuration Settings for more information. The returned cluster specifier object can be used to set additional run-time options. |
defineOperator | • String: operator name • String: fully qualified class name | | Defines a custom operator to the scripting environment. The name of the operator must be unique and valid as a JavaScript function name (no spaces or special characters). The fully qualified class name should reference a valid Java class that implements the LogicalOperator interface. After an operator is defined, it can be used within the JavaScript environment. The operator name will be added as a function on the dr variable. |
dumpFilePath | String: local path name (optional) | Previous path setting | Sets the dumpFilePath engine configuration to the specified value. Returns the previous value of the dumpFilePath setting. See Engine Configuration Settings for more information. |
enabledModules | String: comma-separated list of modules | | Sets the modules that will be enabled for the current engine configuration. This is a comma-separated list of the modules that should be enabled. For a list of the currently available modules, see moduleConfiguration in Engine Configuration Settings. |
execute | String: application name (optional) | | Compiles and executes the currently composed DataFlow graph. |
extensionPaths | Strings: extension paths | String[] (previous extension path setting) | Sets the list of extension paths to use for job execution. This option is only valid when used for job execution on a cluster. The extension paths refer to directories in shared storage. The paths are intended to contain extensions to the DataFlow environment on a cluster. Files found in the extension paths will be copied to the current directory of the containers created to run a DataFlow job on nodes within a cluster. Files that are archives are added to the class path. These file extensions indicate a file is an archive: • .tar.gz • .tar • .zip • .jar Jar files are copied as-is into the local directory. The other archive file types are extracted into the local directory using a base directory name the same as the archive file. All archives are added to the class path of the container. Non-archive files are copied to the local directory of the container but are not added to the class path. Each of the paths must be contained in a shared, distributed storage system such as HDFS. Extension paths are only supported when executing DataFlow jobs using YARN. |
include | String: JavaScript file to include | | Evaluates the given JavaScript file into the current environment. Including other JavaScript source allows access to variables and functions that may be commonly used. The search order for JavaScript files is as follows: • The directory containing the RushScript file currently being evaluated. • The list of provided include files (see the command line reference), searched in order. • The current classpath is searched for the include file. |
makeJoinKeys | • String[]: left keys • String[]: right keys | JoinKey[] | Creates an array of JoinKey objects from the given arrays of left side field names and right side field names. The given arrays of field names should not be empty and should be equal in size. Use this function to make a set of keys for joining when the left side and right side key fields are not equal. |
maxMerge | int: maxMerge value (optional) | Previous maxMerge setting | Sets the join.maxMerge engine configuration setting. Returns the previous value of the join.maxMerge setting. See Engine Configuration Settings for more information. |
maxRetries | int: maxRetries value (optional) | Previous maxRetries setting | Sets the maxRetries engine configuration setting. Returns the previous value of the maxRetries setting. See Engine Configuration Settings for more information. |
minParallelism | int: minimumParallelism value (optional) | Previous minimumParallelism setting | Sets the minimumParallelism engine configuration setting. Returns the previous value of the minimumParallelism setting. See Engine Configuration Settings for more information. |
monitored | boolean: monitored value (optional) | Previous monitored value | Sets the monitored engine configuration setting. Returns the previous value of the monitored setting. See Engine Configuration Settings for more information. |
parallelism | int: parallelism value (optional) | Previous parallelism value | Sets the parallelism engine configuration setting. Returns the previous value of the parallelism setting. See Engine Configuration Settings for more information. |
schedulerQueue | String: name of the scheduler queue to use when executing a job | String: previous scheduler queue name setting | Sets the name of the scheduler queue to use when scheduling jobs. The scheduler queue name is only valid when using a cluster for job execution. Currently scheduler queue names are only supported when using YARN for job execution. |
schema | | TextRecordBuilder | Creates a new TextRecordBuilder instance. This object can be used to define a new schema or load a previously defined schema. |
sizeByReaders | boolean: sizeByReaders value (optional) | Previous sizeByReaders value | Sets the ports.sizeByReaders engine configuration setting. Returns the previous value of the ports.sizeByReaders setting. See Engine Configuration Settings for more information. |
sortBuffer | String: sortBuffer value (optional) | Previous sortBuffer value | Sets the sort.sortBuffer engine configuration setting. Returns the previous value of the sort.sortBuffer setting. This value is set using a text value to represent the size. Use "k", "m" and "g" suffixes to represent kilobytes, megabytes and gigabytes, respectively. See Engine Configuration Settings for more information. |
sortIOBuffer | String: sortIOBuffer value (optional) | Previous sortIOBuffer value | Sets the sort.sortIOBuffer engine configuration setting. Returns the previous value of the sort.sortIOBuffer setting. This value is set using a text value to represent the size. Use "k", "m" and "g" suffixes to represent kilobytes, megabytes and gigabytes, respectively. See Engine Configuration Settings for more information. |
spoolThreshold | int: spoolThreshold value (optional) | Previous spoolThreshold value | Sets the ports.spoolThreshold engine configuration setting. Returns the previous value of the ports.spoolThreshold setting. See Engine Configuration Settings for more information. |
storageManagementPath | String: storageManagementPath value | Previous storageManagementPath value | Sets the storageManagementPath engine configuration setting. Returns the previous value of the storageManagementPath setting. See Engine Configuration Settings for more information. |
writeAhead | int: writeAhead value (optional) | Previous writeAhead value | Sets the ports.writeAhead engine configuration setting. Returns the previous value of the ports.writeAhead setting. See Engine Configuration Settings for more information. |
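The sortBuffer and sortIOBuffer settings above take their sizes as text values with "k", "m", and "g" suffixes. As an illustration of that convention only, here is a plain-JavaScript sketch; the parseSizeSetting helper is hypothetical and not part of the RushScript API, and the engine's own parsing may differ in detail:

```javascript
// Hypothetical helper illustrating the "k"/"m"/"g" size-suffix convention
// used by settings such as sortBuffer and sortIOBuffer.
// Not part of the RushScript API.
function parseSizeSetting(text) {
  var match = /^(\d+)\s*([kmg]?)$/i.exec(text.trim());
  if (!match) {
    throw new Error("invalid size setting: " + text);
  }
  // Bare numbers are bytes; k, m, g scale by powers of 1024.
  var multipliers = { "": 1, k: 1024, m: 1024 * 1024, g: 1024 * 1024 * 1024 };
  return parseInt(match[1], 10) * multipliers[match[2].toLowerCase()];
}

console.log(parseSizeSetting("512k")); // 524288
console.log(parseSizeSetting("64m"));  // 67108864
```

Under this convention, a call such as dr.sortBuffer("64m") would request a 64-megabyte sort buffer (assuming, as the defineOperator description suggests, that these functions are invoked on the dr variable).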