Naive Bayes Operators
The Naive Bayes algorithm uses Bayes’ theorem to compute the probability of an event given that another event has already occurred.
The Naive Bayes learner produces a model that can be used with the Naive Bayes predictor as a simple probabilistic classifier based on applying Bayes’ theorem with strong, or naive, independence assumptions.
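As a brief restatement of the idea above, the standard textbook form of the classifier can be written as follows. The symbols are introduced here for illustration only: c is a candidate class and x_1, ..., x_n are the field values of one record. The first step is Bayes’ theorem; the second applies the naive assumption that the fields are independent given the class.

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)}
  \propto P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The denominator is the same for every class, so it can be dropped when only the most probable class is needed.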
One of the advantages of a naive Bayes classifier is that it only requires a relatively small amount of training data to estimate the parameters necessary for classification.
DataFlow provides operators to produce and use naive Bayes classifiers. The learner determines the classification rules for a particular data set, and the predictor applies these rules to a data set. Each operator is described in the following sections.
NaiveBayesLearner Operator
The NaiveBayesLearner operator is responsible for building a Naive Bayes PMML model from input data. The base algorithm used is specified at http://www.dmg.org/v4-0-1/NaiveBayes.html, with the following differences:
Numerical data can be used for prediction. For numerical data, probability is computed under the assumption of a Gaussian distribution.
We use Laplace smoothing in place of the "threshold" parameter.
We provide an option to count missing values. If selected, missing values are treated like any other single distinct value. Probability is calculated in terms of the ratio of missing to non-missing.
Calculation is performed in terms of log-likelihood rather than likelihood to avoid underflow on data with a large number of fields.
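The Gaussian and log-likelihood points above can be illustrated with a minimal, standalone sketch. It does not use the DataFlow API and is not the operator's implementation; the class name, method names, and sample numbers are all hypothetical. It shows the log of a Gaussian density for a numeric field, and why summing logs is safer than multiplying likelihoods when there are many fields.

```java
import java.util.Arrays;

public class GaussianLogLikelihood {

    // Log of the Gaussian density for value x, given a per-class mean and variance.
    static double logGaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance) - (diff * diff) / (2.0 * variance);
    }

    // Naive approach: multiply the per-field likelihoods directly.
    static double product(double[] likelihoods) {
        double result = 1.0;
        for (double p : likelihoods) {
            result *= p;
        }
        return result;
    }

    // Log-space approach: add the logs of the per-field likelihoods.
    static double logSum(double[] likelihoods) {
        double result = 0.0;
        for (double p : likelihoods) {
            result += Math.log(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical statistics for one numeric field within one class.
        System.out.println(logGaussian(5.1, 5.0, 0.12));

        // Hypothetical record with 100 fields, each with likelihood 1e-5.
        double[] likelihoods = new double[100];
        Arrays.fill(likelihoods, 1e-5);
        System.out.println(product(likelihoods)); // underflows to 0.0
        System.out.println(logSum(likelihoods));  // finite negative number
    }
}
```

Because the direct product underflows to zero for every class, the classes can no longer be compared; the log-space sums remain finite and comparable.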
Code Example
This example uses Naive Bayes to create a predictive classification model based on the Iris data set. It uses the field "class" within the iris data as the target column. This example produces a PMML model that is persisted. This PMML model can then be used with the NaiveBayesPredictor operator to predict target values.
Using the NaiveBayesLearner operator in Java
// Run Naive Bayes using "class" as the target column.
// All other columns are used as learning columns by default.
NaiveBayesLearner nbLearner = graph.add(new NaiveBayesLearner());
nbLearner.setTargetColumn("class");
Using the NaiveBayesLearner operator in RushScript
// Run Naive Bayes using "class" as the target column.
// All other columns are used as learning columns by default.
var model = dr.naiveBayesLearner(data, {targetColumn:'class'});
Properties
The NaiveBayesLearner operator provides the following properties.
| Name | Type | Description |
| --- | --- | --- |
| learningColumns | List<String> | The list of columns to be used to predict the output value. Default of empty list means "everything but targetColumn". |
| targetColumn | String | The name of the column to be predicted. Must be a column of type string. |
Ports
The NaiveBayesLearner operator provides a single input port.
| Name | Type | Get Method | Description |
| --- | --- | --- | --- |
| input | RecordPort | getInput() | The input data. String fields are assumed to be categorical. Double fields are assumed to be numerical. All other fields are ignored. |
The NaiveBayesLearner operator provides a single output port.
| Name | Type | Get Method | Description |
| --- | --- | --- | --- |
| model | | getModel() | The Naive Bayes PMML model. |
NaiveBayesPredictor Operator
The NaiveBayesPredictor operator applies a previously built Naive Bayes model to the input data. The base algorithm used is specified at http://www.dmg.org/v4-0-1/NaiveBayes.html, with the following differences:
Numerical data can be used for prediction. For numerical data, probability is computed under the assumption of a Gaussian distribution.
We use Laplace smoothing in place of the "threshold" parameter.
We provide an option to count missing values. If selected, missing values are treated like any other single distinct value. Probability is calculated in terms of the ratio of missing to non-missing.
Calculation is performed in terms of log-likelihood rather than likelihood.
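To show how log-likelihood scores can still yield a winner and per-class probabilities, here is a minimal standalone sketch. It is an assumption about one common way to do this (shifting by the maximum score before exponentiating), not a description of how NaiveBayesPredictor works internally. The class name, method names, and scores are hypothetical.

```java
public class LogScoreNormalizer {

    // Index of the largest log score, i.e. the predicted class.
    static int argMax(double[] logScores) {
        int best = 0;
        for (int i = 1; i < logScores.length; i++) {
            if (logScores[i] > logScores[best]) {
                best = i;
            }
        }
        return best;
    }

    // Convert log scores to probabilities that sum to 1.
    // Subtracting the maximum first keeps Math.exp from underflowing.
    static double[] toProbabilities(double[] logScores) {
        double max = logScores[argMax(logScores)];
        double[] probs = new double[logScores.length];
        double sum = 0.0;
        for (int i = 0; i < logScores.length; i++) {
            probs[i] = Math.exp(logScores[i] - max);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical per-class scores: log prior plus summed log likelihoods.
        double[] logScores = {-1050.0, -1052.0, -1060.0};
        System.out.println("winner index: " + argMax(logScores));
        for (double p : toProbabilities(logScores)) {
            System.out.println(p);
        }
    }
}
```

Exponentiating a score such as -1050 directly would give 0.0 for every class; the shifted form keeps the relative weights intact.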
Code Example
Using the NaiveBayesPredictor operator in Java
// Create the Naive Bayes predictor operator and add it to a graph
NaiveBayesPredictor predictor = graph.add(new NaiveBayesPredictor());
predictor.setAppendProbabilities(false);

// Connect the predictor to an input data and model source
graph.connect(dataSource.getOutput(), predictor.getInput());
graph.connect(modelSource.getOutput(), predictor.getModel());

// The output of the predictor is available for downstream operators to use
Using the NaiveBayesPredictor operator in RushScript
// Apply a naive Bayes model to the given data
var classifiedData = dr.naiveBayesPredictor(model, data, {appendProbabilities:false});
Properties
The NaiveBayesPredictor operator provides the following properties.
| Name | Type | Description |
| --- | --- | --- |
| appendProbabilities | boolean | Whether to include probabilities in the prediction. Default: true. |
| laplaceCorrector | double | The Laplace corrector to be used. The Laplace corrector is a way to handle "zero" counts in the training data. Otherwise a value that was never observed in the training data results in zero probability. The default of 0.0 means no correction. The "threshold" value specified in the PMML model will always be ignored in favor of the Laplace corrector specified on a NaiveBayesPredictor. |
| ignoreMissingValues | boolean | Whether to ignore missing values. If set to true, missing values are ignored for the purposes of prediction; otherwise missing values are considered when calculating probability distribution. Default: true. |
| probabilityPrefix | String | The field name prefix to use for probabilities. Default: "probability_" |
| winnerField | String | The name of the winner field to output. Default: "winner" |
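The effect of a Laplace corrector on zero counts can be seen in a minimal standalone sketch. The formula below is the usual additive smoothing form and is an assumption; the exact formula applied by NaiveBayesPredictor is not stated here. The class name, method name, and counts are hypothetical.

```java
public class LaplaceSmoothing {

    // Smoothed estimate of P(value | class).
    // count: times the value was seen with the class
    // classTotal: number of training records in the class
    // distinctValues: number of distinct values of the field
    // corrector: the Laplace corrector; 0.0 means no correction
    static double smoothed(long count, long classTotal, int distinctValues, double corrector) {
        return (count + corrector) / (classTotal + corrector * distinctValues);
    }

    public static void main(String[] args) {
        // Hypothetical field with three distinct values in a class of 50 records.
        long[] counts = {30, 20, 0};

        for (long count : counts) {
            double raw = smoothed(count, 50, 3, 0.0);
            double corrected = smoothed(count, 50, 3, 1.0);
            System.out.println(count + " -> " + raw + " vs " + corrected);
        }
    }
}
```

With a corrector of 0.0, the value that was never observed gets probability zero, which would zero out the whole class score. With a positive corrector, every value gets a small nonzero probability and the estimates still sum to 1.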
Ports
The NaiveBayesPredictor operator provides the following input ports.
| Name | Type | Get Method | Description |
| --- | --- | --- | --- |
| input | | getInput() | The input data to which the Naive Bayes model is applied. |
| model | | getModel() | Naive Bayes model in PMML to apply. |
The NaiveBayesPredictor operator provides a single output port.
| Name | Type | Get Method | Description |
| --- | --- | --- | --- |
| output | | getOutput() | Results of applying the model to the input data. |
Last modified date: 03/10/2025