Naive Bayes Operators
The Naive Bayes algorithm uses Bayes’ theorem to find the probability of an event occurring given the probability of another event that has already occurred.
The Naive Bayes learner produces a model that can be used with the Naive Bayes predictor as a simple probabilistic classifier based on applying Bayes’ theorem with strong, or naive, independence assumptions.
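The independence assumption can be illustrated with a minimal, standalone Java sketch. It is not DataFlow code: the class name NaiveBayesPosteriorSketch and its posterior method are hypothetical and exist only for this illustration. Given a prior for each class and the per-field likelihoods of the observed values, the sketch multiplies them together and normalizes to obtain the posterior probability of each class.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the DataFlow API.
final class NaiveBayesPosteriorSketch {

    /**
     * priors[c]         = P(class c)
     * likelihoods[c][i] = P(observed value of field i | class c)
     * Returns P(class c | observed values), assuming the fields are
     * conditionally independent given the class.
     */
    static double[] posterior(double[] priors, double[][] likelihoods) {
        double[] joint = new double[priors.length];
        double evidence = 0.0;
        for (int c = 0; c < priors.length; c++) {
            double p = priors[c];
            for (double likelihood : likelihoods[c]) {
                p *= likelihood; // naive assumption: multiply per-field likelihoods
            }
            joint[c] = p;
            evidence += p;
        }
        // Bayes' theorem: divide each joint probability by the total evidence
        for (int c = 0; c < joint.length; c++) {
            joint[c] /= evidence;
        }
        return joint;
    }

    public static void main(String[] args) {
        double[] priors = {0.5, 0.5};
        double[][] likelihoods = {{0.8, 0.5}, {0.2, 0.5}};
        System.out.println(Arrays.toString(posterior(priors, likelihoods)));
    }
}
```

The class with the largest posterior is the predicted class.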
One of the advantages of a naive Bayes classifier is that it only requires a relatively small amount of training data to estimate the parameters necessary for classification.
DataFlow provides operators to produce and use naive Bayes classifiers. The learner is used to determine the classification rules for a particular data set while the predictor can apply these rules to a data set. For more information, refer to the following topics:
NaiveBayesLearner Operator
The NaiveBayesLearner operator is responsible for building a Naive Bayes PMML model from input data. The base algorithm used is specified at http://www.dmg.org/v4-0-1/NaiveBayes.html, with the following differences:
• Provides the ability to predict based on numerical data. For numerical data, we compute probability based on the assumption of a Gaussian distribution.
• We use Laplace smoothing in place of the "threshold" parameter.
• We provide an option to count missing values. If selected, missing values are treated like any other single distinct value. Probability is calculated in terms of the ratio of missing to non-missing.
• Calculation is performed in terms of log-likelihood rather than likelihood to avoid underflow on data with a large number of fields.
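The differences listed above can be sketched in standalone Java. This is an illustration under stated assumptions, not the operator's implementation: the class NaiveBayesEstimatesSketch and its methods are hypothetical, the smoothing shown is the common add-one form of Laplace smoothing (the constant DataFlow uses is not stated here), and the "<missing>" key is simply one way to treat a missing value as its own distinct value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the DataFlow API.
final class NaiveBayesEstimatesSketch {

    /**
     * Add-one (Laplace) smoothed estimate of P(value | class):
     * (count + 1) / (classTotal + distinctValues).
     * A value never seen with the class still gets a non-zero probability.
     */
    static double smoothedProbability(long count, long classTotal, int distinctValues) {
        return (count + 1.0) / (classTotal + distinctValues);
    }

    /** Log of the Gaussian density with the given mean and variance, evaluated at x. */
    static double gaussianLogDensity(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance) - (diff * diff) / (2.0 * variance);
    }

    /**
     * Counts occurrences of each value. When countMissing is true, null
     * (missing) values are tallied under their own key; otherwise they are skipped.
     */
    static Map<String, Long> countValues(String[] values, boolean countMissing) {
        Map<String, Long> counts = new HashMap<>();
        for (String value : values) {
            if (value == null && !countMissing) {
                continue;
            }
            String key = (value == null) ? "<missing>" : value;
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

For a numerical field, the mean and variance would be estimated per class from the training data, and gaussianLogDensity would supply that field's contribution to the log-likelihood of each class.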
Code Example
This example uses Naive Bayes to create a predictive classification model based on the Iris data set. It uses the field "class" within the iris data as the target column. This example produces a PMML model that is persisted. This PMML model can then be used with the NaiveBayesPredictor operator to predict target values.
Using the NaiveBayesLearner operator in Java
// Run Naive Bayes using "class" as the target column.
// All other columns are used as learning columns by default.
NaiveBayesLearner nbLearner = graph.add(new NaiveBayesLearner());
nbLearner.setTargetColumn("class");
Using the NaiveBayesLearner operator in RushScript
// Run Naive Bayes using "class" as the target column.
// All other columns are used as learning columns by default.
var model = dr.naiveBayesLearner(data, {targetColumn:'class'});
Properties
The NaiveBayesLearner operator provides the following properties.
Ports
The NaiveBayesLearner operator provides a single input port.
The NaiveBayesLearner operator provides a single output port.
NaiveBayesPredictor Operator
The NaiveBayesPredictor operator applies a previously built Naive Bayes model to the input data. The base algorithm used is specified at http://www.dmg.org/v4-0-1/NaiveBayes.html, with the following differences:
• Provides the ability to predict based on numerical data. For numerical data, we compute probability based on the assumption of a Gaussian distribution.
• We use Laplace smoothing in place of the "threshold" parameter.
• We provide an option to count missing values. If selected, missing values are treated like any other single distinct value. Probability is calculated in terms of the ratio of missing to non-missing.
• Calculation is performed in terms of log-likelihood rather than likelihood.
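The reason for scoring in log-likelihood terms can be shown with a small standalone Java sketch. The class LogLikelihoodSketch and its methods are hypothetical and are not part of DataFlow. With many fields, the product of per-field probabilities drops below the smallest representable double and becomes zero for every class, while the sum of their logarithms stays finite and still ranks the classes.

```java
// Hypothetical illustration only; not part of the DataFlow API.
final class LogLikelihoodSketch {

    /** Plain likelihood: the product of per-field probabilities. */
    static double product(double[] probabilities) {
        double result = 1.0;
        for (double p : probabilities) {
            result *= p;
        }
        return result;
    }

    /** Log-likelihood: the sum of the logs of per-field probabilities. */
    static double logSum(double[] probabilities) {
        double result = 0.0;
        for (double p : probabilities) {
            result += Math.log(p);
        }
        return result;
    }

    /** Index of the largest score; ties resolve to the lowest index. */
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

Because the logarithm is monotonic, choosing the class with the largest log-likelihood selects the same class that the largest likelihood would, had the product not underflowed.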
Code Example
Using the NaiveBayesPredictor operator in Java
// Create the Naive Bayes predictor operator and add it to a graph
NaiveBayesPredictor predictor = graph.add(new NaiveBayesPredictor());
predictor.setAppendProbabilities(false);
// Connect the predictor to an input data and model source
graph.connect(dataSource.getOutput(), predictor.getInput());
graph.connect(modelSource.getOutput(), predictor.getModel());
// The output of the predictor is available for downstream operators to use
Using the NaiveBayesPredictor operator in RushScript
// Apply a naive Bayes model to the given data
var classifiedData = dr.naiveBayesPredictor(model, data, {appendProbabilities:false});
Properties
The NaiveBayesPredictor operator provides the following properties.
Ports
The NaiveBayesPredictor operator provides the following input ports.
The NaiveBayesPredictor operator provides a single output port.
Last modified date: 03/10/2025