KNNClassifier Operator
The KNNClassifier operator applies the k-nearest neighbor algorithm to classify input data against an already classified set of example data. A naive implementation is used: each input record is compared against all example records to find the k example records closest to it, as determined by a user-specified nearness measure. The input record is then classified according to a user-specified method of combining the classes of those neighbors.
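The naive approach described above can be sketched in plain Java. This is a minimal illustration, not the operator's implementation: the class `NaiveKnnSketch`, its `Example` type, and the tie-breaking rule (the first label to reach the highest count wins, scanning neighbors nearest first) are assumptions made for the example, and it fixes the nearness measure to Euclidean distance and the combining method to a simple majority vote.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; these names are hypothetical and not part of the DataFlow API.
class NaiveKnnSketch {

    // One already-classified example record: its feature values and its class label.
    static final class Example {
        final double[] features;
        final String label;

        Example(double[] features, String label) {
            this.features = features;
            this.label = label;
        }
    }

    // Euclidean distance between two feature vectors of equal length.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Compare the query against every example, keep the k closest,
    // and return the label with the most votes among them.
    static String classify(List<Example> examples, double[] query, int k) {
        List<Example> sorted = new ArrayList<>(examples);
        sorted.sort(Comparator.comparingDouble((Example e) -> euclidean(e.features, query)));

        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Example e : sorted.subList(0, Math.min(k, sorted.size()))) {
            int count = votes.merge(e.label, 1, Integer::sum);
            if (count > bestCount) {
                bestCount = count;
                best = e.label;
            }
        }
        return best;
    }
}
```

Sorting the full example list for every query keeps the sketch short; the cost per query grows with the size of the example set, which is the defining trait of a naive implementation.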
The field containing the classification value (also referred to as the target feature) must be specified. It is not necessary to specify the fields used to calculate nearness (also referred to as the selected features). If omitted, they will be derived from the example and query schema using all eligible fields. The example and query records need not have the same schema. All that is required is that:
The selected features must be present in both the example and query records and be of a numeric type (representing continuous data). In this context, a numeric type is any type that can be widened to a TokenTypeConstant.DOUBLE.
The target feature must be present in the example records and be either numeric (as described above) or an enumerated type (representing categorical data; TokenTypeConstant.ENUM(List)).
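When the selected features are omitted, the derivation from the two schemas can be pictured roughly as follows. This is a sketch under stated assumptions, not the operator's actual logic: schemas are modeled as hypothetical maps from field name to a type name, the set of "numeric" type names is invented for illustration, and excluding the target feature from the derived list is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; the schema model and type names are hypothetical.
class FeatureDerivationSketch {

    // Stand-in for "types that can be widened to a double" (assumed names).
    static final Set<String> NUMERIC =
            new HashSet<>(Arrays.asList("INT", "LONG", "FLOAT", "DOUBLE"));

    // Return every field that is numeric in both schemas, skipping the target feature.
    static List<String> deriveFeatures(Map<String, String> exampleSchema,
                                       Map<String, String> querySchema,
                                       String targetFeature) {
        List<String> features = new ArrayList<>();
        for (Map.Entry<String, String> field : exampleSchema.entrySet()) {
            String name = field.getKey();
            if (name.equals(targetFeature)) {
                continue; // assumption: the target is never used to measure nearness
            }
            String queryType = querySchema.get(name);
            boolean numericInExample = NUMERIC.contains(field.getValue());
            boolean numericInQuery = queryType != null && NUMERIC.contains(queryType);
            if (numericInExample && numericInQuery) {
                features.add(name);
            }
        }
        return features;
    }
}
```

Fields that exist on only one side, or that are non-numeric on either side, are simply skipped, which is why the example and query schemas do not have to match.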
The output consists of the query data with the resulting classification appended to it. This value is in the field named "PREDICTED_VAL".
The implementation is designed to minimize memory usage. You can specify an approximate limit on the amount of memory used by the operator. The example and query data do not both need to fit in memory, although performance is best when they do.
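One general way to run k-nearest neighbor without holding all example data at once is to read the examples in buffers and keep only the k best candidates seen so far for each query record. The sketch below illustrates that idea for a single query; it is an assumption about how such a scheme can work, not a description of the operator's internals, and `BufferedKnnSketch` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only; not the operator's internal design.
class BufferedKnnSketch {

    // A candidate neighbor: its distance to the query and its class label.
    static final class Neighbor {
        final double distance;
        final String label;

        Neighbor(double distance, String label) {
            this.distance = distance;
            this.label = label;
        }
    }

    // Max-heap on distance, so the root is the worst of the current best candidates.
    static PriorityQueue<Neighbor> newCandidateHeap() {
        return new PriorityQueue<>(
                Comparator.comparingDouble((Neighbor n) -> n.distance).reversed());
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Fold one buffer of example records into the running k-nearest set for one query.
    static void mergeChunk(PriorityQueue<Neighbor> heap, int k, double[] query,
                           List<double[]> chunkFeatures, List<String> chunkLabels) {
        for (int i = 0; i < chunkFeatures.size(); i++) {
            double d = euclidean(query, chunkFeatures.get(i));
            if (heap.size() < k) {
                heap.add(new Neighbor(d, chunkLabels.get(i)));
            } else if (d < heap.peek().distance) {
                heap.poll(); // drop the current worst candidate
                heap.add(new Neighbor(d, chunkLabels.get(i)));
            }
        }
    }

    // Labels of the retained neighbors, nearest first.
    static List<String> labelsNearestFirst(PriorityQueue<Neighbor> heap) {
        List<Neighbor> sorted = new ArrayList<>(heap);
        sorted.sort(Comparator.comparingDouble((Neighbor n) -> n.distance));
        List<String> labels = new ArrayList<>();
        for (Neighbor n : sorted) {
            labels.add(n.label);
        }
        return labels;
    }
}
```

With this arrangement, memory per query is bounded by k regardless of how many example records are streamed through, at the cost of re-reading the example data if the query data is also processed in pieces.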
Code Example
This example uses the Iris data to classify the type of iris based on its physical characteristics.
Using the KNNClassifier operator in Java
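// Initialize the KNN classifier with k = 10 neighbors and "class" as the target feature.
// Assumes `graph` is an existing graph to which operators can be added.
KNNClassifier classifier = graph.add(new KNNClassifier(10, "class"));
// Measure nearness using only these two numeric fields
List<String> features = new ArrayList<String>();
features.add("sepal length");
features.add("petal length");
classifier.setSelectedFeatures(features);
// Combine the neighbors' classes by vote, using Euclidean distance as the nearness measure
classifier.setClassificationScheme(ClassificationScheme.vote());
classifier.setNearnessMeasure(NearnessMeasure.euclidean());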
Using the KNNClassifier operator in RushScript
// Use the KNN classifier
var results = dr.knnClassifier(data, queryData, {
    k:10, 
    targetFeature:"class", 
    selectedFeatures:['sepal length', 'petal length'],
    classificationScheme:ClassificationScheme.vote(),
    nearnessMeasure:NearnessMeasure.euclidean()});
Properties
The KNNClassifier operator provides the following properties.
| Name | Type | Description |
| --- | --- | --- |
| scheme | | How to determine the classification of a record in the query data from the classifications of its nearest neighbors in the example data. |
| k | int | The size of the nearest neighbor set. The algorithm will use this many neighbors to perform classification of query data. |
| measure | | How to determine the nearest neighbors of a record in the query data. |
| targetFeature | String | The field in the example data that provides classification data. |
| selectedFeatures | String... | The fields to use when determining the nearest neighbors. These fields must be present in both the example and query records. They must also be numeric. |
| size | long | The amount of memory to use for buffering the example data. This value is in bytes. |
| sizeSpecifier | String | Alternate setting to specify the amount of memory to use for buffering the example data. The amount is specified using standard multipliers such as K, M, and G. |
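To make the relationship between sizeSpecifier and size concrete, the following sketch converts a multiplier string into a byte count. It is illustrative only: `SizeSpecifierSketch.parseBytes` is a hypothetical helper, not a DataFlow method, and it assumes binary multipliers (K = 1024 bytes), a single trailing suffix letter, and case-insensitive input. The operator's actual parsing rules may differ.

```java
// Illustrative sketch only; parseBytes is a hypothetical helper, not part of the DataFlow API.
class SizeSpecifierSketch {

    // Convert strings such as "512", "8K", "64M", or "2G" into a number of bytes.
    // Assumes binary multipliers (K = 1024), which the source does not confirm.
    static long parseBytes(String spec) {
        String s = spec.trim().toUpperCase();
        if (s.isEmpty()) {
            throw new IllegalArgumentException("empty size specifier");
        }
        long multiplier = 1L;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1024L; break;
            case 'M': multiplier = 1024L * 1024L; break;
            case 'G': multiplier = 1024L * 1024L * 1024L; break;
            default: break; // no suffix: the value is already in bytes
        }
        String digits = (multiplier == 1L) ? s : s.substring(0, s.length() - 1).trim();
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Long.parseLong(digits), multiplier);
    }
}
```

Under these assumptions, a specifier of "64M" corresponds to a size value of 67108864 bytes.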
Ports
The KNNClassifier operator provides the following input ports.
| Name | Type | Get Method | Description |
| --- | --- | --- | --- |
| query | | getQuery() | The query data that will be used. |
| example | | getTraining() | The example training data that will be used. |
The KNNClassifier operator provides a single output port.
| Name | Type | Get Method | Description |
| --- | --- | --- | --- |
| output | | getOutput() | The query data with an additional classification field. |
Last modified date: 03/10/2025