Multiclass classification

Several SIMCA models can be combined into a special object, simcam, which is used for multiclass classification. It can also calculate the distance between individual models and the discrimination power, i.e. the importance of each variable for discriminating between any two classes. Let’s see how it works.

First we create three single-class SIMCA models, each with its own settings, such as the optimal number of components and the significance level alpha.

m.set = simca(X.set, 'setosa', 3, alpha = 0.01)
m.set = selectCompNum(m.set, 1)

m.vir = simca(X.vir, 'virginica', 3)
m.vir = selectCompNum(m.vir, 2)

m.ver = simca(X.ver, 'versicolor', 3)
m.ver = selectCompNum(m.ver, 1)
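
The calls above rely on the class-specific calibration sets X.set, X.vir and X.ver, and the later steps use a test set X.t with reference labels c.t. A minimal sketch of how these objects could be prepared from the iris data is given below. The odd/even row split is an assumption, inferred from the row labels (40, 42, 44, ...) in the prediction output further down; the names X.c and c.c are introduced here only for illustration.

```r
# Assumption: odd rows of iris form the calibration set, even rows the test set
data(iris)
idx = seq(1, nrow(iris), by = 2)

X.c = iris[idx, 1:4]    # calibration measurements (hypothetical name)
c.c = iris[idx, 5]      # calibration class labels (hypothetical name)
X.t = iris[-idx, 1:4]   # test measurements
c.t = iris[-idx, 5]     # test class labels, used as reference values

# one calibration set per class, one for each single-class model
X.set = X.c[c.c == "setosa", ]
X.ver = X.c[c.c == "versicolor", ]
X.vir = X.c[c.c == "virginica", ]
```

With this split every class contributes 25 objects to the calibration set and 25 to the test set.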

Then we combine the models into a SIMCAM model object. The summary shows the performance on the calibration set, which is a combination of the calibration sets of the individual models.

m = simcam(list(m.set, m.vir, m.ver))
summary(m)
## 
## SIMCA multiple classes classification (class simcam)
## Nmber of classes: 3
## Info: 
## 
## SIMCA model for class "setosa" summary
## 
## Info: 
## Method for critical limits: jm
## Significance level (alpha): 0.01
## Selected number of components: 1
## 
##        Expvar Cumexpvar Sens (cal)
## Comp 1  73.51     73.51          1
## Comp 2  14.24     87.76          1
## Comp 3  10.44     98.20          1
## 
## SIMCA model for class "virginica" summary
## 
## Info: 
## Method for critical limits: jm
## Significance level (alpha): 0.05
## Selected number of components: 2
## 
##        Expvar Cumexpvar Sens (cal)
## Comp 1  76.16     76.16       0.88
## Comp 2  14.94     91.10       1.00
## Comp 3   6.09     97.20       0.96
## 
## SIMCA model for class "versicolor" summary
## 
## Info: 
## Method for critical limits: jm
## Significance level (alpha): 0.05
## Selected number of components: 1
## 
##        Expvar Cumexpvar Sens (cal)
## Comp 1  76.44     76.44       0.96
## Comp 2  13.93     90.37       0.92
## Comp 3   8.45     98.82       0.92

Now we apply the combined model to the test set and look at the predictions.

res = predict(m, X.t, c.t)
plotPredictions(res)

In this case the predictions are shown only for the number of components selected as optimal for each model. The class names along the y-axis correspond to the individual models. The predicted values can be shown in a similar way.

show(res$c.pred[20:30, 1, 1:3])
##    setosa virginica versicolor
## 40      1        -1         -1
## 42     -1        -1         -1
## 44      1        -1         -1
## 46      1        -1         -1
## 48      1        -1         -1
## 50      1        -1         -1
## 52     -1        -1          1
## 54     -1        -1          1
## 56     -1         1          1
## 58     -1        -1         -1
## 60     -1        -1          1
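
In the output above, 1 means the object is accepted by the corresponding model and -1 means it is rejected, so an object can belong to several classes (row 56) or to none (rows 42 and 58). The sketch below shows one way to turn such a matrix into readable labels using base R only. The matrix is typed in from the output above rather than taken from res, and the helper names are illustrative.

```r
# Membership values copied from the output above (1 = accepted, -1 = rejected)
c.pred = matrix(
  c( 1, -1, -1,
    -1, -1, -1,
     1, -1, -1,
     1, -1, -1,
     1, -1, -1,
     1, -1, -1,
    -1, -1,  1,
    -1, -1,  1,
    -1,  1,  1,
    -1, -1, -1,
    -1, -1,  1),
  ncol = 3, byrow = TRUE,
  dimnames = list(
    as.character(seq(40, 60, by = 2)),
    c("setosa", "virginica", "versicolor")
  )
)

# for every object, collect the names of all models that accepted it
labels = apply(c.pred, 1, function(row) {
  accepted = colnames(c.pred)[row == 1]
  if (length(accepted) == 0) "None" else paste(accepted, collapse = ", ")
})
show(labels)

# number of objects accepted by 0, 1 or 2 models
show(table(rowSums(c.pred == 1)))
```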

The getConfusionMatrix() method is also available in this case.

show(getConfusionMatrix(res))
##            setosa virginica versicolor None
## setosa         24         0          0    1
## versicolor      0         2         23    2
## virginica       0        21          5    4
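
Two things are easy to miss in this table: the rows and columns list the classes in a different order, and the row sums can exceed the class size because one object may be accepted by several models. The sketch below re-creates the matrix from the output above and computes per-class sensitivity by matching rows and columns by name. The class size of 25 test objects is an assumption that follows from splitting iris into two equal halves.

```r
# Confusion matrix copied from the output above
cm = matrix(
  c(24,  0,  0, 1,
     0,  2, 23, 2,
     0, 21,  5, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("setosa", "versicolor", "virginica"),
    c("setosa", "virginica", "versicolor", "None")
  )
)

# pick the "own class" cell of each row by name, not by position
classes = rownames(cm)
accepted = cm[cbind(classes, classes)]
names(accepted) = classes

# assumption: 25 test objects per class
n.per.class = 25
sens = accepted / n.per.class
show(sens)

# row sums above 25 indicate objects accepted by more than one model
show(rowSums(cm))
```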

Three additional plots are available for a multiclass SIMCA model. The first shows the distance between a selected model and the others.

par(mfrow = c(1, 2))
plotModelDistance(m, 1)
plotModelDistance(m, 2)

The second plot shows the discrimination power mentioned at the beginning of the section.

par(mfrow = c(1, 2))
plotDiscriminationPower(m, c(1, 3), show.labels = TRUE)
plotDiscriminationPower(m, c(2, 3), show.labels = TRUE)

And, finally, the Coomans’ plot shows the orthogonal distance from the objects to two selected classes/models.

par(mfrow = c(1, 2))
plotCooman(m, c(1, 3), show.labels = TRUE)
plotCooman(m, c(2, 3), show.labels = TRUE)
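
To make the idea behind this last plot more concrete, the sketch below computes orthogonal distances from the test objects to two class models with base R and plots one against the other. It is only an illustration of the concept, not the package's implementation: it assumes an odd/even split of iris, mean centering without scaling, one component per model, and the plain sum of squared residuals as the distance. The helper orth.dist is hypothetical, and plotCooman() may transform or scale the distances differently.

```r
data(iris)
idx = seq(1, nrow(iris), by = 2)       # assumption: odd rows for calibration
X.c = as.matrix(iris[idx, 1:4])
c.c = iris[idx, 5]
X.t = as.matrix(iris[-idx, 1:4])
c.t = iris[-idx, 5]

# squared orthogonal distance from new objects to a PCA model of one class
orth.dist = function(X.cal, X.new, ncomp) {
  p = prcomp(X.cal, center = TRUE, scale. = FALSE)
  P = p$rotation[, 1:ncomp, drop = FALSE]   # loadings of the class model
  Xn = sweep(X.new, 2, p$center)            # center with calibration means
  E = Xn - Xn %*% P %*% t(P)                # residuals after projection
  rowSums(E^2)
}

q.set = orth.dist(X.c[c.c == "setosa", ], X.t, ncomp = 1)
q.ver = orth.dist(X.c[c.c == "versicolor", ], X.t, ncomp = 1)

# objects close to one axis are near one model and far from the other
plot(q.set, q.ver, col = as.integer(c.t),
     xlab = "Distance to setosa model",
     ylab = "Distance to versicolor model")
legend("topright", legend = levels(c.t), col = 1:3, pch = 1)
```

Objects of a class are expected to lie close to the axis of the other model, that is, with a small distance to their own model and a large distance to the other one.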