Predictions for new data
Again, this is very similar to PLS: use the method predict() and provide at least a matrix or data frame with predictors (which must contain the same number of variables/columns as the calibration set). For test set validation you can also provide class reference information, in the same form as the one used for calibration of the PLS-DA model.
For a multiclass model, the reference values should be provided as a factor or a vector with class names as text values. Here is an example.
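The code that produced the summary is not included in this section. The following is a minimal sketch of how such a result could be obtained, assuming the iris data with the first 25 rows of each species used for calibration and the remaining 25 for validation. The object names (`m.all`, `Xv`, `cv.all` and so on) and the choice of three components with full cross-validation (`cv = 1`) are assumptions made for illustration.

```r
library(mdatools)
data(iris)

# assumed split: first 25 rows of each species for calibration, the rest for validation
cal.ind = c(1:25, 51:75, 101:125)
val.ind = c(26:50, 76:100, 126:150)

Xc = iris[cal.ind, 1:4]    # calibration predictors
Xv = iris[val.ind, 1:4]    # new data: same four columns as the calibration set
cc.all = iris[cal.ind, 5]  # calibration classes (factor with three class names)
cv.all = iris[val.ind, 5]  # reference classes for the new data

# multiclass PLS-DA model with 3 components and full cross-validation (assumed settings)
m.all = plsda(Xc, cc.all, 3, cv = 1)

# apply the model to the new data; the reference classes are optional
res = predict(m.all, Xv, cv.all)
summary(res)
```

Without the third argument, predict() still returns the predicted classes, but no performance statistics can be computed.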
##
## PLS-DA results (class plsdares) summary:
## Number of selected components: 1
##
## Class #1 (setosa):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   92.924      92.924   42.703      42.703 25  1 49  0  0.98     1    0.987
## Comp 2    4.560      97.484   11.216      53.920 25  0 50  0  1.00     1    1.000
## Comp 3    1.790      99.274    1.717      55.637 25  0 50  0  1.00     1    1.000
##
##
## Class #2 (versicolor):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   92.924      92.924   42.703      42.703  0  0 50 25  1.00   0.0    0.667
## Comp 2    4.560      97.484   11.216      53.920 10  4 46 15  0.92   0.4    0.747
## Comp 3    1.790      99.274    1.717      55.637 10  6 44 15  0.88   0.4    0.720
##
##
## Class #3 (virginica):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   92.924      92.924   42.703      42.703 25  4 46  0  0.92  1.00    0.947
## Comp 2    4.560      97.484   11.216      53.920 25  4 46  0  0.92  1.00    0.947
## Comp 3    1.790      99.274    1.717      55.637 24  4 46  1  0.92  0.96    0.933
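The last three columns of each table follow from the TP, FP, TN and FN counts. As a worked check, the sketch below recomputes them for versicolor with two components, assuming the usual definitions of sensitivity, specificity and accuracy. The variable names are illustrative only.

```r
# counts for class "versicolor", 2 components, copied from the summary
tp = 10  # class members recognized as members
fp = 4   # non-members accepted as members
tn = 46  # non-members correctly rejected
fn = 15  # class members that were rejected

sens = tp / (tp + fn)                  # share of class members recognized
spec = tn / (tn + fp)                  # share of non-members rejected
acc = (tp + tn) / (tp + fp + tn + fn)  # share of correct decisions overall

round(c(Spec. = spec, Sens. = sens, Accuracy = acc), 3)
```

The rounded values (0.92, 0.4 and 0.747) agree with the corresponding row of the summary.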
The predictions can also be shown as a plot.
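A possible way to draw this plot is sketched below. It assumes the same iris split and model settings as in the rest of this section (hypothetical names, three components, full cross-validation) and uses plotPredictions() on the result object.

```r
library(mdatools)
data(iris)

# assumed split: first 25 rows of each species for calibration, the rest for validation
cal.ind = c(1:25, 51:75, 101:125)
val.ind = c(26:50, 76:100, 126:150)

# multiclass model and predictions for the validation rows (assumed settings)
m.all = plsda(iris[cal.ind, 1:4], iris[cal.ind, 5], 3, cv = 1)
res = predict(m.all, iris[val.ind, 1:4], iris[val.ind, 5])

# predictions plot for the new data
par(mfrow = c(1, 1))
plotPredictions(res)
```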
If the vector with reference class values contains names of classes the model knows nothing about, these objects are simply treated as members of none of the known classes ("None").
For a one-class model, the reference values can be either a factor/vector with class names or logical values, like the ones used for calibration of the model. Here is an example for each of the cases.
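The two summaries could be produced along the following lines. This is a sketch under the same assumptions as before (iris split, three components, full cross-validation). The names `m.vir`, `res21` and `res22` are illustrative, and the `classname` argument gives the class a label because logical reference values carry no name.

```r
library(mdatools)
data(iris)

# assumed split: first 25 rows of each species for calibration, the rest for validation
cal.ind = c(1:25, 51:75, 101:125)
val.ind = c(26:50, 76:100, 126:150)

Xc = iris[cal.ind, 1:4]
Xv = iris[val.ind, 1:4]
cv.all = iris[val.ind, 5]                 # reference as factor with class names
cc.vir = iris[cal.ind, 5] == "virginica"  # logical: TRUE for class members
cv.vir = cv.all == "virginica"            # reference as logical values

# one-class model for virginica (assumed settings)
m.vir = plsda(Xc, cc.vir, 3, cv = 1, classname = "virginica")

# case 1: reference values given as class names
res21 = predict(m.vir, Xv, cv.all)
summary(res21)

# case 2: reference values given as logical values
res22 = predict(m.vir, Xv, cv.vir)
summary(res22)
```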
##
## PLS-DA results (class plsdares) summary:
## Number of selected components: 3
##
## Class #1 (virginica):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   93.107      93.107   54.394      54.394 25  4 46  0  0.92  1.00    0.947
## Comp 2    1.588      94.695    6.039      60.433 24  4 46  1  0.92  0.96    0.933
## Comp 3    2.641      97.336   -0.149      60.284 22  4 46  3  0.92  0.88    0.907
##
## PLS-DA results (class plsdares) summary:
## Number of selected components: 3
##
## Class #1 (virginica):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   93.107      93.107   54.394      54.394 25  4 46  0  0.92  1.00    0.947
## Comp 2    1.588      94.695    6.039      60.433 24  4 46  1  0.92  0.96    0.933
## Comp 3    2.641      97.336   -0.149      60.284 22  4 46  3  0.92  0.88    0.907
As you can see, the performance statistics are identical in both cases. However, the predictions plot looks a bit different depending on which form of reference values was used.
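To compare the two plots side by side, something like the following sketch can be used. It repeats the assumed setup (iris split, one-class model for virginica, hypothetical object names) so that it runs on its own.

```r
library(mdatools)
data(iris)

# assumed split: first 25 rows of each species for calibration, the rest for validation
cal.ind = c(1:25, 51:75, 101:125)
val.ind = c(26:50, 76:100, 126:150)

cv.all = iris[val.ind, 5]
m.vir = plsda(iris[cal.ind, 1:4], iris[cal.ind, 5] == "virginica", 3,
              cv = 1, classname = "virginica")

res21 = predict(m.vir, iris[val.ind, 1:4], cv.all)                 # class names
res22 = predict(m.vir, iris[val.ind, 1:4], cv.all == "virginica")  # logical values

# one panel per result, stacked vertically
par(mfrow = c(2, 1))
plotPredictions(res21)
plotPredictions(res22)
```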
Because predict() returns an object with results, you can also use most of the plots available for PLS regression results, for example the X-distance and Y-variance plots.
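A sketch of these two plots is given below. It assumes that plotXResiduals() and plotYVariance() are available for the result object in the installed version of mdatools, and it reuses the assumed iris split and the hypothetical one-class model from this section.

```r
library(mdatools)
data(iris)

# assumed split: first 25 rows of each species for calibration, the rest for validation
cal.ind = c(1:25, 51:75, 101:125)
val.ind = c(26:50, 76:100, 126:150)

m.vir = plsda(iris[cal.ind, 1:4], iris[cal.ind, 5] == "virginica", 3,
              cv = 1, classname = "virginica")
res = predict(m.vir, iris[val.ind, 1:4], iris[val.ind, 5])

# PLS regression plots applied to the PLS-DA prediction results
par(mfrow = c(1, 2))
plotXResiduals(res)  # distances of the new objects in X-space
plotYVariance(res)   # explained Y-variance per component
```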