Models and results

As discussed for PCA, mdatools creates two types of objects: a model and a result. Every time you build a PLS model you get a model object, and every time you apply the model to a dataset you get a result object. For PLS, these objects have classes pls and plsres, respectively.

Model calibration

Let’s use the same People data and create a PLS model for prediction of Shoesize (column number four) using the other 11 variables as predictors. As usual, we start by preparing the datasets (we will also split the data into calibration and test subsets):

library(mdatools)
data(people)

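# every fourth row (4, 8, ..., 32) goes to the test subset
idx = seq(4, 32, 4)
# calibration subset: predictors (all columns except the fourth) and response
Xc = people[-idx, -4]
yc = people[-idx, 4, drop = FALSE]
# test subset: the same columns for the held-out rows
Xt = people[idx, -4]
yt = people[idx, 4, drop = FALSE]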
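As a quick sanity check of the split, the sketch below prints the size of each subset. It assumes that the People data shipped with mdatools has 32 rows and 12 columns, which is what the indices above imply. The block repeats the data preparation so that it can be run on its own.

```r
library(mdatools)
data(people)

# every fourth row (4, 8, ..., 32) is held out for testing
idx = seq(4, 32, 4)
Xc = people[-idx, -4]
yc = people[-idx, 4, drop = FALSE]
Xt = people[idx, -4]
yt = people[idx, 4, drop = FALSE]

# number of rows and columns in each subset
dim(Xc)
dim(yc)
dim(Xt)
dim(yt)

# drop = FALSE keeps the response as a one-column matrix with its column name
colnames(yc)
```

Keeping the response as a matrix rather than a plain vector preserves the variable name, which is convenient when reading the output later.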

Here Xc and yc are the predictors and response values for the calibration subset, and Xt and yt are the corresponding values for the test subset. Now let’s calibrate the model and show information about the model object:

m = pls(Xc, yc, 7, scale = TRUE, info = "Shoesize prediction model")

## Warning in selectCompNum.pls(model, selcrit = ncomp.selcrit): No validation results were found.

As you can see, the calibration succeeded, but there is also a warning about the lack of validation results. For supervised models that have a complexity parameter (in this case, the number of components), proper validation is important because it helps to find the optimal complexity. When you calibrate a PLS model, the procedure also tries to find the optimal number of components (details will be discussed later in this chapter), and this requires validation results. How to do proper validation of PLS models is discussed in the next section.
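For illustration only, one way to give the calibration something to validate against is to pass validation settings to pls(). The sketch below assumes that the cv argument accepts the value 1 for full (leave-one-out) cross-validation; the available options are covered properly in the next section. The name mcv is just a placeholder used here.

```r
library(mdatools)
data(people)

idx = seq(4, 32, 4)
Xc = people[-idx, -4]
yc = people[-idx, 4, drop = FALSE]

# same model as above, but with full cross-validation (assumed meaning of cv = 1)
mcv = pls(Xc, yc, 7, scale = TRUE, cv = 1, info = "Shoesize prediction model")

# cross-validation results are stored alongside the calibration results
names(mcv$res)

# number of components chosen by the selection criterion
mcv$ncomp.selected
```

With validation results available, the number of components is selected from them rather than from the calibration results alone.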

Here is the information for the model object:

print(m)

## 
## PLS model (class pls)
## 
## Call:
## selectCompNum.pls(obj = model, selcrit = ncomp.selcrit)
## 
## Major fields:
## $ncomp - number of calculated components
## $ncomp.selected - number of selected components
## $coeffs - object (regcoeffs) with regression coefficients
## $xloadings - vector with x loadings
## $yloadings - vector with y loadings
## $weights - vector with weights
## $res - list with results (calibration, cv, etc)
## 
## Try summary(model) and plot(model) to see the model performance.

As expected, we see loadings for predictors and responses, a matrix with weights, and a special object (regcoeffs) for regression coefficients.
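The fields listed above can be accessed with the usual $ operator. The following sketch looks at their sizes for the model built here. It repeats the calibration so that it can be run on its own, and it suppresses the validation warning, which is expected in this case.

```r
library(mdatools)
data(people)

idx = seq(4, 32, 4)
Xc = people[-idx, -4]
yc = people[-idx, 4, drop = FALSE]

# the warning about missing validation results is expected here
m = suppressWarnings(pls(Xc, yc, 7, scale = TRUE))

# number of calculated components
m$ncomp

# loadings and weights: one row per variable, one column per component
dim(m$xloadings)
dim(m$yloadings)
dim(m$weights)

# class of the object holding the regression coefficients
class(m$coeffs)
```

With 11 predictors, one response and 7 components, the x loadings and weights should each have 11 rows and 7 columns, and the y loadings one row and 7 columns.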

Result object

Similar to PCA, the model object contains a list of result objects (res), obtained using the calibration set (cal), cross-validation (cv) and test set validation (test). All three have class plsres; here is what res$cal looks like:

print(m$res$cal)

## 
## PLS results (class plsres)
## 
## Call:
## plsres(y.pred = yp, y.ref = y.ref, ncomp.selected = object$ncomp.selected, 
##     xdecomp = xdecomp, ydecomp = ydecomp)
## 
## Major fields:
## $ncomp.selected - number of selected components
## $y.pred - array with predicted y values
## $y.ref - matrix with reference y values
## $rmse - root mean squared error
## $r2 - coefficient of determination
## $slope - slope for predicted vs. measured values
## $bias - bias for prediction vs. measured values
## $ydecomp - decomposition of y values (ldecomp object)
## $xdecomp - decomposition of x values (ldecomp object)

The xdecomp and ydecomp fields are objects similar to pcares; they contain scores, residuals and variances for the decomposition of X and Y, respectively.

print(m$res$cal$xdecomp)

## 
## Results of data decomposition (class ldecomp).
## 
## Major fields:
## $scores - matrix with score values
## $T2 - matrix with T2 distances
## $Q - matrix with Q residuals
## $ncomp.selected - selected number of components
## $expvar - explained variance for each component
## $cumexpvar - cumulative explained variance
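As a small illustration of the fields printed above, the sketch below inspects the decomposition of X for the calibration set. It repeats the calibration so that it can be run on its own; the name xd is a placeholder used only here.

```r
library(mdatools)
data(people)

idx = seq(4, 32, 4)
Xc = people[-idx, -4]
yc = people[-idx, 4, drop = FALSE]

# the warning about missing validation results is expected here
m = suppressWarnings(pls(Xc, yc, 7, scale = TRUE))

# decomposition of X for the calibration set
xd = m$res$cal$xdecomp

# scores and distances: one row per object, one column per component
dim(xd$scores)
dim(xd$T2)
dim(xd$Q)

# explained variance of X, per component and cumulative
xd$expvar
xd$cumexpvar
```

The same fields are available in ydecomp for the decomposition of Y.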

The other fields are mostly performance statistics, including the slope, coefficient of determination (R²), bias, and root mean squared error (RMSE). In addition, the results include the reference y-values and an array with predicted y-values. The array has dimensions nObjects x nComponents x nResponses.
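To make the array layout concrete, the sketch below extracts the calibration predictions for one particular number of components. The choice of three components is arbitrary and only for illustration. The manual RMSE at the end assumes that the statistic is the plain root mean squared error over objects, so treat the comparison as a check rather than a definition.

```r
library(mdatools)
data(people)

idx = seq(4, 32, 4)
Xc = people[-idx, -4]
yc = people[-idx, 4, drop = FALSE]

# the warning about missing validation results is expected here
m = suppressWarnings(pls(Xc, yc, 7, scale = TRUE))

# predicted values: objects x components x responses
yp = m$res$cal$y.pred
dim(yp)

# predictions for the first (and only) response made with three components
ncomp = 3  # arbitrary choice, for illustration only
yp3 = yp[, ncomp, 1]

# performance statistics are stored per response (rows) and component (columns)
m$res$cal$rmse
m$res$cal$r2

# manual RMSE for the same number of components, for comparison
sqrt(mean((yc[, 1] - yp3)^2))
m$res$cal$rmse[1, ncomp]
```

Because predictions for every number of components are kept in the same array, different model complexities can be compared without recalibrating.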

PLS predictions for a new dataset can be obtained using the predict method:

res = predict(m, Xt, yt)
print(res)

## 
## PLS results (class plsres)
## 
## Call:
## plsres(y.pred = yp, y.ref = y.ref, ncomp.selected = object$ncomp.selected, 
##     xdecomp = xdecomp, ydecomp = ydecomp)
## 
## Major fields:
## $ncomp.selected - number of selected components
## $y.pred - array with predicted y values
## $y.ref - matrix with reference y values
## $rmse - root mean squared error
## $r2 - coefficient of determination
## $slope - slope for predicted vs. measured values
## $bias - bias for prediction vs. measured values
## $ydecomp - decomposition of y values (ldecomp object)
## $xdecomp - decomposition of x values (ldecomp object)

If reference y-values are not provided to the predict() function, the predictions are still computed, but performance statistics (and the corresponding plots) will not be available.
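The difference between the two calls can be seen in a short sketch. It repeats the data preparation and calibration so that it can be run on its own; the name res_noref is a placeholder used only here. That the statistics field is empty when no reference values are supplied is an expectation based on the description above, not something this sketch guarantees.

```r
library(mdatools)
data(people)

idx = seq(4, 32, 4)
Xc = people[-idx, -4]
yc = people[-idx, 4, drop = FALSE]
Xt = people[idx, -4]
yt = people[idx, 4, drop = FALSE]

# the warning about missing validation results is expected here
m = suppressWarnings(pls(Xc, yc, 7, scale = TRUE))

# predictions with and without reference values
res = predict(m, Xt, yt)
res_noref = predict(m, Xt)

# the predicted values have the same size in both cases
dim(res$y.pred)
dim(res_noref$y.pred)

# largest difference between the two sets of predictions
max(abs(res$y.pred - res_noref$y.pred))

# statistics need reference values; expected to be NULL in the second case
res$rmse
is.null(res_noref$rmse)
```

In practice this means predict() can be used both for validating a model on a test set and for predicting the response for new measurements whose reference values are unknown.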