Partial least squares regression

Partial least squares (PLS) regression is a linear regression method that uses principles similar to PCA: the data are decomposed using latent variables. Because in this case we have two datasets, predictors (\(X\)) and responses (\(Y\)), we decompose both, computing scores, loadings and residuals: \(X = TP^T + E_x\), \(Y = UQ^T + E_y\). In addition, the orientation of the latent variables in PLS is selected to maximize the covariance between the X-scores, \(T\), and the Y-scores, \(U\). This approach makes it possible to work with datasets where the more traditional Multiple Linear Regression fails: when the number of variables exceeds the number of observations, or when the X-variables are mutually correlated. In the end, however, a PLS model is a linear model, where the response value is a linear combination of the predictors, so the main outcome is a vector of regression coefficients.
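To make the decomposition more concrete, here is a minimal sketch of the SIMPLS idea written in Python with NumPy. It is an independent illustration, not the mdatools code: the function name `simpls`, the synthetic data and all variable names are assumptions made for this example. The data have more variables than observations and strongly correlated predictors, which is the situation described above.

```python
import numpy as np

def simpls(X, Y, ncomp):
    """Illustrative SIMPLS sketch; X (n x p) and Y (n x m) must be mean-centred."""
    n, p = X.shape
    m = Y.shape[1]
    R = np.zeros((p, ncomp))  # X-weights, so that T = X R
    T = np.zeros((n, ncomp))  # X-scores (orthonormal columns)
    P = np.zeros((p, ncomp))  # X-loadings
    Q = np.zeros((m, ncomp))  # Y-loadings
    V = np.zeros((p, ncomp))  # orthonormal basis of the X-loadings
    S = X.T @ Y               # cross-product matrix (covariance up to a factor)
    for a in range(ncomp):
        # Dominant left singular vector of S: direction of maximum covariance
        r = np.linalg.svd(S, full_matrices=False)[0][:, 0]
        t = X @ r
        norm_t = np.linalg.norm(t)
        t, r = t / norm_t, r / norm_t       # scale scores to unit length
        p_a = X.T @ t
        q_a = Y.T @ t
        # Orthogonalise the new loading against the previous ones
        v = p_a - V[:, :a] @ (V[:, :a].T @ p_a)
        v /= np.linalg.norm(v)
        S = S - np.outer(v, v @ S)          # deflate the cross-product matrix
        R[:, a], T[:, a], P[:, a], Q[:, a], V[:, a] = r, t, p_a, q_a, v
    B = R @ Q.T                             # regression coefficients (p x m)
    return B, T, P, Q

# Synthetic data (an assumption): 20 observations, 50 correlated predictors
rng = np.random.default_rng(0)
n, p = 20, 50
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, p)) + 0.01 * rng.normal(size=(n, p))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=n)

# Mean-centre both blocks before the decomposition
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc = X - x_mean
Yc = (y - y_mean).reshape(-1, 1)

B, T, P, Q = simpls(Xc, Yc, ncomp=3)

# Predictions are a linear combination of the predictors
y_hat = (Xc @ B).ravel() + y_mean
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y_mean) ** 2)
rel_resid = np.linalg.norm(Xc - T @ P.T) / np.linalg.norm(Xc)

print("rank of X:", np.linalg.matrix_rank(Xc), "of", p, "variables")
print("R2 on the training data:", r2)
print("relative size of the X-residuals:", rel_resid)
```

Because the rank of the centred \(X\) is smaller than the number of variables, the normal equations of Multiple Linear Regression have no unique solution here, while the sketch still returns a single vector of regression coefficients, `B`. The R2 value is computed on the training data only, so it says nothing about predictive performance.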

There are two main algorithms for PLS, NIPALS and SIMPLS; mdatools implements only the latter. PLS model and PLS result objects have many components and performance statistics, which can be visualised via plots. In addition, the implemented pls() method calculates the selectivity ratio and VIP scores, which can be used to select the most important variables. We will discuss most of the methods in this chapter, and you can get the full list using ?pls.