# Partial least squares regression

Partial least squares regression (PLS) is a linear regression method which uses principles similar to PCA: data is decomposed using latent variables. Because in this case we have two datasets — a matrix with predictors ($$\mathbf{X}$$) and a matrix with responses ($$\mathbf{Y}$$) — we decompose both, computing scores, loadings and residuals: $$\mathbf{X} = \mathbf{TP}^\mathrm{T} + \mathbf{E}_x$$, $$\mathbf{Y} = \mathbf{UQ}^\mathrm{T} + \mathbf{E}_y$$. In addition, the orientation of the latent variables in PLS is selected to maximize the covariance between the X-scores, $$\mathbf{T}$$, and the Y-scores, $$\mathbf{U}$$. This approach makes it possible to work with datasets where more traditional Multiple Linear Regression fails: when the number of variables exceeds the number of observations, or when the X-variables are mutually correlated. In the end, however, a PLS model is still a linear model, where the response value is just a linear combination of the predictors, so the main outcome is a vector with regression coefficients.
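To see how the decomposition collapses into a single vector of regression coefficients, it helps to write out the prediction step. The formulation below is one common way to express it (note that it uses a weight matrix $$\mathbf{W}$$, computed by the algorithm, which does not appear in the decomposition equations above):

$$\mathbf{T} = \mathbf{XW}(\mathbf{P}^\mathrm{T}\mathbf{W})^{-1}, \qquad \hat{\mathbf{Y}} = \mathbf{TQ}^\mathrm{T} = \mathbf{XB}, \qquad \mathbf{B} = \mathbf{W}(\mathbf{P}^\mathrm{T}\mathbf{W})^{-1}\mathbf{Q}^\mathrm{T}$$

So once the scores, loadings and weights are computed, prediction for new data reduces to multiplying $$\mathbf{X}$$ by the coefficient matrix $$\mathbf{B}$$ (a single column vector when there is one response).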

There are two main algorithms for PLS, NIPALS and SIMPLS; only the latter is implemented in mdatools. PLS model and PLS result objects have many properties and performance statistics, which can be visualized via plots. Besides that, it is also possible to compute the selectivity ratio (SR) and VIP scores, which can be used for selecting the most important variables. Another option is a randomization test, which helps to select the optimal number of components. We will discuss most of the methods in this chapter, and you can get the full list using `?pls`.
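As a minimal sketch of how a model is calibrated, the call below assumes the `pls()` interface described in the package help (`?pls`); the simulated data is purely illustrative and deliberately has more variables than observations, a case where MLR would fail:

```r
library(mdatools)

# simulate 30 observations of 50 mutually correlated predictors
# driven by 3 underlying latent factors, plus one response
set.seed(42)
t <- matrix(rnorm(30 * 3), 30, 3)
X <- t %*% matrix(rnorm(3 * 50), 3, 50) + matrix(rnorm(30 * 50, sd = 0.1), 30, 50)
y <- t %*% c(1, -2, 0.5) + rnorm(30, sd = 0.1)

# calibrate a PLS model with up to 5 components and full cross-validation
m <- pls(X, y, ncomp = 5, cv = 1)

summary(m)  # performance statistics for calibration and cross-validation
plot(m)     # standard set of diagnostic plots
```

The `summary()` and `plot()` calls shown here follow the same pattern used elsewhere in the package for model objects.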