#### Principal component analysis

PCA is a well-known method of dimension reduction [17]. The basic idea of PCA is to reduce the dimensionality of a data set, while retaining as much as possible of the variation present in the original predictor variables. This is achieved by transforming the *p* original variables *X* = [x_{1}, x_{2},..., x_{p}] to a new set of *q* predictor variables, *T* = [t_{1}, t_{2},..., t_{q}], which are linear combinations of the original variables. In mathematical terms, PCA sequentially maximizes the variance of a linear combination of the original predictor variables,

**u**_{i} = arg max_{**u**^{T}**u** = 1} Var(*X***u**) (1)

subject to the constraint **u**_{i}^{T}*S*_{X}**u**_{j} = 0, for all 1 ≤ *i* < *j*. The orthogonality constraint ensures that the linear combinations are uncorrelated, *i.e*. Cov(*X***u**_{i}, *X***u**_{j}) = 0, *i* ≠ *j*. These linear combinations

*t*_{i} = *X***u**_{i} (2)

are known as the principal components (PCs). Geometrically, these linear combinations represent the selection of a new coordinate system obtained by rotating the original system. The new axes represent the directions with maximum variability and are ordered in terms of the amount of variation of the original data they account for. The first PC accounts for as much of the variability as possible, and each succeeding component accounts for as much of the remaining variability as possible. Computation of the principal components reduces to the solution of an eigenvalue-eigenvector problem. The projection vectors (also called the weighting vectors) **u** can be obtained by eigenvalue decomposition of the covariance matrix *S*_{X},

*S*_{X}**u**_{i} = *λ*_{i}**u**_{i} (3)

where *λ*_{i} is the *i*-th eigenvalue in descending order, for *i* = 1,..., *q*, and **u**_{i} is the corresponding eigenvector. The eigenvalue *λ*_{i} measures the variance of the *i*-th PC, and the eigenvector **u**_{i} provides the weights (loadings) for the linear transformation (projection).
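As an illustration, the eigendecomposition in equation (3) can be carried out directly with standard numerical routines. The following sketch (NumPy, on simulated data; the sample and variable counts are arbitrary) computes the PCs and checks that the variance of each component equals its eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))        # n = 50 samples, p = 5 predictor variables

Xc = X - X.mean(axis=0)                 # centre each variable
S = np.cov(Xc, rowvar=False)            # covariance matrix S_X (p x p)

eigvals, eigvecs = np.linalg.eigh(S)    # eigh suits the symmetric matrix S_X
order = np.argsort(eigvals)[::-1]       # sort eigenvalues in descending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

T = Xc @ eigvecs                        # principal components t_i = X u_i

# the variance of the i-th PC equals the i-th eigenvalue
print(np.allclose(T.var(axis=0, ddof=1), eigvals))  # True
```

The components in the columns of `T` are mutually uncorrelated, in line with the orthogonality constraint above.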

The maximum number of components *q* is determined by the number of nonzero eigenvalues, which is the rank of *S*_{X}, and *q* ≤ min(*n*, *p*). In practice, however, the maximum value of *q* is not necessary. The tail components, which have tiny eigenvalues and explain little of the variance of the original data, are usually discarded. The value of *q* is often determined by cross-validation or by the proportion of explained variance [17]. The computational cost of PCA, determined by the number of original predictor variables *p* and the number of samples *n*, is of the order of min(*np*^{2} + *p*^{3}, *pn*^{2} + *n*^{3}). In other words, the cost is *O*(*pn*^{2} + *n*^{3}) when *p* > *n*.

#### Partial least squares based dimension reduction

Partial Least Squares (PLS) was first developed as an algorithm for performing matrix decompositions, and was later introduced as a multivariate regression tool in the context of chemometrics [18, 19]. In recent years, PLS has also been found to be an effective dimension reduction technique for tumor discrimination [11, 12, 20], an approach denoted as Partial Least Squares based Dimension Reduction (PLSDR).

The underlying assumption of PLS is that the observed data are generated by a system or process driven by a small number of latent (not directly observed or measured) features. Therefore, PLS aims at finding uncorrelated linear transformations (latent components) of the original predictor features which have high covariance with the response features. Based on these latent components, PLS predicts the response variable **y** (the task of regression) and reconstructs the original matrix *X* (the task of data modeling) at the same time.

The objective of constructing components in PLS is to maximize the covariance between the response variable **y** and the original predictor variables *X*,

**w**_{i} = arg max_{**w**^{T}**w** = 1} Cov(*X***w**, **y**) (4)

subject to the constraint **w**_{i}^{T}*S*_{X}**w**_{j} = 0, for all 1 ≤ *i* < *j*. The central task of PLS is to obtain the vectors of optimal weights **w**_{i} (*i* = 1,..., *q*) to form a small number of components, whereas PCA is an "unsupervised" method that utilizes the *X* data only.

To derive the components, [t_{1}, t_{2},..., t_{q}], PLS decomposes *X* and **y** to produce a bilinear representation of the data [21]:

*X* = t_{1}w_{1}^{T} + t_{2}w_{2}^{T} + ... + t_{q}w_{q}^{T} + e, **y** = *v*_{1}t_{1} + *v*_{2}t_{2} + ... + *v*_{q}t_{q} + f (5)

where the w's are vectors of weights for constructing the PLS components t = *X*w, the *v*'s are scalars, and e and f are the residuals. The idea of PLS is to estimate the w's and *v*'s by regression. Specifically, PLS fits a sequence of bilinear models by least squares, hence the name partial least squares [18].

At each step *i* (*i* = 1,..., *q*), the vector w_{i} is estimated in such a way that the PLS component, t_{i}, has maximal sample covariance with the response variable **y**, subject to being uncorrelated with all previously constructed components.

The first PLS component t_{1} is obtained based on the covariance between *X* and **y**. Each subsequent component t_{i} (*i* = 2,..., *q*) is computed using the residuals of *X* and **y** from the previous step, which account for the variation left unexplained by the previous components. As a result, the PLS components are uncorrelated and ordered.
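This extract-and-deflate procedure can be sketched as follows (a NIPALS-style PLS1 for a single response variable, on simulated data; the function name and data sizes are illustrative only):

```python
import numpy as np

def pls1_components(X, y, q):
    """Sketch of PLS1 component extraction with deflation (NIPALS-style).

    At each step, the weight vector is proportional to Cov(X, y), the new
    component t = Xw is formed, and X and y are deflated by the part that
    t explains, so that successive components are uncorrelated.
    """
    X = X - X.mean(axis=0)                 # centre predictors
    y = y - y.mean()                       # centre response
    T, W = [], []
    for _ in range(q):
        w = X.T @ y                        # weights proportional to Cov(X, y)
        w /= np.linalg.norm(w)
        t = X @ w                          # new PLS component
        p = X.T @ t / (t @ t)              # loading of X on t
        X = X - np.outer(t, p)             # deflate X ...
        y = y - t * (t @ y) / (t @ t)      # ... and y
        T.append(t)
        W.append(w)
    return np.column_stack(T), np.column_stack(W)

rng = np.random.default_rng(2)
X = rng.standard_normal((40, 10))          # n = 40 samples, p = 10 variables
y = X[:, 0] + 0.1 * rng.standard_normal(40)

T, W = pls1_components(X, y, q=3)
# successive components are mutually orthogonal (uncorrelated)
print(np.allclose(T.T @ T, np.diag(np.diag(T.T @ T))))  # True
```

Deflating *X* at each step guarantees that every new component lies in a subspace orthogonal to all previous components, which is exactly the uncorrelatedness property described above.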

The number of components *q* is the only parameter of PLS; it can be chosen by the user [11, 12], by cross-validation [13], or by the regression goodness-of-fit [22]. As *q* increases, the explained variances of *X* and **y** grow, and all the information in the original data is preserved when *q* reaches the rank of *X*, which is the maximal value of *q*.

Like PCA, PLS reduces the complexity of microarray data analysis by constructing a small number of gene components, which can be used to replace the large number of original gene expression measures. Moreover, because they are obtained by maximizing the covariance between the components and the response variable, the PLS components are generally more predictive of the response variable than the principal components.
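A small simulation illustrates this point (the data are deliberately constructed so that the highest-variance direction of *X* carries no information about **y**; the nuisance direction is made exactly orthogonal to the signal purely to keep the illustration clean):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
u = rng.standard_normal(n)                 # direction that drives the response
v = rng.standard_normal(n)
v -= (v @ u) / (u @ u) * u                 # nuisance direction, orthogonal to u

# the first column has 100x the variance of the signal column, but is unrelated to y
X = np.column_stack([10 * v, u, rng.standard_normal((n, 8))])
y = u + 0.1 * rng.standard_normal(n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# first principal component: top eigenvector of the covariance matrix
_, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
t_pca = Xc @ vecs[:, -1]

# first PLS component: weights proportional to Cov(X, y)
w = Xc.T @ yc
w /= np.linalg.norm(w)
t_pls = Xc @ w

def r2(t, y):
    """Squared correlation between a component and the response."""
    return np.corrcoef(t, y)[0, 1] ** 2

# the PLS component tracks y far better than the first PC
print(r2(t_pca, yc), r2(t_pls, yc))
```

Here the first PC aligns with the high-variance nuisance direction, while the first PLS component, guided by the response, captures the signal.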

PLS is computationally efficient, with a cost of only *O*(*npq*), *i.e*. the number of calculations required by PLS is a linear function of *n* and *p*. It is thus much faster than PCA, since *q* is always less than *n*.