For instance, the following shows 16 sample portraits (centered around 0.0) from the Olivetti dataset.
Sparse principal components yields a more parsimonious, interpretable representation, clearly emphasizing which of the original features contribute to the differences between samples.
The following example illustrates 16 components extracted using sparse PCA from the Olivetti faces dataset.
It can be seen how the regularization term induces many zeros.
parameter make it possible to project the data onto the singular space while scaling each component to unit variance.
This is often useful if the models down-stream make strong assumptions on the isotropy of the signal: this is for example the case for Support Vector Machines with the RBF kernel and the K-Means clustering algorithm.
Below is an example of the iris dataset, which is comprised of 4 features, projected on the 2 dimensions that explain most variance: It is often interesting to project data to a lower-dimensional space that preserves most of the variance, by dropping the singular vector of components associated with lower singular values.For instance, if we work with 64x64 pixel gray-level pictures for face recognition, the dimensionality of the data is 4096 and it is slow to train an RBF support vector machine on such wide data.Furthermore we know that the intrinsic dimensionality of the data is much lower than 4096 since all pictures of human faces look somewhat alike.The samples lie on a manifold of much lower dimension (say around 200 for instance).The PCA algorithm can be used to linearly transform the data while both reducing the dimensionality and preserve most of the explained variance at the same time.The class is very useful in that case: since we are going to drop most of the singular vectors it is much more efficient to limit the computation to an approximated estimate of the singular vectors we will keep to actually perform the transform.