Mastering Dimensionality Reduction: Exploring PCA and SVD Methods

Utsav Desai
8 min read · May 3, 2023


What is Dimensionality Reduction?

Dimensionality reduction is a technique used in machine learning and data analysis to reduce the number of input variables or features in a dataset. It involves transforming a high-dimensional dataset into a lower-dimensional space while preserving as much of the original information as possible.

Two of the most widely used dimensionality reduction techniques are:

  1. PCA (Principal Component Analysis)
  2. SVD (Singular Value Decomposition)

What is Principal Component Analysis?

Principal Component Analysis (PCA) is a popular unsupervised learning technique for reducing the dimensionality of data. It increases interpretability while minimizing information loss. It helps to find the most significant features in a dataset and makes the data easy to plot in 2D and 3D. PCA does this by finding a sequence of linear combinations of the original variables, called principal components.

Picture several points plotted on a 2-D plane. There are two principal components: PC1, the primary principal component, points in the direction that explains the maximum variance in the data, and PC2 is orthogonal to PC1 and captures the remaining variance.

What is a Principal Component?

A principal component is a straight line that captures most of the variance of the data. It has a direction and a magnitude. Principal components are orthogonal projections (perpendicular) of the data onto a lower-dimensional space.

Now that you have understood the basics of PCA, let’s look at the next topic on PCA in Machine Learning.

Dimensionality

The term “dimensionality” describes the number of features or variables used in the analysis. When dealing with high-dimensional data, such as datasets with many variables, it can be difficult to visualize and interpret the relationships between variables. Dimensionality reduction methods like PCA reduce the number of variables in the dataset while preserving the most crucial information. To accomplish this, PCA converts the original variables into a new set of variables called principal components, which are linear combinations of the original variables.

The reduced dimensionality of the dataset depends on how many principal components are retained in the analysis. The objective of PCA is to select a small number of principal components that account for most of the important variation in the data. By reducing the dimensionality of the dataset, PCA can streamline data analysis, enhance visualization, and make it simpler to spot trends and relationships between variables.
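
For example, a common way to decide how many components to keep is to look at the cumulative share of variance explained by the eigenvalues. A minimal sketch, assuming the eigenvalues of the covariance matrix have already been computed (the values below are made up for illustration):

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, sorted in descending order
eigenvalues = np.array([4.2, 2.1, 0.9, 0.5, 0.2, 0.1])

# Fraction of total variance explained by each component
explained_ratio = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(explained_ratio)

# Keep the smallest number of components that explain at least 95% of the variance
k = int(np.searchsorted(cumulative, 0.95) + 1)
print(cumulative)  # rises past 0.95 at the 4th component
print(k)           # 4 components retained
```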

Steps for PCA Algorithm

Here is the step-by-step workflow of Principal Component Analysis (a code sketch follows the list):

  1. Standardize the data: PCA requires standardized data, so the first step is to standardize the data to ensure that all variables have a mean of 0 and a standard deviation of 1.
  2. Calculate the covariance matrix: The next step is to calculate the covariance matrix of the standardized data. This matrix shows how each variable is related to every other variable in the dataset.
  3. Calculate the eigenvectors and eigenvalues: The eigenvectors and eigenvalues of the covariance matrix are then calculated. The eigenvectors represent the directions in which the data varies the most, while the eigenvalues represent the amount of variation along each eigenvector.
  4. Choose the principal components: The principal components are the eigenvectors with the highest eigenvalues. These components represent the directions in which the data varies the most and are used to transform the original data into a lower-dimensional space.
  5. Transform the data: The final step is to transform the original data into the lower-dimensional space defined by the principal components.
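
To make these steps concrete, here is a minimal NumPy sketch that follows them on a small made-up data matrix (the values in `X` and the choice of two components are illustrative assumptions, not data from this article):

```python
import numpy as np

# Illustrative data: 6 samples, 3 features (made-up values)
X = np.array([[2.5, 2.4, 1.2],
              [0.5, 0.7, 0.3],
              [2.2, 2.9, 1.1],
              [1.9, 2.2, 0.9],
              [3.1, 3.0, 1.4],
              [2.3, 2.7, 1.0]])

# 1. Standardize: zero mean and unit standard deviation per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Covariance matrix of the standardized data (features x features)
cov = np.cov(X_std, rowvar=False)

# 3. Eigenvectors and eigenvalues (eigh handles symmetric matrices)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort by eigenvalue (descending) and keep the top k eigenvectors
order = np.argsort(eigenvalues)[::-1]
k = 2
components = eigenvectors[:, order[:k]]

# 5. Project the standardized data onto the principal components
X_reduced = X_std @ components
print(X_reduced.shape)  # (6, 2)
```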

How Does Principal Component Analysis Work?

1. Normalize the Data

Standardize the data before performing PCA. This will ensure that each feature has a mean = 0 and variance = 1.

2. Build the Covariance Matrix

Construct a square matrix that expresses how each pair of features in the multidimensional dataset varies together; for standardized data this is equivalent to the correlation matrix.

3. Find the Eigenvectors and Eigenvalues

Calculate the eigenvectors (unit vectors) and eigenvalues of the covariance matrix. An eigenvalue is the scalar by which the covariance matrix stretches its corresponding eigenvector, and it measures how much variance lies along that direction.

4. Sort the Eigenvectors from Highest to Lowest Eigenvalue and Select the Number of Principal Components.
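
In practice, you rarely carry out these steps by hand. As one common option (not covered in this article), scikit-learn bundles them into a single estimator; a minimal sketch with made-up data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Illustrative data: 100 samples with 10 features (random, for demonstration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

# Standardize the features, then keep the top 3 principal components
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_std)

print(X_reduced.shape)                # (100, 3)
print(pca.explained_variance_ratio_)  # share of variance captured by each component
```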

Now that you have understood how PCA works, let's look at where it is applied in machine learning.

Applications of PCA in Machine Learning

  • PCA is used to visualize multidimensional data.
  • It is used to reduce the number of dimensions in healthcare data.
  • PCA can help resize an image.
  • It can be used in finance to analyze stock data and forecast returns.
  • PCA helps to find patterns in high-dimensional datasets.

Advantages of PCA

Here are some advantages of principal component analysis:

  1. Dimensionality reduction: By determining the most crucial features or components, PCA reduces the dimensionality of the data, which is one of its primary benefits. This can be helpful when the initial data contains a lot of variables and is therefore challenging to visualize or analyze.
  2. Feature Extraction: PCA can also be used to derive new features or elements from the original data that might be more insightful or understandable than the original features. This is particularly helpful when the initial features are correlated or noisy.
  3. Data visualization: By projecting the data onto the first few principal components, PCA can be used to visualize high-dimensional data in two or three dimensions. This can aid in locating data patterns or clusters that may not have been visible in the initial high-dimensional space.
  4. Noise Reduction: By locating the underlying signal or pattern in the data, PCA can also be used to lessen the impacts of noise or measurement errors in the data.
  5. Multicollinearity: When two or more variables are strongly correlated, there is multicollinearity in the data, which PCA can handle. PCA can lessen the impacts of multicollinearity on the analysis by identifying the most crucial features or components.

Disadvantages of PCA

Here are some disadvantages of principal component analysis:

  1. Interpretability: Although principal component analysis (PCA) is effective at reducing the dimensionality of data and spotting patterns, the resulting principal components are not always simple to understand or describe in terms of the original features.
  2. Information loss: PCA involves choosing a subset of the most crucial features or components in order to reduce the dimensionality of the data. While this can be helpful for streamlining the data and lowering noise, if crucial features are not included in the components chosen, information loss may also result.
  3. Outliers: Because PCA is susceptible to anomalies in the data, the resulting principal components may be significantly impacted. The covariance matrix can be distorted by outliers, which can make it harder to identify the most crucial characteristics.
  4. Scaling: PCA assumes the data is centered and appropriately scaled, which can be a drawback in some circumstances. If the data is not scaled properly, the resulting principal components might not correctly represent the underlying patterns in the data.
  5. Computing complexity: For big datasets, computing the eigenvectors and eigenvalues of the covariance matrix can be costly. This may limit PCA's scalability and make it impractical for some applications.

What is Singular Value Decomposition (SVD)?

Singular Value Decomposition (SVD) is a powerful mathematical tool that factorizes a given matrix into three matrices: a unitary matrix, a diagonal matrix of singular values, and the conjugate transpose of another unitary matrix. It is a form of matrix factorization used in various fields, including machine learning.
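
Concretely, an m × n matrix A is factorized as A = U Σ Vᵀ, where U is an m × m unitary matrix, Σ is an m × n diagonal matrix whose non-negative diagonal entries are the singular values, and Vᵀ is the conjugate transpose of an n × n unitary matrix V.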

In machine learning, SVD is often used for dimensionality reduction, data compression, and denoising. The goal is to reduce the number of dimensions in the data set while preserving as much information as possible. By reducing the dimensionality of the data, SVD helps to mitigate the curse of dimensionality, which is a common problem in machine learning, where the performance of algorithms decreases as the number of features in the data increases.

How Does SVD Work?

SVD works by finding the orthogonal axes that best capture the variation in the data. These orthogonal axes, called singular vectors, correspond to the principal components of the data (when the data has been centered). The singular values in the diagonal matrix give the magnitude of these components. By keeping only the top k singular values and vectors, we can reduce the dimensionality of the data to k dimensions.
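
A minimal NumPy sketch of this truncation (the data matrix and the choice of k below are illustrative assumptions):

```python
import numpy as np

# Illustrative data matrix: 100 samples, 20 features (made-up values)
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))

# Full SVD: A = U @ diag(s) @ Vt, with singular values s in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top k singular values and vectors
k = 5
A_reduced = U[:, :k] * s[:k]            # data represented in k dimensions
A_approx = (U[:, :k] * s[:k]) @ Vt[:k]  # rank-k approximation of the original matrix

print(A_reduced.shape)  # (100, 5)
print(A_approx.shape)   # (100, 20)
```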

Advantages of SVD

Here are some advantages of singular value decomposition:

  1. Dimensionality reduction: SVD can reduce the number of dimensions in high-dimensional data, making it easier to visualize and analyze.
  2. Applications in various fields: SVD has various applications in fields like image processing, natural language processing, and recommendation systems.
  3. Interpretability: The singular values and singular vectors obtained from SVD can provide insight into the structure of the data and the relationship between features.

Disadvantages of SVD

Here are some disadvantages of singular value decomposition:

  1. Limitations in non-linear relationships: SVD assumes that the relationships between features are linear, which can limit its ability to capture complex non-linear relationships.
  2. Sensitivity to feature scale: SVD is sensitive to the scale of the features, so the data typically needs to be standardized before applying it.

What’s Next?

This blog should help you understand the theoretical concepts. I'm currently working through the mathematical calculations for the various PCA and SVD concepts covered here.
