Demystifying Classification Evaluation Metrics: Accuracy, Precision, Recall, and More

Utsav Desai
4 min read · Apr 10, 2023


Classification Evaluation Measures:

  • Accuracy: measures the proportion of correct predictions among all predictions made by the model.
  • Precision: measures the proportion of true positive predictions among all positive predictions made by the model.
  • Recall: measures the proportion of true positive predictions among all actual positive instances in the data.
  • F1-score: the harmonic mean of precision and recall, providing a balanced measure of the two.
  • Confusion matrix: a table that shows the number of true positive, false positive, true negative, and false negative predictions made by the model.
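All of these metrics are available out of the box in scikit-learn. Here is a minimal sketch, assuming binary 0/1 labels; the label arrays below are made-up placeholders, not data from this article:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Placeholder ground-truth labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("Confusion matrix:")
print(confusion_matrix(y_true, y_pred))
```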

Confusion Matrix

A confusion matrix is a table that summarizes the performance of a classification model by comparing the predicted labels with the actual labels. It consists of four terms: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN).

True Positive (TP) — predicted positive and actually positive

False Positive (FP) — predicted positive but actually negative

True Negative (TN) — predicted negative and actually negative

False Negative (FN) — predicted negative but actually positive

The confusion matrix is typically represented as follows:

                      Predicted Positive    Predicted Negative
  Actual Positive     TP                    FN
  Actual Negative     FP                    TN
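To make the four cells concrete, here is a small sketch in plain Python (the label lists are hypothetical) that tallies TP, FP, TN, and FN directly from true and predicted labels:

```python
# Hypothetical true and predicted labels (1 = positive, 0 = negative)
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # predicted positive, actually positive
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # predicted positive, actually negative
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # predicted negative, actually negative
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # predicted negative, actually positive

print(f"TP={tp}  FP={fp}  TN={tn}  FN={fn}")
```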

We can use the confusion matrix to calculate various evaluation metrics for a classification model, including:

→ Accuracy: The accuracy measures the proportion of correctly classified instances among all instances. It is calculated as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

→ Precision: Precision measures the proportion of correctly classified positive instances among all predicted positive instances. It is calculated as follows:

Precision = TP / (TP + FP)

→ Recall (Sensitivity or True Positive Rate): Recall measures the proportion of correctly classified positive instances among all actual positive instances. It is calculated as follows:

Recall = TP / (TP + FN)

→ Specificity (True Negative Rate): Specificity measures the proportion of correctly classified negative instances among all actual negative instances. It is calculated as follows:

Specificity = TN / (TN + FP)

→ F1-Score: The F1-score is the harmonic mean of precision and recall. It provides a balanced measure between precision and recall. It is calculated as follows:

F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
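Putting the five formulas together, a minimal sketch using plain Python functions (note that real code would also guard against zero denominators) could look like this:

```python
def accuracy(tp, tn, fp, fn):
    """Proportion of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Proportion of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Proportion of actual positives that were predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of actual negatives that were predicted negative."""
    return tn / (tn + fp)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```

In practice, scikit-learn's precision_score, recall_score, and f1_score compute the same quantities directly from label arrays, which is usually more convenient.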

These evaluation metrics can be used to assess the performance of a classification model and make informed decisions based on the model’s performance.

Example of a Confusion Matrix

Here’s an example of a confusion matrix for a binary classification problem:

                      Predicted: Yes    Predicted: No
  Actual: Yes         TP = 45           FN = 5
  Actual: No          FP = 10           TN = 15

In this example, we have a binary classification problem where the positive class is labeled “Yes” and the negative class is labeled “No”. The confusion matrix shows the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for the model’s predictions.

Using these values, we can calculate the following performance metrics:

  • Accuracy = (TP + TN) / (TP + TN + FP + FN) = (45 + 15) / (45 + 15 + 10 + 5) = 60 / 75 = 0.8
  • Precision = TP / (TP + FP) = 45 / (45 + 10) ≈ 0.82
  • Recall (Sensitivity) = TP / (TP + FN) = 45 / (45 + 5) = 0.9
  • Specificity = TN / (TN + FP) = 15 / (15 + 10) = 0.6
  • F1-score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.82 * 0.9) / (0.82 + 0.9) ≈ 0.86
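These numbers are easy to verify in code. A quick sketch plugging the example counts (TP = 45, FP = 10, TN = 15, FN = 5) into the formulas above:

```python
tp, fp, tn, fn = 45, 10, 15, 5  # counts from the example confusion matrix

accuracy    = (tp + tn) / (tp + tn + fp + fn)                 # 60 / 75 = 0.80
precision   = tp / (tp + fp)                                  # 45 / 55 ≈ 0.82
recall      = tp / (tp + fn)                                  # 45 / 50 = 0.90
specificity = tn / (tn + fp)                                  # 15 / 25 = 0.60
f1          = 2 * precision * recall / (precision + recall)   # ≈ 0.86

print(accuracy, precision, recall, specificity, f1)
```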

Accuracy measures the proportion of correct predictions overall. In this case, the model has an accuracy of 0.8, meaning it correctly predicted 80% of the instances.

Precision measures the proportion of true positive predictions among all positive predictions made by the model. In this case, the model has a precision of 0.82, meaning 82% of the instances it predicted as positive were actually positive.

Recall measures the proportion of true positive predictions among all actual positive instances in the data. In this case, the model has a recall of 0.9, meaning it correctly identified 90% of the actual positive instances.

Specificity measures the proportion of true negative predictions among all actual negative instances in the data. In this case, the model has a specificity of 0.6, meaning it correctly identified 60% of the actual negative instances.

F1-score is the harmonic mean of precision and recall, providing a balanced measure between the two. In this case, the model has an F1-score of 0.86, indicating a relatively good balance between precision and recall.
