Demystifying Classification Evaluation Metrics: Accuracy, Precision, Recall, and More
Classification Evaluation Measures:
- Accuracy: measures the proportion of correct predictions among all predictions made by the model.
- Precision: measures the proportion of true positive predictions among all positive predictions made by the model.
- Recall: measures the proportion of true positive predictions among all actual positive instances in the data.
- F1-score: the harmonic mean of precision and recall, providing a balanced measure of the two.
- Confusion matrix: a table that shows the number of true positive, false positive, true negative, and false negative predictions made by the model.
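In practice, you rarely compute these by hand: scikit-learn exposes each metric as a single function call. Here is a minimal sketch, assuming scikit-learn is installed and using made-up label vectors for illustration:

```python
# A minimal sketch using scikit-learn's built-in classification metrics.
# The label vectors below are hypothetical, purely for illustration.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual labels (1 = positive, 0 = negative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
# Note: for labels [0, 1], scikit-learn orders the matrix as [[TN, FP], [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```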
Confusion Matrix
A confusion matrix is a table that summarizes the performance of a classification model by comparing the predicted labels with the actual labels. It is built from four counts: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN).
True Positive (TP) — predicted positive and actually positive
False Positive (FP) — predicted positive but actually negative
True Negative (TN) — predicted negative and actually negative
False Negative (FN) — predicted negative but actually positive
The confusion matrix is typically laid out with the actual classes as rows and the predicted classes as columns:

|  | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| **Actual Positive** | TP | FN |
| **Actual Negative** | FP | TN |
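To make the four cells concrete, here is a minimal plain-Python sketch that tallies them directly, assuming labels are encoded as 1 (positive) and 0 (negative); the label vectors are made up for illustration:

```python
# Tally the four confusion-matrix cells by comparing each prediction
# to its actual label (1 = positive class, 0 = negative class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")  # TP=3, FP=1, TN=3, FN=1
```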
We can use the confusion matrix to calculate various evaluation metrics for a classification model, including:
→ Accuracy: The accuracy measures the proportion of correctly classified instances among all instances. It is calculated as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
→ Precision: Precision measures the proportion of correctly classified positive instances among all predicted positive instances. It is calculated as follows:
Precision = TP / (TP + FP)
→ Recall (Sensitivity or True Positive Rate): Recall measures the proportion of correctly classified positive instances among all actual positive instances. It is calculated as follows:
Recall = TP / (TP + FN)
→ Specificity (True Negative Rate): Specificity measures the proportion of correctly classified negative instances among all actual negative instances. It is calculated as follows:
Specificity = TN / (TN + FP)
→ F1-Score: The F1-score is the harmonic mean of precision and recall. It provides a balanced measure between precision and recall. It is calculated as follows:
F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
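To make the formulas concrete, here is a minimal sketch that translates each one into a plain Python function; the tp/fp/tn/fn counts would come from a confusion matrix like the one above:

```python
def accuracy(tp, tn, fp, fn):
    """Proportion of correct predictions among all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Proportion of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Proportion of actual positives the model caught."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of actual negatives the model caught."""
    return tn / (tn + fp)

def f1_score(prec, rec):
    """Harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec)
```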
These evaluation metrics can be used to assess the performance of a classification model and to make informed decisions based on its strengths and weaknesses.
Example of a Confusion Matrix
Here's an example of a confusion matrix for a binary classification problem:

|  | Predicted: Yes | Predicted: No |
| --- | --- | --- |
| **Actual: Yes** | TP = 45 | FN = 5 |
| **Actual: No** | FP = 10 | TN = 30 |
In this example, we have a binary classification problem where the positive class is labeled “Yes” and the negative class is labeled “No”. The confusion matrix shows the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for the model’s predictions.
Using these values, we can calculate the following performance metrics:
- Accuracy = (TP + TN) / (TP + FP + TN + FN) = (45 + 30) / (45 + 10 + 30 + 5) = 0.83
- Precision = TP / (TP + FP) = 45 / (45 + 10) = 0.82
- Recall (Sensitivity) = TP / (TP + FN) = 45 / (45 + 5) = 0.9
- Specificity = TN / (TN + FP) = 30 / (30 + 10) = 0.75
- F1-score = 2 * (precision * recall) / (precision + recall) = 2 * (0.82 * 0.9) / (0.82 + 0.9) = 0.86
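As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python, assuming the counts from the example matrix (TP = 45, FP = 10, TN = 30, FN = 5):

```python
# Verify the worked example using the counts from the matrix above.
tp, fp, tn, fn = 45, 10, 30, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 75 / 90 ≈ 0.83
precision = tp / (tp + fp)                          # 45 / 55 ≈ 0.82
recall = tp / (tp + fn)                             # 45 / 50 = 0.90
specificity = tn / (tn + fp)                        # 30 / 40 = 0.75
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.86

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f} f1={f1:.2f}")
```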
Accuracy measures the proportion of correct predictions overall. In this case, the model has an accuracy of 0.83, meaning it correctly predicted 83% of the instances.
Precision measures the proportion of true positive predictions among all positive predictions made by the model. In this case, the model has a precision of 0.82, meaning 82% of the instances it predicted as positive were actually positive.
Recall measures the proportion of true positive predictions among all actual positive instances in the data. In this case, the model has a recall of 0.9, meaning it correctly identified 90% of the actual positive instances.
Specificity measures the proportion of true negative predictions among all actual negative instances in the data. In this case, the model has a specificity of 0.75, meaning it correctly identified 75% of the actual negative instances.
F1-score is the harmonic mean of precision and recall, providing a balanced measure between the two. In this case, the model has an F1-score of 0.86, indicating a relatively good balance between precision and recall.
Top Machine Learning Mastery: Elevate Your Skills with this Step-by-Step Tutorial
1. Need for Machine Learning, Basic Principles, Applications, Challenges
4. Logistic Regression (Binary Classification)
8. Gradient Boosting (XGBoost)
11. Neural Network Representation (Perceptron Learning)
15. Dimensionality Reduction (PCA, SVD)
16. Clustering (K-Means Clustering, Hierarchical Clustering)
19. Reinforcement Learning Fundamentals and Applications
20. Q-Learning
Dive into an insightful Machine Learning tutorial for exam success and knowledge expansion. More concepts and hands-on projects coming soon — follow my Medium profile for updates!