Demystifying Classification Evaluation Metrics: Accuracy, Precision, Recall, and More

Utsav Desai
4 min read · Apr 10, 2023


Classification Evaluation Measures:

  • Accuracy: measures the proportion of correct predictions among all predictions made by the model.
  • Precision: measures the proportion of true positive predictions among all positive predictions made by the model.
  • Recall: measures the proportion of true positive predictions among all actual positive instances in the data.
  • F1-score: the harmonic mean of precision and recall, providing a balanced measure of the two.
  • Confusion matrix: a table that shows the number of true positive, false positive, true negative, and false negative predictions made by the model.
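All of these metrics are available out of the box in scikit-learn. Here is a minimal sketch, assuming binary 0/1 labels; the label arrays below are made-up placeholders, not data from this article:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Placeholder ground-truth labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("Confusion matrix:")
print(confusion_matrix(y_true, y_pred))
```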

Confusion Matrix

A confusion matrix is a table that summarizes the performance of a classification model by comparing the predicted labels with the actual labels. It consists of four terms: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN).

True Positive (TP) — predicted positive and actually positive

False Positive (FP) — predicted positive but actually negative

True Negative (TN) — predicted negative and actually negative

False Negative (FN) — predicted negative but actually positive

The confusion matrix is typically represented as follows:

                      Predicted Positive    Predicted Negative
  Actual Positive     TP                    FN
  Actual Negative     FP                    TN
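To make the four cells concrete, here is a small sketch in plain Python (the label lists are hypothetical) that tallies TP, FP, TN, and FN directly from true and predicted labels:

```python
# Hypothetical true and predicted labels (1 = positive, 0 = negative)
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # predicted positive, actually positive
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # predicted positive, actually negative
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # predicted negative, actually negative
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # predicted negative, actually positive

print(f"TP={tp}  FP={fp}  TN={tn}  FN={fn}")
```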

We can use the confusion matrix to calculate various evaluation metrics for a classification model, including:

→ Accuracy: The accuracy measures the proportion of correctly classified instances among all instances. It is calculated as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

→ Precision: Precision measures the proportion of correctly classified positive instances among all predicted positive instances. It is calculated as follows:

Precision = TP / (TP + FP)

→ Recall (Sensitivity or True Positive Rate): Recall measures the proportion of correctly classified positive instances among all actual positive instances. It is calculated as follows:

Recall = TP / (TP + FN)

→ Specificity (True Negative Rate): Specificity measures the proportion of correctly classified negative instances among all actual negative instances. It is calculated as follows:

Specificity = TN / (TN + FP)

→ F1-Score: The F1-score is the harmonic mean of precision and recall. It provides a balanced measure between precision and recall. It is calculated as follows:

F1-Score = 2 * (Precision * Recall) / (Precision + Recall)
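Putting the five formulas together, a minimal sketch using plain Python functions (note that real code would also guard against zero denominators) could look like this:

```python
def accuracy(tp, tn, fp, fn):
    """Proportion of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Proportion of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Proportion of actual positives that were predicted positive."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of actual negatives that were predicted negative."""
    return tn / (tn + fp)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```

In practice, scikit-learn's precision_score, recall_score, and f1_score compute the same quantities directly from label arrays, which is usually more convenient.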

These evaluation metrics can be used to assess the performance of a classification model and make informed decisions based on the model’s performance.

Example of a Confusion Matrix

Here’s an example of a confusion matrix for a binary classification problem:

                      Predicted: Yes    Predicted: No
  Actual: Yes         TP = 45           FN = 5
  Actual: No          FP = 10           TN = 15

In this example, we have a binary classification problem where the positive class is labeled “Yes” and the negative class is labeled “No”. The confusion matrix shows the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) for the model’s predictions.

Using these values, we can calculate the following performance metrics:

  • Accuracy = (TP + TN) / (TP + TN + FP + FN) = (45 + 15) / (45 + 15 + 10 + 5) = 60 / 75 = 0.8
  • Precision = TP / (TP + FP) = 45 / (45 + 10) ≈ 0.82
  • Recall (Sensitivity) = TP / (TP + FN) = 45 / (45 + 5) = 0.9
  • Specificity = TN / (TN + FP) = 15 / (15 + 10) = 0.6
  • F1-score = 2 * (Precision * Recall) / (Precision + Recall) = 2 * (0.82 * 0.9) / (0.82 + 0.9) ≈ 0.86
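These numbers are easy to verify in code. A quick sketch plugging the example counts (TP = 45, FP = 10, TN = 15, FN = 5) into the formulas above:

```python
tp, fp, tn, fn = 45, 10, 15, 5  # counts from the example confusion matrix

accuracy    = (tp + tn) / (tp + tn + fp + fn)                 # 60 / 75 = 0.80
precision   = tp / (tp + fp)                                  # 45 / 55 ≈ 0.82
recall      = tp / (tp + fn)                                  # 45 / 50 = 0.90
specificity = tn / (tn + fp)                                  # 15 / 25 = 0.60
f1          = 2 * precision * recall / (precision + recall)   # ≈ 0.86

print(accuracy, precision, recall, specificity, f1)
```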

Accuracy measures the proportion of correct predictions overall. In this case, the model has an accuracy of 0.8, meaning it correctly predicted 80% of the instances.

Precision measures the proportion of true positive predictions among all positive predictions made by the model. In this case, the model has a precision of 0.82, meaning 82% of the instances it predicted as positive were actually positive.

Recall measures the proportion of true positive predictions among all actual positive instances in the data. In this case, the model has a recall of 0.9, meaning it correctly identified 90% of the actual positive instances.

Specificity measures the proportion of true negative predictions among all actual negative instances in the data. In this case, the model has a specificity of 0.6, meaning it correctly identified 60% of the actual negative instances.

F1-score is the harmonic mean of precision and recall, providing a balanced measure between the two. In this case, the model has an F1-score of 0.86, indicating a relatively good balance between precision and recall.
