The ML Maven: Introducing the Confusion Matrix


Joy Ada Uche

Posted on May 31, 2020

Random Friend: OMG! You won’t believe this - I got a high accuracy value of 88%!!!

Me: Oh really? Sounds interesting!!! From that metric, it sounds like a classification problem. So, what specific problem is your classifier trying to solve?

Random Friend: So, my model predicts whether or not we are going to have an earthquake in Wakanda, using data sourced from the Government of Wakanda’s website on earthquake occurrences over a period of time. Here is some visual EDA on the training dataset.

Me: Wawuu! Can I take a look at your confusion matrix?

Random Friend: Does that really matter? Besides, I think I got a high accuracy.

Me: Smiles! It definitely matters, especially since it looks like you are dealing with imbalanced classes, as your visualization shows.

Random Friend: Oh, I see! Then let me quickly generate that for you… Here it is below -

                Predicted NO     Predicted YES
Actual NO       TN = 60          FP = 20
Actual YES      FN = 10          TP = 150

Me: Alrightee! Let’s see what we have here and walk through some useful metrics -

So, an accuracy of 88% means that your model is correct 88% of the time and incorrect 12% of the time. Since this sounds like a life-and-death situation, that does not seem good enough. Just imagine the number of lives that could be lost in the 12% of cases where the model predicts incorrectly 😨!

It is a very common scenario for one class to outnumber the other - in your case, the NO earthquake class has more instances than the YES earthquake class.

So, accuracy alone is not enough to evaluate the performance of your model - hence the need for a Confusion Matrix. It summarizes a model’s predictive performance by counting correct and incorrect predictions for each class, and we can use it to describe how your model behaves -

Note that the positive class is usually 1, a YES case - it is what we are trying to detect. The negative class is usually 0, a NO case.

As you can see from the confusion matrix above (a short scikit-learn sketch for generating one follows this list),

  • True Negatives (TN): 60 - the number of NO earthquake occurrences correctly predicted as NO.
  • True Positives (TP): 150 - the number of YES earthquake occurrences correctly predicted as YES.
  • False Negatives (FN): 10 - the number of YES earthquake occurrences incorrectly predicted as NO.
  • False Positives (FP): 20 - the number of NO earthquake occurrences incorrectly predicted as YES.

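If you are using scikit-learn, here is a minimal sketch of how to get these four counts; y_true and y_pred are hypothetical placeholders standing in for your real labels and predictions -

```python
# A minimal sketch, assuming scikit-learn is installed; y_true and y_pred are
# hypothetical placeholders for the real labels and predictions.
from sklearn.metrics import confusion_matrix

# 1 = YES earthquake occurrence, 0 = NO earthquake occurrence
y_true = [0, 1, 1, 0, 1, 0, 1, 1]  # actual outcomes (placeholder values)
y_pred = [0, 1, 0, 1, 1, 0, 1, 1]  # model predictions (placeholder values)

# With labels=[0, 1], ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")
```
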
A False Positive is also known as a Type 1 error. This is when the model predicts that there will be an earthquake but in reality there is none - a false alarm! This will make the people of Wakanda panic and push the government of Wakanda to do all it can to save lives from the predicted earthquake. That can waste resources - perhaps the government moves people to a different geographic location and has to cater for them there.

A False Negative is referred to as a Type 2 error. This is when the model predicts that there will be no earthquake but one actually occurs. This is catastrophic! The people of Wakanda will be chilling and suddenly an earthquake will greet them 😭!

We can also calculate some useful metrics from these counts (a code sketch that computes them all follows this list) -

  • Recall: This is also called the True Positive Rate, Sensitivity or Hit Rate. It is the probability that an actual positive is predicted positive, i.e. it tells us how often an actual YES earthquake occurrence is predicted as a YES - what proportion of actual YES earthquake occurrences are correctly classified. From the confusion matrix, Recall = TP / (TP + FN) = 150 / (150 + 10) ≈ 0.94.

High recall means that this model has a low false-negative rate, i.e. not many actual YES earthquake occurrences were predicted as NO earthquake occurrences - the classifier caught most of the actual YES earthquake occurrences.

  • Specificity: This is also called the True Negative Rate. It is the probability that an actual negative is predicted negative, i.e. it tells us how often an actual NO earthquake occurrence is predicted as a NO - what proportion of actual NO earthquake occurrences are correctly classified. From the confusion matrix, Specificity = TN / (TN + FP) = 60 / (60 + 20) = 0.75.
  • Positive Predictive Value: This is also called Precision. It is the probability that a predicted YES is correct, i.e. it tells us how often a prediction of a YES earthquake occurrence turns out to be right. From the confusion matrix, Precision = TP / (TP + FP) = 150 / (150 + 20) ≈ 0.88.

High precision means that this model produces few false positives, i.e. not many actual NO earthquake occurrences were predicted as YES earthquake occurrences - most of its YES predictions can be trusted.

  • Negative Predictive Value: It is the probability that a predicted NO is correct, i.e. it tells us how often a prediction of a NO earthquake occurrence turns out to be right. From the confusion matrix, NPV = TN / (TN + FN) = 60 / (60 + 10) ≈ 0.86.
  • F1 Score: This is the harmonic mean of precision and recall. From the confusion matrix, F1 = 2 × (Precision × Recall) / (Precision + Recall) = 2 × (0.88 × 0.94) / (0.88 + 0.94) ≈ 0.91.
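
Here is the sketch promised above - plain Python that computes all of the metrics just described from the four counts in the confusion matrix, so the numbers can be double-checked -

```python
# The metrics above, computed from the counts in the confusion matrix:
# TN = 60, FP = 20, FN = 10, TP = 150.
tn, fp, fn, tp = 60, 20, 10, 150

recall      = tp / (tp + fn)   # True Positive Rate / Sensitivity
specificity = tn / (tn + fp)   # True Negative Rate
precision   = tp / (tp + fp)   # Positive Predictive Value
npv         = tn / (tn + fn)   # Negative Predictive Value
f1          = 2 * precision * recall / (precision + recall)
accuracy    = (tp + tn) / (tp + tn + fp + fn)

print(f"Recall      = {recall:.2f}")       # 0.94
print(f"Specificity = {specificity:.2f}")  # 0.75
print(f"Precision   = {precision:.2f}")    # 0.88
print(f"NPV         = {npv:.2f}")          # 0.86
print(f"F1 score    = {f1:.2f}")           # 0.91
print(f"Accuracy    = {accuracy:.2f}")     # 0.88
```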

For the problem we are trying to solve, we should probably be much more concerned with reducing False Negatives, or Type 2 errors, i.e. the cases where the model predicts there will be no earthquake but one actually occurs. In this case that is much more dangerous than a Type 1 error. The model incorrectly classified 10 cases in which earthquakes occurred as cases where they did not. Just imagine chilling with some fresh orange juice, watching Black Panther on Netflix, and suddenly the ground starts shaking 😱!!!

The metric you choose to optimize depends on the problem being solved. Let us take a look at some scenarios below -

  • If false negatives are unacceptable, choose to optimize Recall - as we would want to do for the earthquake occurrence problem (see the sketch after this list). Here, we won’t mind picking up extra false positives just to reduce the number of false negatives, i.e. we would rather say that an earthquake will occur when it will not, RATHER than say an earthquake will not occur and it does.
  • If false positives are unacceptable, choose to optimize Specificity. Here is an example where false positives should not be overlooked - predicting whether a patient has coronavirus. Since we are trying to detect coronavirus, having it (a yes or 1) represents the positive class while being healthy represents the negative class. If every patient predicted as positive (i.e. as having coronavirus) gets quarantined, I would want to make sure a healthy person is not flagged as having coronavirus. In this case, we would not accept any false positives.
  • If you want to be extra sure that the positives you predict are truly positive, choose to optimize Precision. For example, when detecting coronavirus, testing centres would want to be very confident that a patient classified as having the virus truly has it.
  • Choose to optimize F1 score if you need a balance between Precision and Recall.
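
As a rough illustration of how this choice shows up in practice, here is a minimal sketch, assuming scikit-learn; the synthetic dataset and classifier are hypothetical placeholders, not the Wakanda earthquake model. It simply plugs the metric you care about in as the scoring objective during cross-validation -

```python
# A minimal sketch, assuming scikit-learn; the dataset and classifier are
# hypothetical placeholders, not the Wakanda earthquake model itself.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# A small synthetic, imbalanced binary dataset standing in for the real data
X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)
clf = LogisticRegression(max_iter=1000)

# For the earthquake problem we would rather optimize recall than raw accuracy,
# so recall (or f1) becomes the scoring objective during model selection.
mean_recall = cross_val_score(clf, X, y, cv=5, scoring="recall").mean()
mean_f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
print(f"mean recall = {mean_recall:.2f}, mean F1 = {mean_f1:.2f}")
```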

Random Friend: Amazinggg! With these explanations, I will definitely work on improving my model’s performance.

Me: You are always welcome! Excited that you are becoming a Machine Learning Maven! Stay tuned for the next post in this series, Introducing the ROC Curve! Have an amazing and fulfilled week ahead!
