Confusion Matrix & Cyber World

Aditi Awasthi
4 min read · Jun 3, 2021


As we know, Supervised Machine Learning algorithms can be broadly classified into Regression and Classification algorithms. Regression algorithms predict continuous output values, whereas predicting categorical values calls for Classification algorithms.

What is the Classification Algorithm?

A Classification algorithm is a Supervised Learning technique used to identify the category of new observations on the basis of training data. In Classification, a program learns from the given dataset or observations and then assigns each new observation to one of a number of classes or groups.

Confusion Matrix

The confusion matrix is a matrix used to determine the performance of a classification model on a given set of test data. It can only be built if the true values for the test data are known. A confusion matrix is not itself a metric for evaluating a model, but it provides insight into the predictions. It is important to understand the confusion matrix in order to comprehend other classification metrics such as precision and recall.

A confusion matrix goes deeper than classification accuracy by showing the correct and incorrect (i.e. true or false) predictions for each class. In the case of a binary classification task, the confusion matrix is a 2x2 matrix. If there are three different classes, it is a 3x3 matrix, and so on.
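To make the "one row and one column per class" idea concrete, here is a minimal sketch in plain Python. The helper name build_confusion_matrix and the sample labels are made up for illustration; rows are assumed to hold the actual class and columns the predicted class.

```python
def build_confusion_matrix(actual, predicted, labels):
    """Return an N x N table of counts: rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Made-up labels for a three-class problem (illustration only)
actual = ["A", "A", "B", "B", "C", "C", "A"]
predicted = ["A", "B", "B", "B", "C", "A", "A"]

matrix = build_confusion_matrix(actual, predicted, labels=["A", "B", "C"])
for row in matrix:
    print(row)
# [2, 1, 0]
# [0, 2, 0]
# [1, 0, 1]
```

The diagonal holds the correct predictions for each class, and every off-diagonal cell is a specific kind of mistake, for example one actual A that was predicted as B.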

Let’s assume class A is the positive class and class B is the negative class. The key terms of a confusion matrix are as follows:

  • True positive (TP): Predicting positive class as positive (ok)
  • False positive (FP): Predicting negative class as positive (not ok)
  • False negative (FN): Predicting positive class as negative (not ok)
  • True negative (TN): Predicting negative class as negative (ok)

Now let’s understand TP, TN, FP, and FN in a Confusion Matrix

True Positive (TP)

The predicted value matches the actual value

The actual value was positive and the model predicted a positive value

True Negative (TN)

The predicted value matches the actual value

The actual value was negative and the model predicted a negative value

False Positive (FP) — Type 1 error

The predicted value does not match the actual value

The actual value was negative but the model predicted a positive value

False Negative (FN) — Type 2 error

The predicted value does not match the actual value

The actual value was positive but the model predicted a negative value
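The four definitions above can be sketched as a small counting routine. This is only an illustration: the function name count_outcomes and the sample labels are assumptions, with class A treated as the positive class as in the text.

```python
def count_outcomes(actual, predicted, positive="A"):
    """Count TP, TN, FP and FN for a binary problem, given the positive class label."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted positive: correct if the actual value is positive too
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the actual value was positive
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Made-up labels (illustration only)
actual = ["A", "A", "A", "B", "B", "B", "B", "A"]
predicted = ["A", "B", "A", "B", "A", "B", "B", "A"]

counts = count_outcomes(actual, predicted)
print(counts)
# {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```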

The confusion matrix is used to calculate precision and recall.

Precision and recall metrics take the classification accuracy one step further and allow us to get a more specific understanding of model evaluation. Which one to prefer depends on the task and what we aim to achieve.

Precision measures how good our model is when the prediction is positive. It is the ratio of correct positive predictions to all positive predictions:

Precision = TP / (TP + FP)

Recall measures how good our model is at correctly predicting positive classes. It is the ratio of correct positive predictions to all actual positives:

Recall = TP / (TP + FN)

The focus of precision is positive predictions so it indicates how many positive predictions are true. The focus of recall is actual positive classes so it indicates how many of the positive classes the model is able to predict correctly.
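As a quick worked example, precision and recall can be computed directly from the confusion matrix counts. The counts below are hypothetical, and returning 0.0 when the denominator is zero is just one possible convention, not something the definitions prescribe.

```python
def precision(tp, fp):
    """Correct positive predictions divided by all positive predictions."""
    total = tp + fp
    return tp / total if total else 0.0  # assumption: report 0.0 when undefined


def recall(tp, fn):
    """Correct positive predictions divided by all actual positives."""
    total = tp + fn
    return tp / total if total else 0.0  # assumption: report 0.0 when undefined


# Hypothetical counts (illustration only)
tp, fp, fn = 3, 1, 2

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision = {p:.2f}")  # precision = 0.75
print(f"recall = {r:.2f}")     # recall = 0.60
```

Here the model is right in 3 of its 4 positive predictions, but it finds only 3 of the 5 actual positives.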

Confusion matrix in cyber world

Positive: System is secure

Negative: System is insecure

TP: The model predicts that the system is secure and it actually is secure, so the cybersecurity officers do not need to take any action.

TN: The model predicts that the system is insecure and it actually is insecure, so the cybersecurity officers can take action with confidence.

FP (Type-1 error): The model predicts that the system is secure, but the system is actually under attack. The cybersecurity officers therefore have no clue about the attack, which can let it grow into a major incident.

FN (Type-2 error): The model predicts that the system is insecure, but the system is actually secure. This makes the cybersecurity officers investigate more actively, but it does not lead to any actual harm to the system.

From the above discussion, we can conclude that, with "secure" defined as the positive class, the Type-1 error is more dangerous than the Type-2 error. If "insecure" were chosen as the positive class instead, the two roles would swap.
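The mapping above can be sketched in a few lines of Python. The event log is entirely hypothetical, the function name outcome is made up for this example, and "secure" is taken as the positive class to match the convention used here.

```python
def outcome(actual, predicted, positive="secure"):
    """Label one (actual, predicted) pair as TP, TN, FP or FN."""
    if predicted == positive:
        return "TP" if actual == positive else "FP"
    return "FN" if actual == positive else "TN"


# Hypothetical monitoring log: (actual state, model prediction)
events = [
    ("secure", "secure"),      # nothing to do
    ("insecure", "insecure"),  # attack detected
    ("insecure", "secure"),    # attack missed
    ("secure", "insecure"),    # false alarm
]

labels = [outcome(actual, predicted) for actual, predicted in events]
missed_attacks = labels.count("FP")  # Type-1 errors under this convention
false_alarms = labels.count("FN")    # Type-2 errors under this convention

print(labels)  # ['TP', 'TN', 'FP', 'FN']
print(missed_attacks, false_alarms)  # 1 1
```

Counting the two error types separately is what lets a team see how many attacks slipped through as opposed to how many alarms were merely unnecessary.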

Sensitivity and Specificity

Sensitivity, also known as the true positive rate (TPR), is the same as recall. Hence, it measures the proportion of actual positives that are correctly predicted as positive.

Specificity is similar to sensitivity but focuses on the negative class. It measures the proportion of actual negatives that are correctly predicted as negative.
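Both rates follow directly from the four counts: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below uses hypothetical counts and assumes both denominators are non-zero.

```python
def sensitivity(tp, fn):
    """True positive rate: share of actual positives predicted as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives predicted as negative."""
    return tn / (tn + fp)


# Hypothetical counts (illustration only)
tp, fn, tn, fp = 8, 2, 9, 1

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}")  # sensitivity = 0.80
print(f"specificity = {spec:.2f}")  # specificity = 0.90
```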

I hope linking the confusion matrix with the cyber world through this blog made it a little less confusing.

THANK YOU!
