# Confusion Matrix use cases in Cyber Crime

What is the Confusion Matrix?

The confusion matrix is a useful tool for classification tasks in machine learning, with the primary objective of visualizing the performance of a model.

In a binary classification setting where the negative class is 0 and the positive class is 1, the confusion matrix is constructed with a 2x2 grid table where the rows are the actual outputs of the data, and the columns are the predicted values from the model.
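The construction above can be sketched in a few lines of plain Python. This is a minimal illustration with made-up labels (not data from the article): it tallies a 2x2 grid where rows are the actual labels and columns are the model's predictions.

```python
# Build a 2x2 confusion matrix by tallying (actual, predicted) pairs.
# Illustrative labels only: 0 = negative class, 1 = positive class.
y_true = [0, 0, 1, 1, 0, 1]  # actual labels (rows)
y_pred = [0, 1, 1, 0, 0, 1]  # model predictions (columns)

# matrix[actual][predicted]
matrix = [[0, 0], [0, 0]]
for actual, predicted in zip(y_true, y_pred):
    matrix[actual][predicted] += 1

print(matrix)  # [[TN, FP], [FN, TP]] -> [[2, 1], [1, 2]]
```

With this layout, `matrix[0][0]` holds the true negatives, `matrix[0][1]` the false positives, `matrix[1][0]` the false negatives, and `matrix[1][1]` the true positives.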

How to interpret a Confusion Matrix?

I will start by describing some terminology; if you can work out where each term belongs in the grid, you are halfway there:

True Positive: The model predicted positive and the label was actually positive.

True Negative: The model predicted negative and the label was actually negative.

False Positive: The model predicted positive and the label was actually negative — I like to think of this as falsely classified as positive.

False Negative: The model predicted negative and the label was actually positive — I like to think of this as falsely classified as negative.

Our confusion matrix has two red patches on the grid, which indicate the types of errors our model is making:

Type-1 => False positive

Type-2 => False negative
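The four buckets and two error types above can be captured in a small helper function. This is an illustrative sketch (the function name and labels are my own, not from the article):

```python
# Classify a single prediction into one of the four confusion-matrix
# buckets. Convention: 1 = positive class, 0 = negative class.
def outcome(actual, predicted):
    if actual == 1 and predicted == 1:
        return "TP"  # true positive
    if actual == 0 and predicted == 0:
        return "TN"  # true negative
    if actual == 0 and predicted == 1:
        return "FP"  # Type-1 error: falsely classified as positive
    return "FN"      # Type-2 error: falsely classified as negative

print(outcome(0, 1))  # FP (Type-1)
print(outcome(1, 0))  # FN (Type-2)
```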

How do we use the confusion matrix in cyber crime?

Let's take a case study of this:

We are tasked with building a classifier that predicts whether each transaction is fraudulent or non-fraudulent. The data was handed to us by a major bank in the UK under very tight security – the personal details of all customers were encrypted for privacy – and fraudulent transactions do not happen very often. In fact, the data they gave us contained 10,000 negative cases (non-fraudulent) and 1,000 positive cases (fraud). The figure below shows the results of our classifier.

Of the 11,000 transactions in the image above, our model classified 10,000 as non-fraudulent (negative). Among those, 9,900 are True Negatives (TN), but the remaining 100 are False Negatives (FN): real fraud that the model predicted wrongly. For these 100 cases the model will not alert the bank, so the bank never learns about the fraud. This is why the FN (Type-2) error is so dangerous in this setting.

The remaining 1,000 transactions were classified as fraud, but 800 of them are not actually fraud; the model reports them as fraud anyway, making them False Positives (Type-1). In this case the Type-1 error is not so dangerous – a legitimate transaction gets flagged for review, an inconvenience rather than a loss – but the Type-2 error is very dangerous.