Using ML classification to catch fraud
In this exercise you'll see what happens when you use a simple machine learning model on our credit card data instead.
Do you think you can beat those results? Remember, you caught 22 of the 50 fraud cases and had 16 false positives.
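To make that baseline easier to compare against a classification report, here is a small sketch that turns the counts above into precision, recall and F1. The variable names are illustrative only; the counts are the ones quoted above.

```python
# Baseline counts quoted in the text above
true_positives = 22   # fraud cases that were flagged
false_positives = 16  # legitimate cases flagged by mistake
total_fraud = 50      # all actual fraud cases

# Fraud cases that were missed
false_negatives = total_fraud - true_positives

# Precision: share of flagged cases that were really fraud
precision = true_positives / (true_positives + false_positives)
# Recall: share of actual fraud cases that were flagged
recall = true_positives / total_fraud
# F1: harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```

These are the numbers to compare with the fraud-class row (class 1) of the classification report you print later in this exercise.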
So with that in mind, let's implement a Logistic Regression model. If you have taken the class on supervised learning in Python, you should be familiar with this model. If not, you might want to refresh your knowledge at this point. But don't worry, you'll be guided through the structure of the machine learning model.
The X and y variables are available in your workspace.
This exercise is part of the course
Fraud Detection in Python
Exercise instructions
- Split X and y into training and test data, keeping 30% of the data for testing.
- Fit your model to your training data.
- Obtain the model predicted labels by running model.predict on X_test.
- Obtain a classification report comparing y_test with predicted, and use the given confusion matrix to check your results.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create the training and testing sets
X_train, X_test, y_train, y_test = train_test_split(____, ____, test_size=____, random_state=0)
# Fit a logistic regression model to our data
model = LogisticRegression()
model.fit(____, ____)
# Obtain model predictions
predicted = model.predict(____)
# Print the classification report and confusion matrix
print('Classification report:\n', classification_report(____, ____))
conf_mat = confusion_matrix(y_true=y_test, y_pred=predicted)
print('Confusion matrix:\n', conf_mat)
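If you want to try the same workflow outside the course workspace, the following is a minimal sketch of one way the blanks could be filled in. It assumes scikit-learn is installed, and because the course's credit card data is not available here, it substitutes a synthetic imbalanced dataset from make_classification (an assumption, not the course data). The max_iter=1000 argument is also an addition, used only to reduce the chance of convergence warnings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the credit card data (assumption, not the course data):
# 5,000 rows, with roughly 2% labelled as the positive "fraud" class
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    n_informative=5,
    weights=[0.98],
    random_state=0,
)

# Create the training and testing sets, keeping 30% of the data for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit a logistic regression model to the training data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Obtain model predictions on the held-out test data
predicted = model.predict(X_test)

# Print the classification report and confusion matrix
print('Classification report:\n', classification_report(y_test, predicted))
conf_mat = confusion_matrix(y_true=y_test, y_pred=predicted)
print('Confusion matrix:\n', conf_mat)
```

In the confusion matrix, rows are the true classes and columns are the predicted classes, so the bottom-right cell counts the fraud cases that were caught and the top-right cell counts the false positives. The numbers from this synthetic data will not match the results on the course data.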