Logistic regression baseline classifier

In the last two lessons, you learned how valuable feature selection is in the context of machine learning interviews. Another common set of questions you should expect in a machine learning interview pertains to feature engineering and how it helps improve model performance.

In this exercise, you'll engineer a new feature on the loan_data dataset from Chapter 1 and compare the accuracy scores of Logistic Regression models on the dataset before and after feature engineering. Each accuracy score is obtained by comparing the test labels with the predicted values of the target variable Loan Status.
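As a small illustration of what the accuracy score measures, the sketch below computes the fraction of predictions that match the true labels. The label lists are made up for illustration and are not taken from loan_data; with its default settings, accuracy_score from sklearn.metrics returns this same fraction.

```python
# Made-up labels for illustration only; these are not from loan_data.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

# Accuracy is the share of positions where the prediction equals the true label.
matches = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = matches / len(y_true)

print(accuracy)  # 4 of the 5 predictions match, so this prints 0.8
```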

All relevant packages have been imported for you: matplotlib.pyplot as plt, seaborn as sns, LogisticRegression from sklearn.linear_model, train_test_split from sklearn.model_selection, and accuracy_score from sklearn.metrics.

Feature engineering is considered a pre-processing step before modeling.

[Figure: Machine learning pipeline]

This exercise is part of the course

Practicing Machine Learning Interview Questions in Python

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create X matrix and y array
X = loan_data.____("____", axis=1)
y = loan_data["____"]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)

# Instantiate
logistic = ____()

# Fit
logistic.____(____, ____)

# Predict and print accuracy
print(____(y_true=____, y_pred=logistic.____(____)))
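For reference, here is one plausible way the blanks could be filled in, shown as a self-contained sketch rather than the official solution. Since loan_data itself is not available here, the sketch builds a small synthetic stand-in: the column names feature_a and feature_b and the generated values are hypothetical, and only the target column name Loan Status comes from the exercise text. The real dataset may also contain non-numeric columns that would need encoding before fitting.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for loan_data: feature names and values are invented.
rng = np.random.default_rng(123)
n = 500
loan_data = pd.DataFrame({
    "feature_a": rng.normal(size=n),
    "feature_b": rng.normal(size=n),
})
# Synthetic binary target that depends on the two features plus noise.
signal = loan_data["feature_a"] + 0.5 * loan_data["feature_b"]
loan_data["Loan Status"] = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Create X matrix and y array (assumes the target column is named "Loan Status")
X = loan_data.drop("Loan Status", axis=1)
y = loan_data["Loan Status"]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)

# Instantiate
logistic = LogisticRegression()

# Fit
logistic.fit(X_train, y_train)

# Predict and print accuracy
baseline_accuracy = accuracy_score(y_true=y_test, y_pred=logistic.predict(X_test))
print(baseline_accuracy)
```

The value printed here reflects only the synthetic data and says nothing about the accuracy you would get on the real loan_data.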
