Exercise

Logistic regression baseline classifier

In the last two lessons, you learned how valuable feature selection is in the context of machine learning interviews. Another set of common questions you should expect in a machine learning interview concerns feature engineering and how engineered features help improve model performance.

In this exercise, you'll engineer a new feature on the loan_data dataset from Chapter 1 and compare the accuracy scores of Logistic Regression models trained on the dataset before and after feature engineering, by comparing the test labels with the predicted values of the target variable Loan Status.

All relevant packages have been imported for you: matplotlib.pyplot as plt, seaborn as sns, LogisticRegression from sklearn.linear_model, train_test_split from sklearn.model_selection, and accuracy_score from sklearn.metrics.

Feature engineering is considered a pre-processing step before modeling:
[Figure: Machine learning pipeline]

Instructions
1. Fit and predict a Logistic Regression on loan_data with the target variable Loan Status as y, and evaluate the trained model's accuracy score.
2. Convert Annual Income to monthly income, then derive the ratio of Monthly Debt to monthly_income and store it in dti_ratio.
3. Convert the target variable to numerical values and replace the categorical features with dummy values.
4. Fit and predict a Logistic Regression on loans_dti and evaluate the trained model's accuracy score, comparing it with the baseline (see the sketch after this list).