Identify optimal L1 penalty coefficient
You will now tune the C parameter for L1 regularization to find the value that reduces model complexity while still maintaining good model performance. You will run a for loop over the candidate C values, build a logistic regression instance for each, and calculate its performance metrics.
A list C has been created with the candidate values. The l1_metrics array has been built with 3 columns: the first holds the C values, and the other two are placeholders for the non-zero coefficient count and the recall score of each model. The scaled features and target variables have been loaded as train_X and train_Y for training, and test_X and test_Y for testing. Both numpy and pandas are loaded as np and pd, as is the recall_score function from sklearn.
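For context, here is a minimal sketch of how these pre-loaded objects might be prepared. The candidate C values, the synthetic stand-in data, and the scaling step are assumptions for illustration only and may differ from the course setup.

# Illustrative setup only: the C values and the synthetic data are assumptions,
# standing in for the churn data that the course pre-loads for you
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

C = [1, 0.5, 0.25, 0.1, 0.05, 0.025, 0.01]      # assumed candidate values
l1_metrics = np.zeros((len(C), 3))               # columns: C value, non-zero coeff count, recall
l1_metrics[:, 0] = C

# Stand-in data; the real exercise pre-loads scaled train_X, train_Y, test_X, test_Y
X, Y = make_classification(n_samples=1000, n_features=20, random_state=42)
train_X, test_X, train_Y, test_Y = train_test_split(X, Y, test_size=0.25, random_state=42)
scaler = StandardScaler()
train_X = scaler.fit_transform(train_X)
test_X = scaler.transform(test_X)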
This exercise is part of the course Machine Learning for Marketing in Python.
Exercise instructions
- Run a for loop over the range from 0 to the length of the list C.
- For each C candidate, initialize and fit a logistic regression and predict churn on the test data.
- For each C candidate, store the non-zero coefficient count and the recall score in the second and third columns of l1_metrics.
- Create a pandas DataFrame out of l1_metrics with the appropriate column names.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Run a for loop over the range of C list length
for index in ___(0, len(C)):
# Initialize and fit Logistic Regression with the C candidate
logreg = ___(penalty='l1', C=C[___], solver='liblinear')
logreg.fit(___, train_Y)
# Predict churn on the testing data
pred_test_Y = logreg.___(test_X)
# Create non-zero count and recall score columns
l1_metrics[index,1] = np.___(logreg.coef_)
l1_metrics[index,2] = recall_score(___, pred_test_Y)
# Name the columns and print the array as pandas DataFrame
col_names = ['C','Non-Zero Coeffs','Recall']
print(pd.DataFrame(l1_metrics, columns=___))
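For reference, one possible completed version of the loop is sketched below. It assumes the objects described above (C, l1_metrics, train_X, train_Y, test_X, and test_Y) are already in place, for example as in the setup sketch, and that LogisticRegression comes from sklearn.linear_model; the imports are repeated so the sketch stands on its own.

# Possible completed solution, assuming C, l1_metrics, and the train/test splits exist
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Run a for loop over the range of the C list length
for index in range(0, len(C)):
    # Initialize and fit Logistic Regression with the C candidate
    logreg = LogisticRegression(penalty='l1', C=C[index], solver='liblinear')
    logreg.fit(train_X, train_Y)
    # Predict churn on the testing data
    pred_test_Y = logreg.predict(test_X)
    # Store the non-zero coefficient count and the recall score
    l1_metrics[index, 1] = np.count_nonzero(logreg.coef_)
    l1_metrics[index, 2] = recall_score(test_Y, pred_test_Y)

# Name the columns and print the array as a pandas DataFrame
col_names = ['C', 'Non-Zero Coeffs', 'Recall']
print(pd.DataFrame(l1_metrics, columns=col_names))

In the printed DataFrame, smaller C values apply a stronger L1 penalty and drive more coefficients to exactly zero; a reasonable choice is the smallest C that keeps recall close to the value obtained with the largest C.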