Identify optimal L1 penalty coefficient
You will now tune the C parameter for L1 regularization to find the value that reduces model complexity while still maintaining good model performance. You will run a for loop over the candidate C values, build a logistic regression instance for each, and calculate its performance metrics.
A list C has been created with the candidate values. The l1_metrics array has been built with 3 columns: the first holds the C values, and the other two are placeholders for the non-zero coefficient count and the recall score of each model. The scaled features and target variables have been loaded as train_X and train_Y for training, and test_X and test_Y for testing. Both numpy and pandas are loaded as np and pd, as is the recall_score function from sklearn.
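For context, here is a minimal sketch of how these pre-loaded objects might be prepared. The candidate C values, the synthetic stand-in data, and the scaling step are assumptions for illustration only and may differ from the course setup.

# Illustrative setup only: the C values and the synthetic data are assumptions,
# standing in for the churn data that the course pre-loads for you
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

C = [1, 0.5, 0.25, 0.1, 0.05, 0.025, 0.01]      # assumed candidate values
l1_metrics = np.zeros((len(C), 3))               # columns: C value, non-zero coeff count, recall
l1_metrics[:, 0] = C

# Stand-in data; the real exercise pre-loads scaled train_X, train_Y, test_X, test_Y
X, Y = make_classification(n_samples=1000, n_features=20, random_state=42)
train_X, test_X, train_Y, test_Y = train_test_split(X, Y, test_size=0.25, random_state=42)
scaler = StandardScaler()
train_X = scaler.fit_transform(train_X)
test_X = scaler.transform(test_X)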
This exercise is part of the course Machine Learning for Marketing in Python.
Exercise instructions
- Run a for loop over the range from 0 to the length of the list C.
- For each C candidate, initialize and fit a logistic regression and predict churn on the test data.
- For each C candidate, store the non-zero coefficient count and the recall score in the second and third columns of l1_metrics.
- Create a pandas DataFrame out of l1_metrics with the appropriate column names.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Run a for loop over the range of C list length
for index in ___(0, len(C)):
# Initialize and fit Logistic Regression with the C candidate
logreg = ___(penalty='l1', C=C[___], solver='liblinear')
logreg.fit(___, train_Y)
# Predict churn on the testing data
pred_test_Y = logreg.___(test_X)
# Create non-zero count and recall score columns
l1_metrics[index,1] = np.___(logreg.coef_)
l1_metrics[index,2] = recall_score(___, pred_test_Y)
# Name the columns and print the array as pandas DataFrame
col_names = ['C','Non-Zero Coeffs','Recall']
print(pd.DataFrame(l1_metrics, columns=___))
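For reference, one possible completed version of the loop is sketched below. It assumes the objects described above (C, l1_metrics, train_X, train_Y, test_X, and test_Y) are already in place, for example as in the setup sketch, and that LogisticRegression comes from sklearn.linear_model; the imports are repeated so the sketch stands on its own.

# Possible completed solution, assuming C, l1_metrics, and the train/test splits exist
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Run a for loop over the range of the C list length
for index in range(0, len(C)):
    # Initialize and fit Logistic Regression with the C candidate
    logreg = LogisticRegression(penalty='l1', C=C[index], solver='liblinear')
    logreg.fit(train_X, train_Y)
    # Predict churn on the testing data
    pred_test_Y = logreg.predict(test_X)
    # Store the non-zero coefficient count and the recall score
    l1_metrics[index, 1] = np.count_nonzero(logreg.coef_)
    l1_metrics[index, 2] = recall_score(test_Y, pred_test_Y)

# Name the columns and print the array as a pandas DataFrame
col_names = ['C', 'Non-Zero Coeffs', 'Recall']
print(pd.DataFrame(l1_metrics, columns=col_names))

In the printed DataFrame, smaller C values apply a stronger L1 penalty and drive more coefficients to exactly zero; a reasonable choice is the smallest C that keeps recall close to the value obtained with the largest C.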