
Putting it all together

1. Putting it all together

We have come a long way since chapter one. Let's build an end-to-end model using some of the new tricks in our bag to wrap things up.

2. A stylized process modeling flow

There is no simple way to describe the modeling process. In reality, things are far less linear than any chart suggests, but it is helpful to have a basic structure to work from. It all starts with data ingestion. However, data rarely comes ready for work, so we need to prepare it and shape it into a usable form. Once the data is ready, we build and assess our model. But that is rarely the end of the story. Our assessment might be disappointing, leading us to rework the preprocessing and modeling stages to fine-tune the whole thing. Even when performance looks good, the assessment process might surface new insights we want to incorporate. The same is true of insights gained at the application stage.

3. A stylized process modeling flow

The prepare and preprocess stages are typically associated with feature engineering, which is where we have spent most of our time in this course. But they only make sense in the context of the whole modeling process.

4. Prepare

While data acquisition, or ingestion, is the genesis of any analysis, it is a subject of its own and out of scope for this course. We will continue to assume that the data is available and that we have it as a data frame. That said, there is still some prepping to do. We start with basic housekeeping on our "loans" data by converting character columns into factors, along with Credit_History. Then we split the data into training and test subsets, stratifying by Loan_Status, our target variable, not forgetting to set a seed for reproducibility. A glimpse at our data shows it is ready to go.
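In code, the prep step might look something like the sketch below. It assumes the loans data frame is already loaded; the seed value is arbitrary and the split proportion is left at rsample's default.

library(tidyverse)
library(tidymodels)

# Housekeeping: convert character columns (and Credit_History) to factors
loans <- loans %>%
  mutate(across(where(is.character), as.factor),
         Credit_History = as.factor(Credit_History))

# Set a seed so the split is reproducible
set.seed(123)

# Split into training and test sets, stratified by the target
loans_split <- initial_split(loans, strata = Loan_Status)
loans_train <- training(loans_split)
loans_test  <- testing(loans_split)

glimpse(loans_train)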

5. Preprocess

The preprocessing stage is where feature engineering shines. The amount of tweaking and transforming needed to get the best out of a model varies from project to project and is shaped by our specific goals. In our case, we assign Loan_ID an "ID" role so that we can keep it as a reference. Then we normalize our numeric predictors and impute missing values. Finally, we encode all nominal predictors as dummy variables. Printing the recipe gives us a handy summary of our setup.
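A recipe along these lines could be written as follows. The specific imputation steps (mean for numerics, mode for nominals) are illustrative assumptions, since the narration only says that missing values are imputed.

loans_recipe <- recipe(Loan_Status ~ ., data = loans_train) %>%
  # Keep Loan_ID around as an identifier, not a predictor
  update_role(Loan_ID, new_role = "ID") %>%
  # Center and scale the numeric predictors
  step_normalize(all_numeric_predictors()) %>%
  # Impute missing values (mean/mode chosen here for illustration)
  step_impute_mean(all_numeric_predictors()) %>%
  step_impute_mode(all_nominal_predictors()) %>%
  # Encode nominal predictors as dummy variables
  step_dummy(all_nominal_predictors())

loans_recipe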

6. Model

We are ready to define our model. We will use logistic regression with an L1 penalty, better known as the lasso, and will tune the penalty for best performance. To that end, we set up a penalty grid with 30 levels and bundle our model and recipe into a workflow. Printing lr_workflow displays a summary of our model settings.
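A sketch of the model and workflow setup, assuming the glmnet engine for the lasso:

# Logistic regression with a tunable L1 penalty (mixture = 1 is the lasso)
lr_model <- logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# A regular grid of 30 candidate penalty values
penalty_grid <- grid_regular(penalty(), levels = 30)

# Bundle the model and the recipe into a single workflow
lr_workflow <- workflow() %>%
  add_model(lr_model) %>%
  add_recipe(loans_recipe)

lr_workflow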

7. Assess

Our tuning results show that small amounts of regularization give the best performance as measured by ROC AUC.
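The tuning step might look like the sketch below. The use of cross-validation folds is an assumption here, since the narration does not name the resampling scheme.

# Resamples for tuning; 10-fold CV stratified by the target is assumed
loans_folds <- vfold_cv(loans_train, strata = Loan_Status)

# Evaluate every penalty value in the grid on the resamples
lr_tune <- tune_grid(
  lr_workflow,
  resamples = loans_folds,
  grid      = penalty_grid,
  metrics   = metric_set(roc_auc)
)

# Inspect performance across penalty values
collect_metrics(lr_tune)
autoplot(lr_tune)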

8. Assess

We choose the best penalty value from our tuning exercise and fit our final model. We have just completed an entire modeling workflow, with respectable results!
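Finalizing could proceed as in this sketch, which picks the best penalty by ROC AUC and fits the finalized workflow on the training set while evaluating on the test set:

# Pick the penalty with the best ROC AUC
best_penalty <- select_best(lr_tune, metric = "roc_auc")

# Plug the chosen penalty into the workflow
final_workflow <- finalize_workflow(lr_workflow, best_penalty)

# Fit on the training data and evaluate on the held-out test data
final_fit <- last_fit(final_workflow, loans_split)
collect_metrics(final_fit)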

9. Let's practice!

Time to put it into practice!