1. Variable Importance
Complex models can be accurate, but how do we interpret them?
2. Adding more predictors
Real-life models often include many variables, sometimes thousands or even millions. Such models can be very accurate yet hard to interpret, which is why they are often referred to as black boxes.
However, we can lend them interpretability by estimating variable importance, which quantifies the relationship between each predictor and the outcome. One way to do this is to change each predictor individually, so that its effect is removed, and assess how much model performance drops.
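The idea of changing one predictor at a time and measuring the loss of performance can be sketched in a few lines. This is a minimal, library-free Python illustration of permutation-style importance, not the method used by the vip package. The toy model, the feature names (x1, x2, noise) and the helper functions are all hypothetical and exist only for this example.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, n_repeats=10, seed=0):
    """Mean increase in MSE after shuffling each feature on its own.

    predict: function taking one row (a dict) and returning a number
    rows:    list of dicts mapping feature name to value
    y:       list of observed outcomes, aligned with rows
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(r) for r in rows])
    importances = {}
    for feature in rows[0]:
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column to break its link with the outcome.
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
            loss = mean_squared_error(y, [predict(r) for r in permuted])
            increases.append(loss - baseline)
        importances[feature] = sum(increases) / n_repeats
    return importances


# Hypothetical toy model: relies heavily on x1, weakly on x2, ignores noise.
def toy_model(row):
    return 3.0 * row["x1"] + 0.5 * row["x2"]


data_rng = random.Random(42)
rows = [
    {"x1": data_rng.gauss(0, 1), "x2": data_rng.gauss(0, 1), "noise": data_rng.gauss(0, 1)}
    for _ in range(200)
]
y = [toy_model(r) for r in rows]

scores = permutation_importance(toy_model, rows, y)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Shuffling a feature the model ignores leaves the loss unchanged, so its importance is zero, while shuffling a feature the model depends on raises the loss in proportion to that dependence.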
3. Which variables matter most?
We can plot features by importance with the vip() function from the vip package. In our case, contract_value_band seems to be the most influential feature in our model, followed by sponsor_code and category_code.
Understanding variable importance can help us select model features and elicit candidate features suggested by theory.
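To show what an importance plot conveys, here is a small Python sketch that ranks features and draws a text bar chart. It is not the vip package. The scores are illustrative placeholders chosen only to match the ordering described above, not values from the model, and the function names are hypothetical.

```python
def rank_features(scores):
    """Return (feature, score) pairs sorted from most to least important."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def text_bar_chart(scores, width=30):
    """Render importance scores as horizontal bars scaled to the top score."""
    ranked = rank_features(scores)
    top = ranked[0][1]
    label_width = max(len(name) for name, _ in ranked)
    lines = []
    for name, score in ranked:
        bar_length = round(score / top * width) if top > 0 else 0
        lines.append(f"{name.ljust(label_width)} | {'#' * bar_length} {score:.2f}")
    return "\n".join(lines)


# Placeholder scores (assumed for illustration only).
placeholder_scores = {
    "category_code": 0.12,
    "contract_value_band": 0.40,
    "sponsor_code": 0.25,
}

chart = text_bar_chart(placeholder_scores)
print(chart)
```

Scaling every bar to the top score makes relative importance easy to compare at a glance, which is the main purpose of this kind of plot.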
4. Variable importance and feature engineering
Variable importance is closely related to domain knowledge when building a high-performing machine learning model. Domain knowledge can help identify which variables are likely to be important for making accurate predictions, and this understanding can guide the feature selection and engineering process.
Once a model is built, variable importance measures can be used to assess the relative importance of each variable in making predictions. This information can be used to further refine the model and focus on the most informative variables. For example, variables with low importance scores can potentially be removed to simplify the model and improve its interpretability, or new variables can be added to better capture important information that was not included in the original model.
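Removing low-importance variables can be sketched as a simple threshold filter. This is a minimal Python illustration under stated assumptions: the feature names, scores and threshold are hypothetical, and in practice the cutoff is a judgment call that should be checked by refitting and re-evaluating the model.

```python
def split_by_importance(scores, threshold):
    """Split features into those to keep and those that are removal candidates.

    scores:    dict mapping feature name to importance score
    threshold: minimum score a feature needs to be kept (an assumed cutoff)
    """
    kept = [name for name, score in scores.items() if score >= threshold]
    candidates = [name for name, score in scores.items() if score < threshold]
    return kept, candidates


# Hypothetical importance scores for illustration only.
example_scores = {
    "feature_a": 0.40,
    "feature_b": 0.25,
    "feature_c": 0.02,
    "feature_d": 0.00,
}

kept, candidates = split_by_importance(example_scores, threshold=0.05)
print("keep:", kept)
print("consider removing:", candidates)
```

A low score only marks a variable as a candidate for removal. Correlated predictors can share importance between them, so it is worth confirming that performance holds after the simpler model is refit.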
In short, domain knowledge guides feature selection and engineering, which in turn shapes the variable importance scores of the final model. Those scores then show which variables are most relevant for accurate predictions and point to further refinements.
5. Let's practice!
Time to give it a try.