Model prioritization

1. Model prioritization

Welcome back to the course on Demystifying Decision Science.

2. Starting simple with linear regression

Model construction in Decision Science is often an iterative process: you start with simple yet effective models and then enhance them with more features, feature engineering, or more advanced model structures. For continuous outcomes, least squares linear regression is a solid starting point. With over 200 years of history, this method remains highly effective when its assumptions are met. Never underestimate the value of simplicity. Linear regression works by identifying the best-fit line through a multi-dimensional scatterplot of your data. By applying transformations to your features, you can easily extend this straightforward approach to capture complex, non-linear relationships, making it a versatile modeling tool. Linear regression is also fast and interpretable: by examining its coefficients, you can quickly check whether each feature's direction of effect aligns with expectations, making it an excellent tool for initial insights. Using this model as a baseline, you can develop more complex models, though this often comes at the cost of interpretability, time, and money.
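As a minimal sketch of this baseline workflow, here is an example using scikit-learn on synthetic data; the course does not name a library or dataset, so the feature names, coefficients, and data below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: predict weekly sales (continuous) from price and ad spend.
# The feature names and relationships are assumptions for this sketch.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))            # columns: price, ad_spend
y = 50 - 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 2, 200)

model = LinearRegression().fit(X, y)

# A quick sanity check: do the coefficient signs match expectations?
# Here we'd expect price to hurt sales (negative) and ads to help (positive).
for name, coef in zip(["price", "ad_spend"], model.coef_):
    print(f"{name}: {coef:+.2f}")
print(f"R^2 on training data: {model.score(X, y):.3f}")
```

A model this simple trains in milliseconds, which is exactly why it makes a useful baseline before investing in anything more complex.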

3. Logistic regression

Linear regression is a great starting point for continuous outcomes, but many real-world problems require models for binary or categorical outcomes. Think about it: banks use them to predict risk and fraud, healthcare relies on them to forecast mortality, and in retail they help identify cross-selling opportunities or pinpoint customers at risk of churn. Logistic regression is a great baseline tool for binary outcomes, for outcomes with more than two categories, and even for ordered outcomes, like predicting whether someone will finish high school, college, graduate school, or beyond. Logistic regression makes an excellent baseline model because it produces probabilities that are highly useful for decision-making and is easy to interpret: its coefficients reveal how much each feature influences the probability and in which direction. By setting a probability threshold, you can turn it into a fast and straightforward classification algorithm.
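To make the probability-plus-threshold idea concrete, here is a sketch using scikit-learn's LogisticRegression on synthetic churn-style data; the feature names, data, and 0.5 cutoff are assumptions for illustration, not part of the course:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy churn data: tenure (months) and support tickets -> churned (0/1).
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(1, 60, 300),    # tenure
                     rng.poisson(2, 300)])       # support_tickets
logits = 1.5 - 0.08 * X[:, 0] + 0.6 * X[:, 1]
y = (rng.uniform(size=300) < 1 / (1 + np.exp(-logits))).astype(int)

clf = LogisticRegression().fit(X, y)

# Coefficients show each feature's direction of influence on churn risk.
print(dict(zip(["tenure", "support_tickets"], clf.coef_[0].round(3))))

# Probabilities support decision-making; a threshold turns them into classes.
probs = clf.predict_proba(X)[:, 1]   # estimated P(churn) per customer
flagged = probs >= 0.5               # classification at an assumed 0.5 cutoff
print(f"Customers flagged as churn risks: {flagged.sum()} of {len(y)}")
```

In practice the threshold is a business decision: a retention team with limited capacity might raise it, while a fraud team might lower it to catch more cases.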

4. Feature transformations

It's always a good idea to start with a functional baseline model and refine it from there. Advanced models can be explored later to assess how much improvement they bring relative to the additional effort. Regardless of your target variable or model structure, feature transformation often enhances a model's power and interpretability. For instance, applying a log transformation or squaring a feature can increase the variance explained by the model. Additionally, some algorithms perform best when their assumptions align with the data. Take linear regression: it can deliver good results in many cases but performs optimally when the residuals, or errors, are normally distributed.
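Here is a small sketch of that point: a log transform on a right-skewed feature improving a linear fit. The data is synthetic and deliberately constructed so the relationship is linear on the log scale, so the improvement shown is illustrative rather than a general guarantee:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative assumption: income is right-skewed, and its effect on
# spending is closer to linear on the log scale.
rng = np.random.default_rng(1)
income = rng.lognormal(mean=10, sigma=0.8, size=500)
spending = 200 * np.log(income) + rng.normal(0, 50, 500)

raw = LinearRegression().fit(income.reshape(-1, 1), spending)
logged = LinearRegression().fit(np.log(income).reshape(-1, 1), spending)

# The transformed feature explains more variance (higher R^2) here.
print(f"R^2, raw income:  {raw.score(income.reshape(-1, 1), spending):.3f}")
print(f"R^2, log(income): {logged.score(np.log(income).reshape(-1, 1), spending):.3f}")
```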

5. Common transformations

Whether you are using linear regression, logistic regression, or another model, transformations are powerful tools that can be readily explored. A few common ones: log and square root transformations compress values with a wide range and help tame outliers; scaling methods such as standardization and normalization rescale features to similar ranges, which helps when features are measured in vastly different units; and polynomial terms, such as squared and cubic terms, are often a great way to represent complex, non-linear relationships with minimal effort. Remember, it is useful to start simple and leverage feature transformations in model building. This will help you quickly obtain fairly powerful results before moving on to more advanced, often less interpretable models.
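Here is a brief, illustrative sketch of these common transformations using NumPy and scikit-learn's preprocessing utilities; the library choice and the tiny example features are assumptions, not something the course specifies:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, PolynomialFeatures

# Illustrative features on very different scales: age vs. annual income.
X = np.array([[25,  40_000.0],
              [40,  95_000.0],
              [58, 250_000.0]])

# Log / square root: compress wide-ranging values and tame outliers.
income_log = np.log(X[:, 1])
income_sqrt = np.sqrt(X[:, 1])

# Scaling: standardization (mean 0, sd 1) and normalization (0-to-1 range).
X_std = StandardScaler().fit_transform(X)
X_norm = MinMaxScaler().fit_transform(X)

# Polynomial terms: add squared and interaction versions of the features
# so a still-linear model can capture non-linear relationships.
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
print(X_poly.shape)  # (3, 5): age, income, age^2, age*income, income^2
```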

6. Let's practice!

Let's get some practice with these ideas!