Get startedGet started for free

Logistic regression models

1. Logistic Regression Models

Welcome to the final chapter of machine learning in the tidyverse. Throughout this course you have learned a variety of tidyverse tools aimed at building regression models. In this chapter you will shift gears to work with another group of models called binary classification models.

2. Binary Classification

Binary classification models are among the most common class of models used by data scientists. These models are trained to assign an observation to one of two possible classes using the available set of features.

3. The attrition Dataset

To learn the tools and skills associated with these models you will explore the attrition dataset. This dataset contains over 1400 observations of employees at a fictional company. Each observation provides a variety of features about the employee such as education, income, work-life balance, and job satisfaction. The outcome variable that you are interested in this data is called Attrition, this indicates whether the employee has left the company or not. Throughout this chapter you will work on building a model that will use the available features to predict if an employee has quit. In a real world scenario a model like this can be used by a company to identify employees that are at risk and potentially intervene.

4. Logistic Regression

The first model that you will work with is the logistic regression model. This is very similar to a linear model except that for a given observation it returns the probability of that observation belonging to the positive class. Here, this would be the probability of attrition. In order to build a logistic regression model in R you will use the generalized linear model function, glm(). Similar to the lm() function you need to provide the formula and the data but now you have a new parameter called family which must be set to binomial for a logistic regression model.

5. glm()

Working with the cross-validated data frame, cv_data, you will build a logistic regression model for each fold. As before, you can leverage the mutate() and map() combination to map the glm() function for each train data frame.

6. Time to Practice

In the next set of exercises you will apply what you've learned in order to prepare the attrition dataset for the train-test-validate splits and build a logistic regression model for each fold.