Visualizing multiple explanatory variables

Logistic regression also supports multiple explanatory variables. Plotting runs into the same problem as in the linear regression case: it quickly becomes difficult to include more numeric variables in the plot. Here we'll look at the case of two numeric explanatory variables, where the solution is essentially the same as before: use color to denote the response.
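
As a minimal sketch of what "multiple explanatory variables" looks like in code, the following fits a logistic regression with two numeric explanatory variables using `glm()`. The data frame `toy` and its simulated values are assumptions made for illustration; only the column names are borrowed from the exercise, and this is not the course's churn data.

```r
# Synthetic stand-in data: column names mirror the exercise, values are made up.
set.seed(42)
n <- 200
toy <- data.frame(
  time_since_first_purchase = rnorm(n),
  time_since_last_purchase  = rnorm(n)
)

# Simulate a binary response whose log-odds depend on both variables.
log_odds <- -0.5 * toy$time_since_first_purchase +
  0.8 * toy$time_since_last_purchase
toy$has_churned <- rbinom(n, size = 1, prob = plogis(log_odds))

# Logistic regression with two numeric explanatory variables.
mdl <- glm(
  has_churned ~ time_since_first_purchase + time_since_last_purchase,
  data = toy,
  family = binomial
)

# Predicted responses are probabilities, so they lie between 0 and 1.
toy$predicted <- predict(mdl, newdata = toy, type = "response")
range(toy$predicted)
```

With `type = "response"`, `predict()` returns values on the probability scale rather than the log-odds scale, which is what makes a color scale split at 0.5 meaningful.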

Here the response has only two possible values (zero and one), and later, when we add predicted responses, those values all lie between zero and one. Once you include predicted responses, the most important thing to determine from the plot is whether each prediction is close to zero or close to one. That makes a 2-color gradient split at 0.5 really useful: responses above 0.5 are one color, and responses below 0.5 are another.

The bank churn dataset is available as churn; ggplot2 is loaded.

This exercise is part of the course

Intermediate Regression in R

Instructions

  • Using the churn dataset, plot the recency of purchase, time_since_last_purchase, versus the length of customer relationship, time_since_first_purchase, colored by whether or not the customer churned, has_churned.
  • Add a point layer, with transparency set to 0.5.
  • Use a 2-color gradient, with midpoint 0.5.
  • Use the black and white theme.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Using churn, plot recency vs. length of relationship colored by churn status
___ +
  # Make it a scatter plot, with transparency 0.5
  ___ +
  # Use a 2-color gradient split at 0.5
  ___ +
  # Use the black and white theme
  ___
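
One possible way to fill in the blanks is sketched below. It assumes `churn` has the three columns named in the instructions; the small simulated `churn` data frame is a hypothetical stand-in that is created only when the course dataset is not already loaded, and the name `plt` is likewise just for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the course's churn data, used only if `churn`
# is not already available in the session. Values are simulated.
if (!exists("churn")) {
  set.seed(1)
  churn <- data.frame(
    time_since_first_purchase = rnorm(100),
    time_since_last_purchase  = rnorm(100),
    has_churned = rbinom(100, size = 1, prob = 0.5)
  )
}

# Recency (y axis) versus length of relationship (x axis), colored by churn status
plt <- ggplot(
  churn,
  aes(time_since_first_purchase, time_since_last_purchase, color = has_churned)
) +
  # Scatter plot with transparency 0.5
  geom_point(alpha = 0.5) +
  # 2-color gradient split at 0.5
  scale_color_gradient2(midpoint = 0.5) +
  # Black and white theme
  theme_bw()

plt
```

Because "y versus x" puts the first-named variable on the vertical axis, `time_since_last_purchase` is mapped to y here. `scale_color_gradient2()` is a diverging scale, so setting `midpoint = 0.5` places the color change at 0.5 instead of the default 0.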