
Fit a model of sparrow survival probability

In this exercise, you will estimate the probability that a sparrow survives a severe winter storm, based on physical characteristics of the sparrow. The dataset sparrow has been pre-loaded. The outcome to be predicted is status ("Survived", "Perished"). The variables we will consider are:

  • total_length: length of the bird from tip of beak to tip of tail (mm)
  • weight: in grams
  • humerus: length of humerus ("upper arm bone" that connects the wing to the body) (inches)

Remember that when using glm() to create a logistic regression model, you must explicitly specify family = binomial:

glm(formula, data = data, family = binomial)
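As a minimal sketch of that call pattern, the example below fits a logistic regression on the built-in mtcars data rather than on sparrow. The model `am ~ wt` and the names `car_model` and `p_manual` are illustrative assumptions, not part of the exercise.

```r
# mtcars ships with base R; am is coded 0 (automatic) / 1 (manual).
# family = binomial is what makes this a logistic regression.
car_model <- glm(am ~ wt, data = mtcars, family = binomial)

# type = "response" returns predicted probabilities rather than log-odds.
p_manual <- predict(car_model, newdata = mtcars, type = "response")
head(round(p_manual, 3))
```

Without type = "response", predict() on a binomial glm returns values on the log-odds (link) scale.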

You will call summary() and broom::glance() to examine the logistic regression model in two different ways. One of the diagnostics that you will look at is the analog of \(R^2\), called pseudo-\(R^2\).

$$ \text{pseudo-}R^2 = 1 - \frac{\text{deviance}}{\text{null.deviance}} $$

You can think of deviance as analogous to variance: it is a measure of the variation in categorical data. The pseudo-\(R^2\) is analogous to \(R^2\) for standard regression: \(R^2\) is a measure of the "variance explained" of a regression model. The pseudo-\(R^2\) is a measure of the "deviance explained".
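To make the formula concrete, here is a small sketch using the built-in mtcars data instead of sparrow. The helper `pseudo_r2()` and the models `fit` and `null_fit` are hypothetical names introduced for illustration. The sketch also checks that the null deviance stored on a fitted model matches the deviance of an intercept-only model.

```r
# Hypothetical helper: pseudo-R-squared from the two deviances
# that glm() stores on the fitted model object.
pseudo_r2 <- function(model) {
  1 - model$deviance / model$null.deviance
}

fit      <- glm(am ~ wt, data = mtcars, family = binomial)
null_fit <- glm(am ~ 1,  data = mtcars, family = binomial)  # intercept only

# The null deviance of `fit` should agree, up to numerical tolerance,
# with the deviance of the intercept-only model.
all.equal(fit$null.deviance, null_fit$deviance, tolerance = 1e-6)

pseudo_r2(fit)       # share of the null deviance explained by wt
pseudo_r2(null_fit)  # essentially 0: no predictors, nothing explained
```

A model that predicts no better than the overall survival rate has a pseudo-\(R^2\) near 0, and values closer to 1 indicate that more of the deviance is explained.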

This exercise is part of the course

Supervised Learning in R: Regression


Instructions

  • As suggested in the video, you will predict on the outcomes TRUE and FALSE. Create a new column survived in the sparrow data frame that is TRUE when status == "Survived".
  • Create the formula fmla that expresses survived as a function of the variables of interest. Print it.
  • Fit a logistic regression model to predict the probability of sparrow survival. Assign the model to the variable sparrow_model.
  • Call summary() to see the coefficients of the model, the deviance and the null deviance.
  • Call glance() on the model to see the deviances and other diagnostics in a data frame. Assign the output from glance() to the variable perf.
  • Calculate the pseudo-\(R^2\).

Hands-on interactive exercise

Try this exercise by completing this sample code.

# sparrow is available
summary(sparrow)

# Create the survived column
sparrow$survived <- ___

# Create the formula
(fmla <- _____)

# Fit the logistic regression model
sparrow_model <- ___

# Call summary
___

# Call glance
(perf <- ___)

# Calculate pseudo-R-squared
(pseudoR2 <- ___)
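One possible way to fill in the blanks is sketched below. It is not the official solution. So that it runs outside the exercise environment, it builds a simulated stand-in for `sparrow` when the real data frame is not loaded (the simulated values are made up and carry no biological meaning), and it falls back to the model's own deviance fields if the broom package is not installed.

```r
# Stand-in data, used ONLY if the real `sparrow` data frame is not loaded.
# The values are simulated for illustration and are not the real measurements.
if (!exists("sparrow")) {
  set.seed(1)
  n <- 80
  sparrow <- data.frame(
    total_length = round(rnorm(n, mean = 160, sd = 4)),
    weight       = round(rnorm(n, mean = 25.5, sd = 1.5), 1),
    humerus      = round(rnorm(n, mean = 0.73, sd = 0.02), 3)
  )
  p_survive <- plogis(-0.3 * (sparrow$total_length - 160))
  sparrow$status <- ifelse(runif(n) < p_survive, "Survived", "Perished")
}

# Create the survived column: TRUE when the bird survived
sparrow$survived <- sparrow$status == "Survived"

# Create the formula
(fmla <- survived ~ total_length + weight + humerus)

# Fit the logistic regression model
sparrow_model <- glm(fmla, data = sparrow, family = binomial)

# Call summary
summary(sparrow_model)

# Call glance (fallback: read the deviances directly if broom is missing)
if (requireNamespace("broom", quietly = TRUE)) {
  perf <- broom::glance(sparrow_model)
} else {
  perf <- data.frame(deviance      = sparrow_model$deviance,
                     null.deviance = sparrow_model$null.deviance)
}
perf

# Calculate pseudo-R-squared
(pseudoR2 <- 1 - perf$deviance / perf$null.deviance)
```

The formula could also be built from a string with as.formula(); writing it directly keeps the sketch short.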