Fit a model of sparrow survival probability
In this exercise, you will estimate the probability that a sparrow survives a severe winter storm, based on the sparrow's physical characteristics. The dataset `sparrow` has been pre-loaded. The outcome to be predicted is `status` ("Survived", "Perished"). The variables we will consider are:
- `total_length`: length of the bird from tip of beak to tip of tail (mm)
- `weight`: in grams
- `humerus`: length of humerus ("upper arm bone" that connects the wing to the body) (inches)
Remember that when using `glm()` to create a logistic regression model, you must explicitly specify `family = binomial`:
glm(formula, data = data, family = binomial)
You will call `summary()` and `broom::glance()` to see two different ways of examining a logistic regression model. One of the diagnostics that you will look at is the analog to \(R^2\), called pseudo-\(R^2\).
$$ \text{pseudo-}R^2 = 1 - \frac{\text{deviance}}{\text{null.deviance}} $$
You can think of deviance as analogous to variance: it is a measure of the variation in categorical data. The pseudo-\(R^2\) is analogous to \(R^2\) for standard regression: \(R^2\) is a measure of the "variance explained" of a regression model. The pseudo-\(R^2\) is a measure of the "deviance explained".
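The analogy can be made concrete with a minimal sketch. It assumes the built-in `mtcars` data rather than the sparrow data, and the names `full_model`, `null_model`, and `pseudo_r2` are illustrative only. The null deviance is the deviance of a model that contains only an intercept, so it plays the role that total variance plays for \(R^2\):

```r
# Illustrative only: uses the built-in mtcars data, not the sparrow data.
# am is 0/1 (automatic/manual transmission).
full_model <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Intercept-only model: no predictors, just the overall rate of am == 1
null_model <- glm(am ~ 1, data = mtcars, family = binomial)

# Compare the null deviance stored on the full model with the
# deviance of the intercept-only model
all.equal(full_model$null.deviance, null_model$deviance)

# pseudo-R^2 = 1 - deviance / null.deviance
pseudo_r2 <- 1 - full_model$deviance / full_model$null.deviance
pseudo_r2
```

A pseudo-\(R^2\) near 0 would mean the predictors reduce the deviance very little relative to the intercept-only model; a value near 1 would mean they account for most of it.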
This exercise is part of the course Supervised Learning in R: Regression
Exercise instructions
- As suggested in the video, you will predict on the outcomes `TRUE` and `FALSE`. Create a new column `survived` in the `sparrow` data frame that is `TRUE` when `status == "Survived"`.
- Create the formula `fmla` that expresses `survived` as a function of the variables of interest. Print it.
- Fit a logistic regression model to predict the probability of sparrow survival. Assign the model to the variable `sparrow_model`.
- Call `summary()` to see the coefficients of the model, the deviance and the null deviance.
- Call `glance()` on the model to see the deviances and other diagnostics in a data frame. Assign the output from `glance()` to the variable `perf`.
- Calculate the pseudo-\(R^2\).
Hands-on interactive exercise
Try this exercise by completing the sample code.
# sparrow is available
summary(sparrow)
# Create the survived column
sparrow$survived <- ___
# Create the formula
(fmla <- _____)
# Fit the logistic regression model
sparrow_model <- ___
# Call summary
___
# Call glance
(perf <- ___)
# Calculate pseudo-R-squared
(pseudoR2 <- ___)