
# Smoking and survival

There's a very important special case for classification: when the response variable has only two levels. Of course, you can use the recursive partitioning architecture, but in the two-level situation it's much more common to use a technique known as *logistic regression*. This course features `lm()` and `rpart()`, but it would be remiss not to mention logistic regression.
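As a rough illustration of what logistic regression does, the sketch below models the log-odds of survival as a straight-line function of age and converts it to a probability with base R's `plogis()` (the inverse logit). The coefficients and the tree-style probabilities are made-up values chosen for illustration, not fitted to `Whickham` or any other data.

```r
# Logistic regression: the log-odds of the "Alive" class is linear in the
# inputs, and plogis() maps log-odds to a probability between 0 and 1.
# ASSUMPTION: b0 and b_age are hypothetical, not estimated from data.
b0 <- 6          # hypothetical intercept (log-odds scale)
b_age <- -0.12   # hypothetical change in log-odds per year of age

age <- c(20, 40, 60, 80)
log_odds <- b0 + b_age * age
prob_alive <- plogis(log_odds)   # same as 1 / (1 + exp(-log_odds))

# Recursive partitioning, by contrast, predicts a constant probability
# within each region of the inputs (here, made-up values for two age ranges).
step_prob <- ifelse(age < 50, 0.9, 0.3)

print(round(prob_alive, 3))   # smooth S-shaped decline with age
print(step_prob)              # piecewise-constant prediction
```

The logistic curve changes gradually with every input, while a tree's prediction jumps only at its split points; this is the difference `fmodel()` plots make visible.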

In this exercise, you'll look at the effect size of smoking on survival. The data used for modeling are in the `Whickham` dataset, which gives a small part of the data collected in the early 1970s. Participants were asked their age and whether they smoked. A follow-up twenty years later determined whether each participant was still alive.

We're interested in finding the effect size of smoking on survival.
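One way to write this quantity, holding age fixed, is as a change in predicted probability. For a logistic model, the notation below (the coefficients $\beta_0$, $\beta_{\text{age}}$, $\beta_{\text{smoker}}$ and the indicator $[\text{smoker} = \text{Yes}]$) is introduced here for illustration:

$$
\begin{aligned}
\Delta P(\text{Alive}) &= P(\text{Alive} \mid \text{smoker} = \text{Yes},\ \text{age}) - P(\text{Alive} \mid \text{smoker} = \text{No},\ \text{age}) \\
P(\text{Alive} \mid \text{smoker},\ \text{age}) &= \frac{1}{1 + \exp\!\big(-(\beta_0 + \beta_{\text{age}}\,\text{age} + \beta_{\text{smoker}}\,[\text{smoker} = \text{Yes}])\big)} \\
\Delta P(\text{Dead}) &= -\,\Delta P(\text{Alive})
\end{aligned}
$$

The last line holds because the two class probabilities always sum to 1, which is why a single probability is enough in the two-class case.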

Instructions


Two models are given in the editor: a recursive partitioning architecture and a logistic regression architecture.

- Use `fmodel()` to plot the two models and get a sense of how recursive partitioning differs from logistic regression.
- Find the effect size of smoking on the probability of survival in the recursive partitioning model. (Recall that for classifiers, the effect size is given as the change in probability for each of the classes.)
- Similarly, find the effect size of smoking in the logistic regression model. Since there are only two classes, only one probability is needed.
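The steps above can be sketched in plain R without the course's helper functions, under clearly stated assumptions: the data are simulated, mirroring the columns of `Whickham` (`outcome`, `smoker`, `age`) but not its values, the effect size is evaluated at a single hypothetical age of 50, and all object names (`sim`, `mod_rpart`, `mod_glm`, `at`) are hypothetical. The effect size is computed directly with `predict()` as the change in predicted probability when `smoker` goes from `"No"` to `"Yes"`.

```r
library(rpart)  # recursive partitioning; a recommended package shipped with R

# ASSUMPTION: simulated stand-in for Whickham, NOT the real survey data.
set.seed(1)
n <- 1000
sim <- data.frame(
  age = sample(18:84, n, replace = TRUE),
  smoker = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
p_alive <- plogis(6 - 0.12 * sim$age - 0.3 * (sim$smoker == "Yes"))
sim$outcome <- factor(ifelse(runif(n) < p_alive, "Alive", "Dead"))

# The two architectures
mod_rpart <- rpart(outcome ~ age + smoker, data = sim)
mod_glm <- glm(outcome == "Alive" ~ age + smoker, data = sim, family = binomial)

# Inputs to compare: same (hypothetical) age, smoker changed from No to Yes
at <- data.frame(age = 50,
                 smoker = factor(c("No", "Yes"), levels = c("No", "Yes")))

# rpart classifier: one probability per class, so one change per class
p_rpart <- predict(mod_rpart, newdata = at, type = "prob")
effect_rpart <- p_rpart[2, ] - p_rpart[1, ]

# Logistic regression: the single probability of "Alive" is enough
p_glm <- predict(mod_glm, newdata = at, type = "response")
effect_glm <- unname(p_glm[2] - p_glm[1])

print(effect_rpart)
print(effect_glm)
```

Because a tree uses `smoker` only if it appears in a split, the recursive partitioning effect size can be exactly zero at some ages, and its two class-probability changes always sum to zero. The logistic model's change in probability varies smoothly with the chosen age.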