Exercise

# Exploring more broadly

Sometimes it makes sense just to roll the dice and see what comes up. In the context of modeling, this means throwing a big set of potential explanatory variables into a model and seeing if the process of model training finds something of interest. (Only the `rpart()` architecture provides an opportunity to automatically choose a subset of the explanatory variables. `lm()` will put every variable you give it into the model.)

Let's return to `Birth_weight` and train a recursive partitioning model with the formula `baby_wt ~ .`, where the single period to the right of the tilde is shorthand for "use all the other variables in the data." In training the model, `rpart()` will partition the cases using the single most effective explanatory variable, and then use the same logic to subdivide each resulting group. (That's what the "recursive" means in recursive partitioning: go through the process of building a model for each subgroup.)

In the console, train the model `baby_wt ~ .` on the `Birth_weight` data and plot the model tree using `prp(your_model, type = 3)`.
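As a minimal sketch of that step, the console commands might look like the following. It assumes that `Birth_weight` is supplied by the `statisticalModeling` package and that `prp()` comes from `rpart.plot`; the name `model_all` is only an illustrative choice.

```r
library(rpart)       # recursive partitioning: rpart()
library(rpart.plot)  # tree plotting: prp()

# Assumption: Birth_weight ships with the statisticalModeling package
data(Birth_weight, package = "statisticalModeling")

# The period means "use every other column as an explanatory variable"
model_all <- rpart(baby_wt ~ ., data = Birth_weight)

# type = 3 draws separate split labels for the left and right branches
prp(model_all, type = 3)
```

Variables that never appear in the plotted tree were offered to `rpart()` but not selected for any split.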

You'll see that `gestation` is identified as an important variable. That's not surprising, since that's the natural pattern: babies get bigger the longer they are in the womb.

Is smoking related to gestation period? Explore using models like `gestation ~ . - baby_wt`. (This means "explain gestation by all the other variables *except* baby weight.")
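One way to run that exploration is sketched below, under the same assumption that `Birth_weight` is available from the `statisticalModeling` package. The name `gest_model` is illustrative, and inspecting `variable.importance` is just one possible way to see which variables the tree relied on.

```r
library(rpart)

# Assumption: Birth_weight ships with the statisticalModeling package
data(Birth_weight, package = "statisticalModeling")

# Explain gestation by every other variable except baby_wt
gest_model <- rpart(gestation ~ . - baby_wt, data = Birth_weight)

# Printing the fit lists each split and the variable it uses
print(gest_model)

# Named vector of importance scores; NULL if the tree made no splits
gest_model$variable.importance
```

If a variable is missing from both the printed splits and the importance scores, the tree found no use for it in explaining gestation.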

Choose the statement below that's supported by your explorations:

Instructions

**50 XP**

##### Possible Answers

- Smoking is a major explanatory factor for gestation and seems to be connected to birth weight only through gestation.
- Smoking doesn't explain gestation, but it is related to birth weight.
- Lower-weight mothers tend to have smaller babies *and* shorter gestation periods. So, perhaps mother's weight plays out through gestation.