Exercise

Exploring more broadly

Sometimes it makes sense just to roll the dice and see what comes up. In the context of modeling, this means throwing a big set of potential explanatory variables into a model and seeing whether model training finds something of interest. (Of the two model architectures used here, only rpart() automatically chooses a subset of the explanatory variables. lm() puts every variable you give it into the model.)
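As a rough illustration of that difference, the sketch below fits both architectures with every available explanatory variable and lists which ones each model ends up using. It assumes R's built-in mtcars data as a stand-in (with mpg as the response), since the course data may not be loaded outside the exercise session. The names tree_model, lm_model, tree_vars and lm_vars are made up for this example.

```r
library(rpart)

# Stand-in data (an assumption): built-in mtcars, response mpg,
# all other columns offered as explanatory variables.
tree_model <- rpart(mpg ~ ., data = mtcars)
lm_model   <- lm(mpg ~ ., data = mtcars)

# Variables the tree actually split on (leaf rows are labelled "<leaf>").
tree_vars <- setdiff(unique(as.character(tree_model$frame$var)), "<leaf>")

# Variables that appear as terms in the linear model.
lm_vars <- attr(terms(lm_model), "term.labels")

tree_vars   # the subset rpart() chose
lm_vars     # every variable that was offered
```

Comparing the two vectors shows the point made above: the tree reports only the variables it found useful for splitting, while the linear model carries a term for each one.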

Let's return to Birth_weight and train a recursive partitioning model with the formula baby_wt ~ . (the single period to the right of the tilde is shorthand for "use all the other variables in the data"). In training the model, rpart() partitions the cases using the single most effective explanatory variable, then applies the same logic to subdivide each resulting group. (That's what "recursive" means in recursive partitioning: repeat the model-building process within each subgroup.)

In the console, train the model baby_wt ~ . on the Birth_weight data and plot the model tree using prp(your_model, type = 3).
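The steps just described might look roughly like the following. This is a sketch under assumptions: the built-in mtcars data stands in for Birth_weight so the code runs outside the exercise session, model_all is an illustrative name, and prp() is taken from the rpart.plot package, with a base-graphics fallback if that package is not installed.

```r
library(rpart)

# Stand-in (an assumption): mpg plays the role of baby_wt.
# In the exercise session the call would be:
#   model_all <- rpart(baby_wt ~ ., data = Birth_weight)
model_all <- rpart(mpg ~ ., data = mtcars)

# Text view of the splits, useful when no graphics device is handy.
print(model_all)

# Plot the tree. prp() lives in the rpart.plot package.
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::prp(model_all, type = 3)
} else {
  plot(model_all)   # base-graphics fallback
  text(model_all)
}
```

The variable used at the top of the tree is the one rpart() judged most effective for the first partition; variables used lower down refine the subgroups.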

You'll see that gestation is identified as an important variable. That's not surprising: babies get bigger the longer they are in the womb.

Is smoking related to gestation period? Explore using models like gestation ~ . - baby_wt. (This means "explain gestation by all the other variables except baby weight.")
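One way to read off what such a model found is to list the variables the tree split on. The sketch below is illustrative only: split_vars() is a hypothetical helper written for this example, and the built-in mtcars data stands in for Birth_weight, with cyl playing the role of the excluded baby_wt.

```r
library(rpart)

# Hypothetical helper: names of the variables a fitted tree split on.
split_vars <- function(model) {
  setdiff(unique(as.character(model$frame$var)), "<leaf>")
}

# Stand-in (an assumption): explain mpg by everything except cyl.
# In the exercise session the call would be:
#   rpart(gestation ~ . - baby_wt, data = Birth_weight)
model_without <- rpart(mpg ~ . - cyl, data = mtcars)

vars_used <- split_vars(model_without)
vars_used                 # the excluded variable cannot appear here
"cyl" %in% vars_used
```

In the exercise, checking whether smoke appears among the split variables for gestation gives a first indication of whether the tree found it useful at the default settings.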

Choose the statement below that's supported by your explorations:

Instructions
Possible Answers