Exercise

Examining the structure of categorical inputs

For this exercise, you will call model.matrix() to examine how R represents data with both categorical and numerical inputs for modeling. The dataset flowers (derived from the Sleuth3 package) has been loaded for you. It has the following columns:

  • Flowers: the average number of flowers on a meadowfoam plant
  • Intensity: the intensity of a light treatment applied to the plant
  • Time: a categorical variable indicating when in the lifecycle (Late or Early) the light treatment occurred

The ultimate goal is to predict Flowers as a function of Time and Intensity.

Instructions

  • Call the str() function on flowers to see the types of each column.
  • Use the unique() function on the column flowers$Time to see the possible values that Time takes. How many unique values are there?
  • Create a formula to express Flowers as a function of Intensity and Time. Assign it to the variable fmla and print it.
  • Use fmla and model.matrix() to create the model matrix for the data frame flowers. Assign it to the variable mmat.
  • Use head() with n = 20 to examine the first 20 rows of flowers (by default, head() shows only 6).
  • Now examine the first 20 rows of mmat in the same way.
    • Is the numeric column Intensity different?
    • What happened to the categorical column Time from flowers?
    • How is Time == 'Early' represented? And Time == 'Late'?
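
The steps above can be sketched roughly as follows. This is only an illustration, not the official solution: because flowers is preloaded in the exercise environment, the sketch builds a small hypothetical stand-in data frame with made-up values and the same three column names, and it assumes Time is stored as a character column.

```r
# Hypothetical stand-in for the preloaded `flowers` data frame.
# The values are made up for illustration; only the column names
# match the exercise description.
flowers <- data.frame(
  Flowers   = c(62.3, 77.4, 55.3, 54.2, 49.6, 61.9),
  Time      = c("Late", "Late", "Late", "Early", "Early", "Early"),
  Intensity = c(150, 150, 300, 150, 300, 450),
  stringsAsFactors = FALSE
)

# Inspect the column types and the distinct values of Time
str(flowers)
unique(flowers$Time)

# Formula: Flowers as a function of Intensity and Time
fmla <- Flowers ~ Intensity + Time
print(fmla)

# Build the model matrix that R would hand to a modeling function
mmat <- model.matrix(fmla, data = flowers)

# Compare the original data with its model-matrix representation
head(flowers, n = 20)
head(mmat, n = 20)
```

With R's default treatment contrasts, a two-level categorical input such as Time should appear in mmat as a single 0/1 indicator column (alongside an intercept column), while the numeric Intensity column is carried over unchanged. Comparing the two head() outputs row by row shows which level is coded as 0 and which as 1.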