Examining the structure of categorical inputs
For this exercise, you will call model.matrix() to examine how R represents
data with both categorical and numerical inputs for modeling.
The dataset flowers (derived from the Sleuth3 package) has been loaded for you. It has the following columns:
- Flowers: the average number of flowers on a meadowfoam plant
- Intensity: the intensity of a light treatment applied to the plant
- Time: a categorical variable indicating when in the lifecycle (Late or Early) the light treatment occurred
The ultimate goal is to predict Flowers as a function of Time and Intensity.
This exercise is part of the course
Supervised Learning in R: Regression
Exercise instructions
- Call the str() function on flowers to see the types of each column.
- Use the unique() function on the column flowers$Time to see the possible values that Time takes. How many unique values are there?
- Create a formula to express Flowers as a function of Intensity and Time. Assign it to the variable fmla and print it.
- Use fmla and model.matrix() to create the model matrix for the data frame flowers. Assign it to the variable mmat.
- Use head() to examine the first 20 lines of flowers.
- Now examine the first 20 lines of mmat.
  - Is the numeric column Intensity different?
  - What happened to the categorical column Time from flowers?
  - How is Time == 'Early' represented? And Time == 'Late'?
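To make the idea of a model matrix concrete before working with flowers, here is a minimal sketch on a made-up data frame. The data frame toy, its columns y, dose, and group, and all of its values are invented for illustration and are not part of the exercise data. The sketch also assumes R's default contrast settings (treatment coding for unordered factors).

```r
# Toy data frame; column names and values are hypothetical, not from flowers
toy <- data.frame(
  y     = c(10, 12, 9, 14),
  dose  = c(1.5, 3.0, 1.5, 3.0),       # numeric input
  group = c("a", "b", "b", "a"),       # categorical input with two levels
  stringsAsFactors = FALSE
)

# Formula: y as a function of dose and group
toy_fmla <- y ~ dose + group

# model.matrix() builds an all-numeric matrix from the right-hand side
toy_mmat <- model.matrix(toy_fmla, data = toy)

# Inspect the column names and the encoded values
colnames(toy_mmat)
toy_mmat
```

In this sketch the numeric column dose is carried over unchanged, while the two-level column group is replaced by a single 0/1 indicator column (named groupb under the default settings). The level that gets no column of its own is the reference level, represented by a 0 in the indicator.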
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Call str on flowers to see the types of each column
___
# Use unique() to see how many possible values Time takes
___
# Build and print a formula to express Flowers as a function of Intensity and Time: fmla
(fmla <- ___("Flowers ~ Intensity + Time"))
# Use fmla and model.matrix to see how the data is represented for modeling
mmat <- ___
# Examine the first 20 lines of flowers
___
# Examine the first 20 lines of mmat
___