Examining the structure of categorical inputs
For this exercise, you will call model.matrix() to examine how R represents data with both categorical and numerical inputs for modeling.
The dataset flowers (derived from the Sleuth3 package) has been loaded for you. It has the following columns:
- Flowers: the average number of flowers on a meadowfoam plant
- Intensity: the intensity of a light treatment applied to the plant
- Time: a categorical variable giving when (Late or Early) in the life cycle the light treatment occurred
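To see what these column types look like outside the course environment, here is a small sketch that builds a hypothetical stand-in for flowers. The name toy_flowers and all of its values are invented for illustration and are not the Sleuth3 data. It also assumes Time is stored as a character column; in the loaded flowers data it could instead be a factor.

```r
# Hypothetical stand-in for `flowers`: the name and every value are invented
toy_flowers <- data.frame(
  Flowers   = c(60, 70, 55, 75, 50, 65),
  Time      = c("Late", "Late", "Late", "Early", "Early", "Early"),
  Intensity = c(150, 300, 600, 150, 300, 600),
  stringsAsFactors = FALSE  # keep Time as character (assumption)
)

# Flowers and Intensity are numeric; Time is character
str(toy_flowers)

# Distinct values of the categorical column, in order of first appearance
unique(toy_flowers$Time)          # "Late" "Early"
length(unique(toy_flowers$Time))  # 2
```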
The ultimate goal is to predict Flowers as a function of Time and Intensity.
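A formula can be built from a string with as.formula(). The minimal sketch below shows how R splits such a formula into the response (left of ~) and the predictor terms (right of ~); it only inspects the formula and does not need any data.

```r
# Build the formula from a string
fmla <- as.formula("Flowers ~ Intensity + Time")
print(fmla)

# All variable names, in order of appearance
all.vars(fmla)                    # "Flowers" "Intensity" "Time"

# Only the predictor terms on the right-hand side
attr(terms(fmla), "term.labels")  # "Intensity" "Time"
```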
This exercise is part of the course Supervised Learning in R: Regression.
Exercise instructions
- Call the str() function on flowers to see the types of each column.
- Use the unique() function on the column flowers$Time to see the possible values that Time takes. How many unique values are there?
- Create a formula to express Flowers as a function of Intensity and Time. Assign it to the variable fmla and print it.
- Use fmla and model.matrix() to create the model matrix for the data frame flowers. Assign it to the variable mmat.
- Use head() with n = 20 to examine the first 20 lines of flowers (by default head() shows only 6).
- Now examine the first 20 lines of mmat.
  - Is the numeric column Intensity different?
  - What happened to the categorical column Time from flowers?
  - How is Time == 'Early' represented? And Time == 'Late'?
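The questions above can be previewed on a small hypothetical data frame. This sketch assumes Time is a character column with the values "Early" and "Late"; toy_flowers and its numbers are invented. Under R's default treatment contrasts, model.matrix() converts a character column to a factor with levels in alphabetical order, so "Early" becomes the reference level (absorbed into the intercept) and "Late" gets its own 0/1 indicator column.

```r
# Hypothetical toy data; values are invented for illustration
toy_flowers <- data.frame(
  Flowers   = c(60, 70, 55, 75, 50, 65),
  Time      = c("Late", "Late", "Late", "Early", "Early", "Early"),
  Intensity = c(150, 300, 600, 150, 300, 600),
  stringsAsFactors = FALSE
)

fmla <- as.formula("Flowers ~ Intensity + Time")
mmat <- model.matrix(fmla, data = toy_flowers)

# Columns: "(Intercept)" (all 1s), "Intensity" (copied unchanged),
# and "TimeLate" (1 where Time == "Late", 0 where Time == "Early")
colnames(mmat)
print(mmat)
```

With this coding, a row with Time == 'Early' has TimeLate = 0 and a row with Time == 'Late' has TimeLate = 1, while Intensity passes through unchanged. If Time were a factor whose first level was "Late", the reference level would flip and the indicator column would be named TimeEarly instead.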
Sample code:
# Call str on flowers to see the types of each column
str(flowers)
# Use unique() to see how many possible values Time takes
unique(flowers$Time)
# Build and print a formula to express Flowers as a function of Intensity and Time: fmla
(fmla <- as.formula("Flowers ~ Intensity + Time"))
# Use fmla and model.matrix to see how the data is represented for modeling
mmat <- model.matrix(fmla, data = flowers)
# Examine the first 20 lines of flowers
head(flowers, n = 20)
# Examine the first 20 lines of mmat
head(mmat, n = 20)
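Because the ultimate goal is to predict Flowers from Intensity and Time, it may help to see that lm() builds the same design matrix internally, with one coefficient per column. A minimal sketch on hypothetical toy data (toy_flowers and its values are invented, not the course data):

```r
# Hypothetical toy data; values are invented for illustration
toy_flowers <- data.frame(
  Flowers   = c(60, 70, 55, 75, 50, 65),
  Time      = c("Late", "Late", "Late", "Early", "Early", "Early"),
  Intensity = c(150, 300, 600, 150, 300, 600),
  stringsAsFactors = FALSE
)

fmla <- as.formula("Flowers ~ Intensity + Time")
mmat <- model.matrix(fmla, data = toy_flowers)

# Fit a linear model with the same formula
fit <- lm(fmla, data = toy_flowers)

# The design matrix lm() used has the same columns and values as mmat
X_fit <- model.matrix(fit)
identical(colnames(X_fit), colnames(mmat))
all.equal(X_fit, mmat, check.attributes = FALSE)

# One coefficient per model-matrix column: "(Intercept)", "Intensity", "TimeLate"
coef(fit)
```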