
Examining the structure of categorical inputs

For this exercise, you will call model.matrix() to examine how R represents data that mixes categorical and numeric inputs for modeling. The dataset flowers (derived from the Sleuth3 package) has been loaded for you. It has the following columns:

  • Flowers: the average number of flowers on a meadowfoam plant
  • Intensity: the intensity of a light treatment applied to the plant
  • Time: a categorical variable recording when in the lifecycle (Late or Early) the light treatment occurred
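
Whether flowers$Time is stored as a character vector or as a factor is something str() will reveal; the description above does not say. As a minimal sketch using invented values (time_chr and time_fac are hypothetical names, not part of the exercise), the snippet shows how R turns such a column into a factor, which is how model.matrix() treats a categorical input. factor() sorts the distinct values, so Early becomes the first (baseline) level.

```r
# Hypothetical Time values (invented, not the real flowers data), stored as character
time_chr <- c("Late", "Early", "Late", "Early")

# factor() sorts the distinct values to build its levels, so "Early" comes first
time_fac <- factor(time_chr)
print(levels(time_fac))      # "Early" "Late"

# Internally, each value is an integer code pointing into the levels
print(as.integer(time_fac))  # 2 1 2 1

# unique() on the character vector keeps the order of first appearance instead
print(unique(time_chr))      # "Late" "Early"
```

The first level matters later: model.matrix() uses it as the reference category and creates indicator columns only for the other levels.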

The ultimate goal is to predict Flowers as a function of Time and Intensity.
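
To connect the formula to that prediction goal, here is a minimal sketch assuming ordinary linear regression with lm(), which this exercise does not itself call. The data frame toy is hypothetical: its Flowers values are generated from an exact, invented linear rule rather than taken from the meadowfoam study, so the fitted coefficients should reproduce that rule up to floating-point rounding.

```r
# Hypothetical toy data built from an invented exact rule (not real measurements):
#   Flowers = 50 - 0.05 * Intensity + 10 * (Time == "Late")
toy <- data.frame(
  Intensity = rep(c(100, 200, 300, 400), times = 2),
  Time      = factor(rep(c("Early", "Late"), each = 4))
)
toy$Flowers <- 50 - 0.05 * toy$Intensity + 10 * (toy$Time == "Late")

fmla <- as.formula("Flowers ~ Intensity + Time")
fit  <- lm(fmla, data = toy)

# lm() builds a model matrix from the formula; Time becomes a 0/1 TimeLate column
print(colnames(model.matrix(fit)))  # "(Intercept)" "Intensity" "TimeLate"

# The coefficients recover the invented rule: about 50, -0.05 and 10
print(coef(fit))

# Predict for new plants; Time must use the same levels as the training data
new_plants <- data.frame(
  Intensity = c(500, 500),
  Time      = factor(c("Early", "Late"), levels = levels(toy$Time))
)
print(predict(fit, newdata = new_plants))  # about 25 and 35
```

Because the categorical input enters the fit as a numeric indicator column, its coefficient reads as the shift in predicted Flowers for Late relative to Early at the same Intensity.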

This exercise is part of the course Supervised Learning in R: Regression.

Exercise instructions

  • Call the str() function on flowers to see the types of each column.
  • Use the unique() function on the column flowers$Time to see the possible values that Time takes. How many unique values are there?
  • Create a formula to express Flowers as a function of Intensity and Time. Assign it to the variable fmla and print it.
  • Use fmla and model.matrix() to create the model matrix for the data frame flowers. Assign it to the variable mmat.
  • Use head() with n = 20 to examine the first 20 rows of flowers.
  • Now examine the first 20 rows of mmat.
    • Is the numeric column Intensity different?
    • What happened to the categorical column Time from flowers?
    • How is Time == 'Early' represented? And Time == 'Late'?
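
The questions above can be checked on a small hypothetical data frame (toy, with invented values but the same column names as flowers). The sketch assumes Time has the default alphabetical levels, so Early is the baseline; if flowers$Time ends up with a different first level, the indicator column will be named after the other level instead.

```r
# Hypothetical stand-in for flowers: invented values, same column names
toy <- data.frame(
  Flowers   = c(60, 55, 70, 64),
  Intensity = c(150, 300, 150, 300),
  Time      = factor(c("Late", "Late", "Early", "Early"))
)

fmla <- as.formula("Flowers ~ Intensity + Time")

# model.matrix() drops the response and expands the right-hand side:
# an intercept column of 1s, Intensity unchanged, and a 0/1 TimeLate indicator
mmat <- model.matrix(fmla, data = toy)
print(mmat)

# Without an intercept, each level of Time gets its own indicator column
mmat_noint <- model.matrix(Flowers ~ Intensity + Time - 1, data = toy)
print(colnames(mmat_noint))  # "Intensity" "TimeEarly" "TimeLate"
```

With an intercept, Early is represented implicitly: its rows have 0 in TimeLate, so their baseline is carried by the intercept alone, while Late rows have 1.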

Interactive hands-on exercise

Try this exercise by completing the sample code.

# Call str on flowers to see the types of each column
___

# Use unique() to see how many possible values Time takes
___

# Build and print a formula to express Flowers as a function of Intensity and Time: fmla
(fmla <- ___("Flowers ~ Intensity + Time"))

# Use fmla and model.matrix to see how the data is represented for modeling
mmat <- ___

# Examine the first 20 lines of flowers
___

# Examine the first 20 lines of mmat
___