Calculating distance between categorical variables
In this exercise you will explore how to calculate binary (Jaccard) distances.
Before calculating distances, you first have to dummify the categories using the dummy.data.frame() function
from the dummies library.
You will use a small collection of survey observations stored in the data frame job_survey
with the following columns:
- job_satisfaction Possible options: "Hi", "Mid", "Low"
- is_happy Possible options: "Yes", "No"
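
To make the idea concrete, here is a minimal sketch of what dummifying and the binary (Jaccard) distance amount to. The four rows of job_survey below are made up for illustration, since the real data is not reproduced here, and the helpers one_hot() and jaccard_dist() are hypothetical base R stand-ins, not functions from the dummies library.

```r
# Made-up stand-in for job_survey (the real observations are not shown here).
job_survey <- data.frame(
  job_satisfaction = factor(c("Hi", "Hi", "Hi", "Low"),
                            levels = c("Hi", "Mid", "Low")),
  is_happy = factor(c("Yes", "Yes", "No", "No"),
                    levels = c("Yes", "No"))
)

# Hypothetical helper: one 0/1 indicator column per factor level.
one_hot <- function(df) {
  cols <- lapply(names(df), function(nm) {
    f <- df[[nm]]
    m <- sapply(levels(f), function(lv) as.integer(f == lv))
    colnames(m) <- paste0(nm, "_", levels(f))
    m
  })
  do.call(cbind, cols)
}

# Hypothetical helper: share of positions where exactly one vector is 1,
# among the positions where at least one vector is 1.
jaccard_dist <- function(a, b) {
  at_least_one <- sum(a == 1 | b == 1)
  exactly_one <- sum(xor(a == 1, b == 1))
  exactly_one / at_least_one
}

dummies <- one_hot(job_survey)

d12 <- jaccard_dist(dummies[1, ], dummies[2, ])  # identical answers: 0
d13 <- jaccard_dist(dummies[1, ], dummies[3, ])  # same satisfaction only: 2/3
d14 <- jaccard_dist(dummies[1, ], dummies[4, ])  # nothing in common: 1
```

Two respondents who gave exactly the same answers end up with identical indicator rows, so their distance is 0, while respondents who share no answers are at the maximum distance of 1.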
This exercise is part of the course
Cluster Analysis in R
Exercise instructions
- Create a dummified data frame dummy_survey.
- Generate a Jaccard distance matrix for the dummified survey data, dist_survey, using the dist() function with the parameter method = 'binary'.
- Print the original data and the distance matrix.
- Note the observations with a distance of 0 in the original data (1, 2, and 3).
Hands-on interactive exercise
Try this exercise and complete the sample code.
# Dummify the Survey Data
dummy_survey <- ___
# Calculate the Distance
dist_survey <- ___
# Print the Original Data
___
# Print the Distance Matrix
___
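
The steps in the template could be sketched roughly as follows. This is not the official solution: the five rows of job_survey are invented (chosen so that observations 1, 2, and 3 are identical), and the dummification is done in base R as an assumption so the sketch runs without the dummies library. In the exercise itself that step would be a call to dummy.data.frame() on job_survey.

```r
# Invented stand-in for job_survey; rows 1-3 are deliberately identical.
job_survey <- data.frame(
  job_satisfaction = factor(c("Hi", "Hi", "Hi", "Mid", "Low"),
                            levels = c("Hi", "Mid", "Low")),
  is_happy = factor(c("Yes", "Yes", "Yes", "No", "No"),
                    levels = c("Yes", "No"))
)

# Dummify the survey data (base R stand-in for dummy.data.frame()):
# build one 0/1 column per level of each factor.
dummy_survey <- do.call(cbind, lapply(names(job_survey), function(nm) {
  f <- job_survey[[nm]]
  m <- sapply(levels(f), function(lv) as.integer(f == lv))
  colnames(m) <- paste0(nm, levels(f))
  m
}))

# Calculate the distance: method = "binary" treats non-zero entries as "on"
# and returns the Jaccard distance between rows.
dist_survey <- dist(dummy_survey, method = "binary")

# Print the original data
print(job_survey)

# Print the distance matrix
print(dist_survey)
```

With this invented data, the pairs among observations 1, 2, and 3 have a distance of 0 because their answers match exactly, which mirrors the pattern the instructions ask you to look for.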