Explore gender data
The dataset gender contains the Weight, Height and BMI (body mass index) of 10,000 people. The original data has a Gender label: 5,000 people identify themselves as female and the other 5,000 as male. These labels will be useful later for testing how well the clustering matches the real labels. However, they are not provided in this subset of the dataset.
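The test against the real labels mentioned above is usually a cross-tabulation of cluster assignments against the true labels. Below is a minimal sketch on made-up toy vectors. The names true_gender and cluster are hypothetical and are not part of the course data. Because cluster numbers are arbitrary, the sketch scores both possible ways of matching the two clusters to the two labels and keeps the better one.

```r
# Hypothetical toy data: in the exercise the true labels would come from the
# Gender column of the original data and the clusters from the fitted model
true_gender <- c("Female", "Female", "Male", "Male", "Male")
cluster     <- c(1, 1, 2, 2, 1)

# Cross-tabulate cluster assignments against the real labels
confusion <- table(cluster = cluster, gender = true_gender)
print(confusion)

# Cluster numbers carry no meaning, so compare both matchings
# (cluster 1 = Female, or cluster 1 = Male) and keep the better one
agreement <- max(sum(diag(confusion)), sum(diag(confusion[2:1, ]))) / sum(confusion)
agreement
```

With these toy vectors, four of the five points fall on the better matching, so agreement is 0.8.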
The dataset gender_with_probs additionally contains, for each data point, the probability of belonging to one of the clusters. Since we are interested in two clusters, probabilities near 1 indicate one cluster and probabilities near 0 indicate the other.
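The following is a minimal sketch of how such probabilities can be turned into hard cluster assignments. It uses a 0.5 cut-off, which is an assumption and is not stated in the exercise. The name prob_cluster1 and its values are hypothetical. This text does not give the actual probability column name in gender_with_probs.

```r
# Hypothetical probabilities of belonging to "cluster 1" (made-up values)
prob_cluster1 <- c(0.97, 0.88, 0.51, 0.12, 0.03)

# Probabilities near 1 point to cluster 1, near 0 to cluster 2;
# 0.5 is assumed as the cut-off between the two
assignment <- ifelse(prob_cluster1 >= 0.5, 1, 2)
assignment

# Distance from 0.5, rescaled to [0, 1], as a rough measure of certainty:
# the point with probability 0.51 is assigned to cluster 1 but is ambiguous
certainty <- abs(prob_cluster1 - 0.5) * 2
certainty
```

Points whose probability sits close to 0.5 are the ones a colour scale in the scatterplot will show as intermediate shades.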
The aim of this exercise is to take a quick look at what a typical clustering dataset looks like before and after clustering.
This exercise is part of the course
Mixture Models in R
Exercise instructions
- Use the function head to look at the first 6 observations of gender.
- Use the function head to look at the first 6 observations of gender_with_probs.
- Make a scatterplot with Weight on the x-axis and BMI on the y-axis. Colour the dots by their probability.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Packages providing the %>% pipe and ggplot()
library(dplyr)
library(ggplot2)

# Have a look at gender (before clustering)
head(___)
# Have a look at gender_with_probs (after clustering)
head(___)
# Scatterplot with probabilities
gender_with_probs %>%
  ggplot(aes(x = ___, y = ___, col = ___)) +
  geom_point(alpha = 0.5)