
# Explore gender data

The data `gender` contains the `Weight`, the `Height` and the `BMI` (body mass index) of 10,000 people. The original data has a `Gender` label for each person: 5,000 people identify as female and the other 5,000 as male. These labels will be useful later for testing how well the clustering recovers the real groups. However, the labels are not provided in this version of the dataset.

The data `gender_with_probs` additionally contains, for each data point, the probability of belonging to a cluster. Since we are interested in two clusters, probabilities near `1` indicate one cluster and probabilities near `0` indicate the other.
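To make the two-cluster interpretation concrete, here is a minimal, language-agnostic sketch in Python of turning such probabilities into hard cluster labels. It assumes a cut-off of 0.5, and the probability values are made up for illustration; they are not taken from `gender_with_probs`. The helper name `assign_cluster` is hypothetical.

```python
# Made-up probabilities of belonging to "cluster 1" (illustrative only,
# not values from gender_with_probs).
probs = [0.98, 0.03, 0.51, 0.47, 0.89, 0.12]

def assign_cluster(p, threshold=0.5):
    """Hypothetical helper: label 1 if p is at or above the cut-off, else 0."""
    return 1 if p >= threshold else 0

labels = [assign_cluster(p) for p in probs]
print(labels)  # [1, 0, 1, 0, 1, 0]
```

Points such as 0.51 and 0.47 receive opposite labels even though the model is nearly undecided about both, which is why colouring a plot by the raw probability can be more informative than colouring by the hard label.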

The aim of this exercise is to take a quick look at what a typical clustering dataset looks like before and after clustering.

Instructions

- Use the function `head` to look at the first 6 observations of `gender`.
- Use the function `head` to look at the first 6 observations of `gender_with_probs`.
- Make a scatterplot with `Weight` on the x-axis and `BMI` on the y-axis. Colour the points by their probability.