Separating house prices with UMAP
You have reduced the dimensionality of the California house sales data (house_sales_df) using PCA and t-SNE. You will now use UMAP. The end result of UMAP is very similar to that of t-SNE, but UMAP tends to be more computationally efficient. It also strives to retain more of the global structure. In practical terms, this means the distance between clusters can be read as a rough measure of similarity, something that t-SNE does not reliably support.
Remember, the target variable of house_sales_df is price. Set num_comp = 2. The tidyverse and embed packages have been loaded for you.
This exercise is part of the course Dimensionality Reduction in R.
Exercise instructions
- Fit UMAP to all the predictors in house_sales_df using step_umap() in a recipe and store the transformed data in umap_df.
- Plot the UMAP dimensions using ggplot(), encoding the target variable price in color.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Fit UMAP
set.seed(1234)
umap_df <- ___(___ ~ ., data = ___) %>%
  ___(___()) %>%
  ___(___(), num_comp = 2) %>%
  prep() %>%
  ___()

# Plot UMAP
___ %>%
  ___(aes(x = ___, y = ___, color = ___)) +
  ___(alpha = 0.7) +
  scale_color_gradient(low = "gray", high = "blue")
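
One way the blanks might be filled in is sketched below. It is not the official solution. It assumes that house_sales_df is already in the session (as the exercise states), that every predictor is numeric, and that step_normalize() is the intended second step; the template does not name that step, but scaling is a common choice because UMAP works on distances. juice() is likewise an assumption for the final blank, and the output column names UMAP1 and UMAP2 may differ between versions of embed.

```r
library(tidyverse)
library(recipes)  # recipe(), step_normalize(), prep(), juice()
library(embed)    # step_umap()

# Assumption: house_sales_df exists and has a numeric `price` column
stopifnot(exists("house_sales_df"), "price" %in% names(house_sales_df))

# Fit UMAP
set.seed(1234)
umap_df <- recipe(price ~ ., data = house_sales_df) %>%
  # Assumed preprocessing: centre and scale every predictor
  step_normalize(all_predictors()) %>%
  # Replace the predictors with two UMAP components
  step_umap(all_predictors(), num_comp = 2) %>%
  prep() %>%
  # Extract the transformed training data (outcome plus UMAP columns)
  juice()

# Plot UMAP
# Assumption: the components are named UMAP1 and UMAP2;
# run names(umap_df) to confirm on your installed embed version
umap_df %>%
  ggplot(aes(x = UMAP1, y = UMAP2, color = price)) +
  geom_point(alpha = 0.7) +
  scale_color_gradient(low = "gray", high = "blue")
```

In current versions of recipes, bake(new_data = NULL) is the recommended replacement for juice(), but the template's empty parentheses suggest juice() here.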