Separating house prices with UMAP

You have reduced the dimensionality of the California house sales data (house_sales_df) using PCA and t-SNE. You will now use UMAP. The end result of UMAP is very similar to that of t-SNE; however, UMAP tends to be more computationally efficient. It also strives to retain more of the global structure. In practical terms, this means the distance between clusters in a UMAP plot can be read as a rough measure of their similarity, an interpretation that t-SNE does not reliably support.

Remember, the target variable of house_sales_df is price. Set num_comp = 2. The tidyverse and embed packages have been loaded for you.

This exercise is part of the course

Dimensionality Reduction in R

Exercise instructions

  • Fit UMAP to all the predictors in house_sales_df using step_umap() in a recipe and store the transformed data in umap_df.
  • Plot the UMAP dimensions using ggplot(), encoding the target variable price in color.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Fit UMAP
set.seed(1234)
umap_df <- ___(___ ~ ., data = ___) %>% 
  ___(___()) %>% 
  ___(___(), num_comp = 2) %>% 
  prep() %>% 
  ___() 

# Plot UMAP
___ %>%  
  ___(aes(x = ___, y = ___, color = ___)) +
  ___(alpha = 0.7) +
  scale_color_gradient(low="gray", high="blue")
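
For reference, one way the blanks above could be filled in is sketched below. This is not the official solution; it makes several assumptions. The second step is assumed to be step_normalize(), since UMAP is distance-based and benefits from scaled predictors. The final step is assumed to be juice(), which is equivalent to bake(new_data = NULL). The output columns are assumed to be named UMAP1 and UMAP2, the default in current versions of embed; check names(umap_df) if your version differs. Because the course data is not included here, the sketch builds a small synthetic stand-in for house_sales_df with hypothetical column names.

```r
library(tidyverse)
library(recipes)
library(embed)  # provides step_umap(); requires the uwot package

# Hypothetical stand-in for house_sales_df; the real course data is not shown here.
set.seed(1234)
house_sales_df <- tibble(
  price       = runif(200, 1e5, 1e6),
  sqft_living = rnorm(200, mean = 2000, sd = 500),
  bedrooms    = sample(1:6, 200, replace = TRUE),
  house_age   = runif(200, 0, 100)
)

# Fit UMAP
set.seed(1234)
umap_df <- recipe(price ~ ., data = house_sales_df) %>%
  step_normalize(all_predictors()) %>%          # assumed: scale predictors first
  step_umap(all_predictors(), num_comp = 2) %>% # reduce predictors to 2 dimensions
  prep() %>%
  juice()                                       # extract the transformed training data

# Plot UMAP, encoding price in color
umap_df %>%
  ggplot(aes(x = UMAP1, y = UMAP2, color = price)) +
  geom_point(alpha = 0.7) +
  scale_color_gradient(low = "gray", high = "blue")
```

With the synthetic data the plot will show no meaningful structure, because price is generated independently of the predictors; with the real house_sales_df, any separation of high and low prices would appear as a color gradient across the embedding.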