
Separating house prices with UMAP

You have already reduced the dimensionality of the California house sales data (house_sales_df) using PCA and t-SNE. You will now use UMAP. The end result of UMAP is very similar to that of t-SNE, but UMAP tends to be more computationally efficient. It also strives to retain more of the global structure of the data. In practical terms, this means you can interpret the distance between clusters as a rough measure of similarity, something you couldn't do with t-SNE.
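To make the "distance between clusters" idea concrete, here is a minimal sketch that uses the built-in iris data as a stand-in (it is not the exercise data). The iris species give known groups, so the sketch embeds the four measurements with step_umap() and then measures the Euclidean distance between each species' centroid in the UMAP plane. The step_normalize() call and the centroid-distance summary are illustrative assumptions, not part of the exercise, and treating these distances as similarity is a rough heuristic rather than a guarantee. If the heuristic holds, versicolor and virginica, which overlap in the raw measurements, should sit closer to each other than either does to setosa.

```r
# Stand-in illustration on iris; not the exercise's house_sales_df
library(dplyr)
library(embed)  # attaches recipes; step_umap() also needs the uwot package installed

set.seed(1234)  # UMAP is stochastic; fix the seed for a reproducible embedding
iris_umap <- recipe(Species ~ ., data = iris) %>%
  step_normalize(all_predictors()) %>%           # put the four measurements on one scale (assumed step)
  step_umap(all_predictors(), num_comp = 2) %>%  # unsupervised UMAP down to two dimensions
  prep() %>%
  bake(new_data = NULL)                          # processed training data: Species, UMAP1, UMAP2

# Centroid of each species in the UMAP plane
centroids <- iris_umap %>%
  group_by(Species) %>%
  summarise(UMAP1 = mean(UMAP1), UMAP2 = mean(UMAP2))

# Pairwise Euclidean distances between the species centroids
centroid_mat <- as.matrix(centroids[, c("UMAP1", "UMAP2")])
rownames(centroid_mat) <- as.character(centroids$Species)
centroid_dist <- dist(centroid_mat)
print(centroid_dist)
```

The same check would not be as meaningful on a t-SNE embedding, where the gaps between clusters are largely an artifact of the optimization.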

Remember, the target variable of house_sales_df is price. Set num_comp = 2. The tidyverse and embed packages have been loaded for you.

This exercise is part of the course

Dimensionality Reduction in R


Exercise instructions

  • Fit UMAP to all the predictors in house_sales_df using step_umap() in a recipe and store the transformed data in umap_df.
  • Plot the UMAP dimensions using ggplot(), encoding the target variable price in color.

Hands-on interactive exercise

Try this exercise by completing the sample code below.

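# Fit UMAP
set.seed(1234)  # UMAP is stochastic; a fixed seed makes the embedding reproducible
umap_df <- ___(___ ~ ., data = ___) %>%  # recipe: the target as outcome, all other columns as predictors
  ___(___()) %>%                          # a preprocessing step applied to all predictors
  ___(___(), num_comp = 2) %>%            # UMAP step reducing the predictors to 2 components
  prep() %>%                              # estimate the recipe's steps from the data
  ___()                                   # return the processed training data

# Plot UMAP
___ %>%  
  ___(aes(x = ___, y = ___, color = ___)) +  # the two UMAP dimensions, colored by the target
  ___(alpha = 0.7) +                         # semi-transparent points to reveal overlap
  scale_color_gradient(low = "gray", high = "blue")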
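For reference, one way the completed workflow could look is sketched below. Because house_sales_df is only available inside the exercise, the sketch builds a small hypothetical stand-in with made-up predictor columns and a simulated price; only the fact that price is the target comes from the exercise. Using step_normalize() as the preprocessing step is an assumption (UMAP works on distances, so putting predictors on a common scale is a common choice). The column names UMAP1 and UMAP2 follow step_umap()'s default "UMAP" prefix.

```r
library(tidyverse)
library(embed)  # attaches recipes; step_umap() also needs the uwot package installed

# Hypothetical stand-in for house_sales_df: made-up predictors plus a simulated price
set.seed(42)
n <- 300
house_sales_df <- tibble(
  living_area = rnorm(n, 2000, 500),
  bedrooms    = sample(1:5, n, replace = TRUE),
  bathrooms   = sample(1:3, n, replace = TRUE),
  year_built  = sample(1950:2015, n, replace = TRUE)
) %>%
  mutate(price = 100 * living_area + 20000 * bathrooms + rnorm(n, 0, 50000))

# Fit UMAP: normalize the predictors (assumed step), then reduce them to two components
set.seed(1234)
umap_df <- recipe(price ~ ., data = house_sales_df) %>%
  step_normalize(all_predictors()) %>%
  step_umap(all_predictors(), num_comp = 2) %>%
  prep() %>%
  juice()  # processed training data; bake(new_data = NULL) is the current equivalent

# Plot UMAP: the two embedding dimensions, colored by price
umap_plot <- umap_df %>%
  ggplot(aes(x = UMAP1, y = UMAP2, color = price)) +
  geom_point(alpha = 0.7) +
  scale_color_gradient(low = "gray", high = "blue")
print(umap_plot)
```

With the real data, the structure of the plot will of course differ from this simulated example.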