Exercise

Separating house prices with UMAP

You have reduced the dimensionality of the California house sales data (house_sales_df) using PCA and t-SNE. You will now use UMAP. The end result of UMAP is very similar to that of t-SNE; however, UMAP tends to be more computationally efficient. It also strives to retain more of the global structure. In practical terms, this means the distances between clusters carry more meaning as a rough measure of similarity, something you cannot rely on with t-SNE.

Remember, the target variable of house_sales_df is price. Set num_comp = 2. The tidyverse and embed packages have been loaded for you.

Instructions

  • Fit UMAP to all the predictors in house_sales_df using step_umap() in a recipe and store the transformed data in umap_df.
  • Plot the UMAP dimensions using ggplot(), encoding the target variable price in color.
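The two steps above can be sketched as follows. This is a minimal sketch, not the official solution. It assumes that house_sales_df is already loaded as in the exercise, that all of its predictors are numeric (UMAP needs numeric input), and that the uwot package, which step_umap() relies on, is installed. The step_normalize() call is an added assumption and is not part of the instructions. It puts the predictors on a common scale because UMAP works from distances between observations. The UMAP1 and UMAP2 column names come from step_umap()'s default "UMAP" prefix.

```r
# Assumes house_sales_df is loaded (as in the exercise) and all predictors
# are numeric; step_umap() also needs the uwot package installed.
library(tidyverse)
library(recipes)
library(embed)

# price is the outcome; every other column is a predictor
umap_df <- recipe(price ~ ., data = house_sales_df) %>%
  # Added assumption (not in the instructions): center and scale predictors
  # so no single variable dominates the distances UMAP is built on
  step_normalize(all_predictors()) %>%
  # Replace the predictors with two UMAP dimensions (original columns dropped)
  step_umap(all_predictors(), num_comp = 2) %>%
  prep() %>%
  # Return the processed training data: price, UMAP1 and UMAP2
  bake(new_data = NULL)

# Scatter plot of the two UMAP dimensions, with the target price in color
umap_df %>%
  ggplot(aes(x = UMAP1, y = UMAP2, color = price)) +
  geom_point(alpha = 0.7, size = 0.5)
```

UMAP is stochastic, so the layout can change from run to run. step_umap() accepts a seed argument if you need a reproducible embedding.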