Separating house prices with UMAP

You have reduced the dimensionality of the California house sales data (house_sales_df) using PCA and t-SNE. You will now use UMAP. The end result of UMAP is very similar to that of t-SNE, but UMAP tends to be more computationally efficient. It also strives to retain more of the global structure. In practical terms, this means the distance between clusters in a UMAP plot is a more meaningful, if still rough, measure of similarity than it is in a t-SNE plot.

Remember, the target variable of house_sales_df is price. Set num_comp = 2. The tidyverse and embed packages have been loaded for you.

Fit UMAP to all the predictors in house_sales_df using step_umap() in a recipe and store the transformed data in umap_df.
Plot the UMAP dimensions using ggplot(), encoding the target variable price in color.
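
The two steps above could be sketched as follows. This is one possible approach, not the official solution. It assumes every predictor in house_sales_df is numeric, adds a step_normalize() step that the instructions do not ask for (UMAP is distance-based, so scaling is a common precaution), and relies on the default output column names UMAP1 and UMAP2 used by recent versions of embed. The small synthetic data frame is a hypothetical stand-in, created only when house_sales_df is missing so that the sketch can run on its own; its column names are invented.

```r
library(tidyverse)
library(recipes)
library(embed)

# Hypothetical stand-in data, used ONLY if house_sales_df is not already loaded.
# The column names here are invented for illustration.
if (!exists("house_sales_df")) {
  set.seed(42)
  n <- 300
  house_sales_df <- tibble(
    sqft  = runif(n, 500, 4000),
    rooms = as.numeric(sample(2:9, n, replace = TRUE)),
    age   = runif(n, 0, 80)
  ) %>%
    mutate(price = 100 * sqft + 5000 * rooms - 500 * age + rnorm(n, sd = 20000))
}

num_comp <- 2

# UMAP is stochastic, so fix a seed for a reproducible embedding (assumption).
set.seed(1234)

# Declare price as the outcome so that all_predictors() excludes it.
umap_recipe <- recipe(price ~ ., data = house_sales_df) %>%
  step_normalize(all_predictors()) %>%               # added precaution, not in the instructions
  step_umap(all_predictors(), num_comp = num_comp)

# Estimate the recipe, then extract the transformed training data.
umap_df <- umap_recipe %>%
  prep() %>%
  bake(new_data = NULL)

# Plot the two UMAP dimensions, encoding price in color.
umap_plot <- umap_df %>%
  ggplot(aes(x = UMAP1, y = UMAP2, color = price)) +
  geom_point(alpha = 0.7) +
  scale_color_gradient(low = "gray85", high = "blue")

print(umap_plot)
```

If points with similar colors sit close together in the plot, the UMAP dimensions are separating houses by price reasonably well, even though price itself was not used to compute the embedding.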
