Separating house prices with t-SNE
t-SNE is a non-linear dimensionality reduction technique. It embeds high-dimensional data into a lower-dimensional space and, as it does so, tries to keep each point close to its original neighbors. You will create a t-SNE plot that you can compare with the PCA plot from the previous exercise. PCA preserves the global structure of the data but not the local structure. t-SNE preserves the local structure: points that are neighbors in the high-dimensional space stay close to each other in the low-dimensional space. You will see this difference when you compare the two plots.
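To make the local-versus-global claim concrete, the sketch below scores how many of each point's k nearest neighbors in the original space are still among its k nearest neighbors after projecting to 2-D, once with PCA and once with t-SNE. It is a minimal illustration under stated assumptions: it uses synthetic clustered data instead of house_sales_df, and knn_preservation() is a hypothetical helper written for this example, not part of Rtsne or the course.

```r
library(Rtsne)

# Hypothetical helper: average fraction of each point's k nearest neighbors
# in the original space that are also among its k nearest neighbors in the
# low-dimensional embedding (1 = all neighbors kept, 0 = none kept).
knn_preservation <- function(high, low, k = 10) {
  d_high <- as.matrix(dist(high))
  d_low  <- as.matrix(dist(low))
  n <- nrow(d_high)
  overlap <- vapply(seq_len(n), function(i) {
    # order() puts the point itself (distance 0) first, so skip position 1
    nn_high <- order(d_high[i, ])[2:(k + 1)]
    nn_low  <- order(d_low[i, ])[2:(k + 1)]
    length(intersect(nn_high, nn_low)) / k
  }, numeric(1))
  mean(overlap)
}

# Synthetic data (an assumption for illustration): three Gaussian clusters
# of 100 points each in 10 dimensions.
set.seed(1234)
X <- rbind(
  matrix(rnorm(100 * 10, mean = 0), ncol = 10),
  matrix(rnorm(100 * 10, mean = 4), ncol = 10),
  matrix(rnorm(100 * 10, mean = 8), ncol = 10)
)

# 2-D PCA projection: first two principal component scores
pca_xy <- prcomp(X)$x[, 1:2]

# 2-D t-SNE embedding; the input has only 10 columns, so skip Rtsne's
# optional initial PCA step
tsne_xy <- Rtsne(X, dims = 2, perplexity = 30, pca = FALSE)$Y

pca_score  <- knn_preservation(X, pca_xy)
tsne_score <- knn_preservation(X, tsne_xy)
print(c(pca = pca_score, tsne = tsne_score))
```

A higher score means more local neighborhoods survived the projection. The exact values depend on the seed, k, and perplexity, so treat any comparison as illustrative rather than a benchmark.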
You will apply t-SNE to reduce the dimensions of house_sales_df. The target variable of house_sales_df is price. The tidyverse and Rtsne packages have been loaded for you.
This exercise is part of the course Dimensionality Reduction in R.
Exercise instructions
- Fit t-SNE to house_sales_df using Rtsne().
- Bind the t-SNE X and Y coordinates to house_sales_df.
- Plot the t-SNE results using ggplot(), encoding the target variable in color.
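Sample code, with the blanks completed:
# Load packages (already loaded in the exercise environment)
library(tidyverse)
library(Rtsne)

# Fit t-SNE on the predictors only, dropping the target variable price
set.seed(1234)
tsne <- Rtsne(house_sales_df %>% select(-price), check_duplicates = FALSE)

# Bind t-SNE coordinates to the data frame; rows of tsne$Y follow the input row order
tsne_df <- house_sales_df %>%
  mutate(tsne_x = tsne$Y[, 1], tsne_y = tsne$Y[, 2])

# Plot t-SNE, coloring each house by its price
tsne_df %>%
  ggplot(aes(x = tsne_x, y = tsne_y, color = price)) +
  geom_point() +
  scale_color_gradient(low = "gray", high = "blue")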