Exercise

# Visualizing clusters

After training the KMeans model with the optimum value of K (K = 15), in this final part of the exercise you will visualize the clusters and their cluster centers (centroids) and check whether they overlap with each other. To do this, you'll first convert the `rdd_split_int` RDD into a Spark DataFrame and then into a Pandas DataFrame for plotting. Similarly, you'll convert `cluster_centers` into a Pandas DataFrame. Once the DataFrames are created, you'll use the `matplotlib` library to create scatter plots.

Remember, you already have a SparkContext `sc`, `rdd_split_int` and `cluster_centers` variables available in your workspace.

Instructions

**100 XP**

- Convert the `rdd_split_int` RDD into a Spark DataFrame.
- Convert the Spark DataFrame into a Pandas DataFrame.
- Create a Pandas DataFrame from the `cluster_centers` list.
- Create a scatter plot of the raw data and an overlaid scatter plot with centroids for k = 15.
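
The four steps above could be sketched roughly as follows. This is one possible approach, not the exercise's reference solution. It assumes the data is two-dimensional; the column names `col1` and `col2` and the helper names `rdd_to_pandas`, `centers_to_pandas` and `plot_clusters` are hypothetical and chosen only for illustration. It also assumes a `SparkSession` can be obtained with `SparkSession.builder.getOrCreate()`, which reuses the already running SparkContext.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical column names; the exercise does not name the two features.
COLUMNS = ["col1", "col2"]


def rdd_to_pandas(rdd, columns=COLUMNS):
    """Convert an RDD of 2-element rows: RDD -> Spark DataFrame -> Pandas DataFrame."""
    # Imported here so the plotting helpers below can be used without Spark.
    from pyspark.sql import SparkSession

    # Assumption: reuse the session tied to the existing SparkContext.
    spark = SparkSession.builder.getOrCreate()
    spark_df = spark.createDataFrame(rdd, schema=columns)
    return spark_df.toPandas()


def centers_to_pandas(cluster_centers, columns=COLUMNS):
    """Build a Pandas DataFrame with one row per centroid."""
    return pd.DataFrame(cluster_centers, columns=columns)


def plot_clusters(points, centers, columns=COLUMNS):
    """Scatter the raw points, then overlay the centroids on the same axes."""
    x, y = columns
    fig, ax = plt.subplots()
    ax.scatter(points[x], points[y], label="data")
    ax.scatter(centers[x], centers[y], color="red", marker="x", label="centroids")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return ax
```

In the exercise workspace, this might be used as `plot_clusters(rdd_to_pandas(rdd_split_int), centers_to_pandas(cluster_centers))`, followed by `plt.show()`. Drawing the centroids in a contrasting color and marker makes it easier to judge whether neighboring clusters overlap.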