Scatter matrix of numeric columns
You've investigated the new farmer's market data, and it's rather wide – with lots of columns of information for each market's row. Rather than painstakingly going through every combination of numeric columns and making a scatter plot to look at correlations, you decide to make a scatter matrix using the pandas built-in function.
Increasing the figure size with the figsize argument will help give the dense visualization some breathing room. Since there will be a lot of overlap for the points, decreasing the point opacity will help show the density of these overlaps.
This exercise is part of the course
Improving Your Data Visualizations in Python
Exercise instructions
- Subset the columns of the markets DataFrame to numeric_columns so the scatter matrix only shows numeric non-binary columns.
- Increase figure size to 15 by 10 to avoid crowding.
- Reduce point opacity to 50% to show regions of overlap.
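The exercise supplies numeric_columns by hand, but the idea of "numeric non-binary columns" can also be sketched programmatically. The snippet below is only an illustration: the markets DataFrame here is a small made-up stand-in (its values and the Cheese column name are assumptions, not the course data), and the "more than two distinct values" rule is one possible way to define non-binary.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course's markets DataFrame (values are made up)
rng = np.random.default_rng(1)
n = 100
markets = pd.DataFrame({
    "name": [f"market_{i}" for i in range(n)],   # text column, not numeric
    "lat": rng.uniform(25, 49, n),               # continuous numeric column
    "months_open": rng.integers(1, 13, n),       # integer column with many values
    "Cheese": rng.integers(0, 2, n),             # binary 0/1 goods indicator
})

# Keep numeric columns only, then drop those with two or fewer distinct values
numeric = markets.select_dtypes(include="number")
non_binary = [col for col in numeric.columns if numeric[col].nunique() > 2]
print(non_binary)
```

On real data it is worth checking the resulting list by eye, since an ID-like numeric column would pass this filter as well.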
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Select just the numeric columns (excluding individual goods)
numeric_columns = ['lat', 'lon', 'months_open', 'num_items_sold', 'state_pop']
# Make a scatter matrix of numeric columns
pd.plotting.scatter_matrix(markets[____],
                           # Make figure large to show details
                           figsize = ____,
                           # Lower point opacity to show overlap
                           alpha = ____)
plt.show()
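For reference, here is one way the call might look once the instructions are applied, written as a self-contained sketch. The markets DataFrame below is randomly generated stand-in data (an assumption, since the course dataset is not included here), so the plot shows no real correlations. The figsize of (15, 10) and alpha of 0.5 follow the instructions above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the markets data; every value here is random
rng = np.random.default_rng(0)
n = 200
markets = pd.DataFrame({
    "lat": rng.uniform(25, 49, n),
    "lon": rng.uniform(-124, -67, n),
    "months_open": rng.integers(1, 13, n),
    "num_items_sold": rng.integers(1, 30, n),
    "state_pop": rng.uniform(5e5, 4e7, n),
    "Cheese": rng.integers(0, 2, n),  # binary goods column, left out of the plot
})

# Select just the numeric columns (excluding individual goods)
numeric_columns = ['lat', 'lon', 'months_open', 'num_items_sold', 'state_pop']

# scatter_matrix returns a 2D array of Axes, one per pair of columns
axes = pd.plotting.scatter_matrix(markets[numeric_columns],
                                  # Make figure large to show details
                                  figsize=(15, 10),
                                  # Lower point opacity to show overlap
                                  alpha=0.5)
plt.show()
```

The diagonal panels show a histogram of each column by default, and the off-diagonal panels show the pairwise scatter plots.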