Scatter matrix of numeric columns

You've investigated the new farmers' market data, and it's rather wide, with many columns of information in each market's row. Rather than painstakingly making a scatter plot for every pair of numeric columns to look for correlations, you decide to make a scatter matrix with the built-in pandas function pd.plotting.scatter_matrix().

Increasing the figure size with the figsize argument will give the dense visualization some breathing room. Because many points will overlap, lowering the point opacity with the alpha argument will reveal where the points are densest.
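
To see how the two arguments behave, here is a minimal sketch on synthetic data. The toy DataFrame and its values are made up for illustration and are not the real markets data; only the column names are borrowed from the exercise. The Agg backend line is there only so the sketch runs without a display, and you can drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): NOT the real markets DataFrame
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "lat": rng.normal(40, 5, 500),
    "lon": rng.normal(-95, 15, 500),
    "months_open": rng.integers(1, 13, 500),
})

# One call draws every pairwise scatter plot, with histograms on the diagonal
axes = pd.plotting.scatter_matrix(
    toy,
    figsize=(15, 10),  # width and height in inches
    alpha=0.5,         # 50% opacity so overlapping points appear darker
)

# The return value is a square array of Axes, one per pair of columns
fig = axes[0, 0].figure
print(axes.shape)
```

With three columns, the function returns a 3 by 3 grid of Axes, so a wide DataFrame quickly becomes crowded unless the figure is enlarged.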

This exercise is part of the course

Improving Your Data Visualizations in Python

Exercise instructions

  • Subset the columns of the markets DataFrame to numeric_columns so the scatter matrix only shows numeric non-binary columns.
  • Increase figure size to 15 by 10 to avoid crowding.
  • Reduce point opacity to 50% to show regions of overlap.
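
In this exercise the list of numeric non-binary columns is given to you. If you ever need to build such a list yourself, one possible approach is sketched below. The toy DataFrame, the "Cheese" indicator column, and the "more than two distinct values" rule are all assumptions made for illustration, not part of the course data.

```python
import pandas as pd

# Hypothetical stand-in for a wide DataFrame (assumption, not the real data)
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],    # text column
    "lat": [40.1, 35.2, 47.6, 29.8],
    "months_open": [6, 12, 4, 8],
    "Cheese": [1, 0, 1, 1],          # binary 0/1 indicator for a good sold
})

# Keep only the numeric columns
numeric = toy.select_dtypes(include="number")

# Drop columns with at most two distinct values (treated here as binary)
non_binary = [col for col in numeric.columns if numeric[col].nunique() > 2]

# Passing a list of column names in square brackets subsets the DataFrame
subset = toy[non_binary]
print(non_binary)
```

Indexing with a list of names, as in toy[non_binary], is the same pattern the exercise asks for with markets and numeric_columns.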

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Select just the numeric columns (excluding individual goods)
numeric_columns = ['lat', 'lon', 'months_open', 'num_items_sold', 'state_pop']

# Make a scatter matrix of numeric columns
pd.plotting.scatter_matrix(markets[____],
                           # Make figure large to show details
                           figsize=____,
                           # Lower point opacity to show overlap
                           alpha=____)

plt.show()