
Joining Tracts and Metropolitan Areas

To keep the focus on how the merge method works, a function that calculates the Index of Dissimilarity, dissimilarity, has been provided for you. (You will write this function yourself in the next exercise.)
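For reference, the Index of Dissimilarity for a metro is commonly defined as D = ½ Σ |a_i/A − b_i/B|, where a_i and b_i are the counts of the two groups in tract i and A and B are the metro-wide totals. The sketch below is one possible pandas implementation, written to match the call in the sample code, dissimilarity(msa_tracts, "white", "black", "msa"). The parameter names, and the choice to return a Series named "D" indexed by the grouping column, are assumptions; the function provided in the exercise may differ.

```python
import pandas as pd

def dissimilarity(df, col_A, col_B, group_by):
    """Hypothetical sketch: Index of Dissimilarity D for each group (e.g. each MSA).

    D = 0.5 * sum(|a_i / A - b_i / B|), where a_i, b_i are tract counts
    of the two populations and A, B are their totals within the group.
    """
    # Group totals broadcast back to every tract row
    total_A = df.groupby(group_by)[col_A].transform("sum")
    total_B = df.groupby(group_by)[col_B].transform("sum")
    # Absolute difference between each tract's share of the two populations
    diff = (df[col_A] / total_A - df[col_B] / total_B).abs()
    # Sum within each group and halve; return a Series named "D"
    return (0.5 * diff.groupby(df[group_by]).sum()).rename("D")

# Made-up example: metro "A" is completely segregated, metro "B" is perfectly even
toy = pd.DataFrame({
    "msa": ["A", "A", "B", "B"],
    "white": [100, 0, 50, 50],
    "black": [0, 100, 50, 50],
})
print(dissimilarity(toy, "white", "black", "msa"))  # A: 1.0, B: 0.0
```

D ranges from 0 (both groups spread identically across tracts) to 1 (no tract contains members of both groups).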

To apply this function, you first need to add the MSA identifiers to the tracts DataFrame, using the "state" and "county" columns, which appear in both tracts and msa_def, as the join keys. At the end, you will use seaborn's stripplot function to show the ten most segregated metros.
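A minimal sketch of the multi-key join on made-up data (the state and county codes, MSA labels, and counts below are illustrative only, not real Census values). Passing a list to on matches rows only when both "state" and "county" agree, which matters because county codes repeat from state to state; joining on county alone duplicates and mismatches rows.

```python
import pandas as pd

# Made-up tracts: county "001" exists in both state "01" and state "02"
tracts = pd.DataFrame({
    "state": ["01", "01", "01", "02", "02"],
    "county": ["001", "001", "003", "001", "005"],
    "white": [1200, 800, 300, 900, 500],
    "black": [100, 400, 1500, 200, 50],
})

# Made-up county-to-MSA definitions: one row per county in an MSA
msa_def = pd.DataFrame({
    "msa": ["M1", "M1", "M2"],
    "state": ["01", "01", "02"],
    "county": ["001", "003", "001"],
})

# Correct: both keys must match (inner join by default)
msa_tracts = pd.merge(tracts, msa_def, on=["state", "county"])

# Incorrect: county codes alone are ambiguous across states
wrong = pd.merge(tracts, msa_def, on="county")

print(len(msa_tracts), len(wrong))  # 4 7
```

Because pd.merge performs an inner join by default, the tract in county "005" of state "02", which belongs to no MSA in this toy table, is dropped. That suits this exercise, since only tracts inside metros are needed.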

The tracts DataFrame from previous exercises is loaded. Population data by MSA is loaded as msa, and its first few rows are displayed in the console. Finally, msa_def lists the counties that make up each MSA.

pandas and seaborn have been loaded with the usual aliases.

This exercise is part of the course Analyzing US Census Data in Python.

Exercise instructions

  • Use the nlargest method on the msa DataFrame to return the 50 largest metros by "population".
  • Both tracts and msa_def have columns "state" and "county". Use the merge method with the on parameter to join on these columns.
  • Use the merge method to join msa and msa_D on the MSA identifier.
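The first and last steps can be illustrated on made-up data. nlargest(n, column) returns the n rows with the largest values in that column, sorted in descending order. The sketch below assumes, as the two blanks in the final merge suggest, that dissimilarity returns a Series of D values named "D" and indexed by MSA identifier; that return type is an assumption, so msa_D here is a hand-built stand-in. Because the identifier is a column in msa but the index of msa_D, the merge pairs left_on="msa" with right_index=True.

```python
import pandas as pd

# Made-up MSA table (illustrative values only)
msa = pd.DataFrame({
    "msa": ["M1", "M2", "M3", "M4"],
    "name": ["Metro A", "Metro B", "Metro C", "Metro D"],
    "population": [400, 1000, 250, 700],
})

# Identifiers of the 2 most populous metros (50 in the exercise)
top2 = list(msa.nlargest(2, "population")["msa"])
print(top2)  # ['M2', 'M4']

# Hypothetical stand-in for dissimilarity() output: Series named "D", indexed by MSA id
msa_D = pd.Series({"M1": 0.42, "M2": 0.71, "M4": 0.55}, name="D")

# Match the "msa" column on the left against the index on the right
msa = pd.merge(msa, msa_D, left_on="msa", right_index=True)
print(msa)
```

The merge is again an inner join, so M3, which has no D value in the stand-in, is dropped; the result gains a "D" column that nlargest(10, "D") can then rank for the plot.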


Have a go at this exercise by completing this sample code.

import matplotlib.pyplot as plt

# Find identifiers for 50 largest metros by population
msa50 = list(msa.____["msa"])

# Join MSA identifiers to tracts, restrict to largest 50 metros
msa_tracts = pd.merge(____, ____, on = ____)
msa_tracts = msa_tracts[msa_tracts["msa"].isin(msa50)]

# Calculate D using custom function, merge back into MSA
msa_D = dissimilarity(msa_tracts, "white", "black", "msa")
msa = pd.merge(msa, msa_D, ____, ____)

# Plot ten most segregated metros
sns.stripplot(x = "D", y = "name", data = msa.nlargest(10, "D"))
plt.show()