Joining Tracts and Metropolitan Areas
To keep the focus on how the merge method works, a function that calculates the Index of Dissimilarity has been provided for you. (You will create this function yourself in the next exercise!)
To apply this function, you need to add the MSA identifiers to the tracts DataFrame. You will use state and county, which are present in both DataFrames, as the join keys. At the end, you will use seaborn's stripplot function to show the ten most segregated metros.
The tracts DataFrame that you have used previously is loaded. Population data by MSA is loaded as msa, and the first few rows are displayed in the console. Finally, msa_def is loaded with the counties that make up each MSA.
pandas and seaborn have been loaded with the usual aliases.
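As a rough illustration of joining on two key columns, the sketch below builds small stand-ins for tracts and msa_def. All values, including the MSA codes, are made up for this example and are not taken from the course data.

```python
import pandas as pd

# Toy stand-in for tracts: one row per tract (made-up values)
tracts = pd.DataFrame({
    "state": ["01", "01", "02", "02"],
    "county": ["001", "003", "001", "001"],
    "tract": ["000100", "000200", "000100", "000200"],
    "white": [100, 50, 80, 20],
    "black": [20, 60, 10, 40],
})

# Toy stand-in for msa_def: one row per county, with a hypothetical MSA code
msa_def = pd.DataFrame({
    "state": ["01", "02"],
    "county": ["001", "001"],
    "msa": ["10000", "20000"],
})

# Passing a list to `on` joins on the combination of both columns.
# The default is an inner join, so the tract in county ("01", "003"),
# which has no row in msa_def, is dropped from the result.
joined = pd.merge(tracts, msa_def, on=["state", "county"])
print(joined)
```

Because county codes repeat across states, joining on "county" alone would match tracts to the wrong counties; the state and county pair is what identifies a county uniquely.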
This exercise is part of the course
Analyzing US Census Data in Python
Exercise instructions
- Use the nlargest method on the msa DataFrame to return the 50 largest metros by "population".
- Both tracts and msa_def have columns "state" and "county". Use the merge method with the on parameter to join on these columns.
- Use the merge method to join msa and msa_D on the MSA identifier.
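The first step, selecting the largest metros and then filtering on their identifiers, can be sketched on toy data as follows. The DataFrame contents here are invented for illustration, and the sketch keeps the top 2 rows rather than the top 50.

```python
import pandas as pd

# Toy stand-in for msa (made-up names and populations)
msa = pd.DataFrame({
    "msa": ["10000", "20000", "30000", "40000"],
    "name": ["Metro A", "Metro B", "Metro C", "Metro D"],
    "population": [500, 1500, 900, 100],
})

# nlargest(n, column) returns the n rows with the largest values in
# that column, sorted in descending order
top = msa.nlargest(2, "population")
top_ids = list(top["msa"])

# Toy tract-level rows keyed by the same identifier
tract_rows = pd.DataFrame({
    "msa": ["10000", "20000", "20000", "30000", "40000"],
    "tract": ["a", "b", "c", "d", "e"],
})

# isin keeps only rows whose identifier appears in the list
kept = tract_rows[tract_rows["msa"].isin(top_ids)]
print(top_ids)
print(kept)
```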
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Find identifiers for 50 largest metros by population
msa50 = list(msa.____["msa"])
# Join MSA identifiers to tracts, restrict to largest 50 metros
msa_tracts = pd.merge(____, ____, on = ____)
msa_tracts = msa_tracts[msa_tracts["msa"].isin(msa50)]
# Calculate D using custom function, merge back into MSA
msa_D = dissimilarity(msa_tracts, "white", "black", "msa")
msa = pd.merge(msa, msa_D, ____, ____)
# Plot ten most segregated metros
sns.stripplot(x = "D", y = "name", data = msa.nlargest(10, "D"))
plt.show()
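The dissimilarity function is supplied by the exercise and its implementation is not shown here. As a hedged sketch only, the version below assumes the standard definition of the Index of Dissimilarity, D = 0.5 * sum(|a_i / A - b_i / B|), computed per group, and assumes the result is a one-column DataFrame indexed by the grouping column. The name dissimilarity_sketch and the toy data are hypothetical.

```python
import pandas as pd

def dissimilarity_sketch(df, col_a, col_b, group_col):
    """Hypothetical helper: Index of Dissimilarity per group."""
    # Each tract's share of its group's total, for both populations
    share_a = df[col_a] / df.groupby(group_col)[col_a].transform("sum")
    share_b = df[col_b] / df.groupby(group_col)[col_b].transform("sum")
    # Half the summed absolute difference in shares, per group
    d = 0.5 * (share_a - share_b).abs().groupby(df[group_col]).sum()
    return d.rename("D").to_frame()

# Toy tract data: metro "A" is highly segregated, metro "B" is not
toy_tracts = pd.DataFrame({
    "msa": ["A", "A", "B", "B"],
    "white": [90, 10, 50, 50],
    "black": [10, 90, 30, 30],
})
toy_D = dissimilarity_sketch(toy_tracts, "white", "black", "msa")

# Because the sketch returns the identifier as the index, the merge
# pairs a column on the left with the index on the right
toy_msa = pd.DataFrame({"msa": ["A", "B"], "name": ["Metro A", "Metro B"]})
toy_merged = pd.merge(toy_msa, toy_D, left_on="msa", right_index=True)
print(toy_merged)
```

If the function instead returned the identifier as an ordinary column, on="msa" would be the simpler choice; which form applies depends on how the provided function shapes its output.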