Joining Tracts and Metropolitan Areas
To keep the focus on how the merge method works, a function that calculates the Index of Dissimilarity has been provided for you. (You will create this function yourself in the next exercise!) To apply this function, you need to add the MSA identifiers to the tracts DataFrame. You will use state and county, present in both DataFrames, as the join keys. At the end, you will use seaborn's stripplot function to show the ten most segregated metros.
The tracts DataFrame that you have used previously is loaded. Population data by MSA is loaded as msa, and the first few rows are displayed in the console. Finally, msa_def is loaded with the counties that make up each MSA.
pandas and seaborn have been loaded with the usual aliases.
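As a rough illustration of joining on two key columns, here is a minimal sketch on toy data. The names tracts_toy and msa_def_toy and all values are made up for illustration and are not the course datasets.

```python
import pandas as pd

# Toy stand-ins for the course DataFrames; every value here is invented
tracts_toy = pd.DataFrame({
    "state": ["01", "01", "02"],
    "county": ["001", "003", "001"],
    "tract": ["000100", "000200", "000300"],
})
msa_def_toy = pd.DataFrame({
    "state": ["01", "02"],
    "county": ["001", "001"],
    "msa": ["11111", "22222"],
})

# Passing a list to `on` joins on the combination of both columns.
# The default how="inner" keeps only rows whose (state, county) pair
# appears in both DataFrames, so the tract in county "003" is dropped.
joined = pd.merge(tracts_toy, msa_def_toy, on=["state", "county"])
print(joined)
```

Because the join is an inner join by default, tracts in counties that belong to no MSA drop out of the result, which is usually what you want before computing a per-MSA statistic.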
This exercise is part of the course Analyzing US Census Data in Python.
Exercise instructions
- Use the nlargest method on the msa DataFrame to return the 50 largest metros by "population".
- Both tracts and msa_def have columns "state" and "county". Use the merge method with the on parameter to join on these columns.
- Use the merge method to join msa and msa_D on the MSA identifier.
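The first step, selecting the largest rows and then filtering another table by their identifiers, can be sketched on toy data as follows. The DataFrames msa_toy and rows_toy and their values are hypothetical, and the sketch takes the top 2 rather than the top 50.

```python
import pandas as pd

# Hypothetical metro table; values are made up
msa_toy = pd.DataFrame({
    "msa": ["A", "B", "C", "D"],
    "population": [500, 1500, 1000, 200],
})

# nlargest(n, column) returns the n rows with the largest values in
# that column, sorted in descending order
top2 = msa_toy.nlargest(2, "population")
top2_ids = list(top2["msa"])

# Hypothetical tract-level rows, each tagged with a metro identifier
rows_toy = pd.DataFrame({
    "msa": ["A", "B", "B", "C", "D"],
    "value": [1, 2, 3, 4, 5],
})

# isin builds a boolean mask that keeps rows whose identifier is in the list
filtered = rows_toy[rows_toy["msa"].isin(top2_ids)]
print(top2_ids)
print(filtered)
```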
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Find identifiers for 50 largest metros by population
msa50 = list(msa.____["msa"])
# Join MSA identifiers to tracts, restrict to largest 50 metros
msa_tracts = pd.merge(____, ____, on = ____)
msa_tracts = msa_tracts[msa_tracts["msa"].isin(msa50)]
# Calculate D using custom function, merge back into MSA
msa_D = dissimilarity(msa_tracts, "white", "black", "msa")
msa = pd.merge(msa, msa_D, ____, ____)
# Plot ten most segregated metros
sns.stripplot(x = "D", y = "name", data = msa.nlargest(10, "D"))
plt.show()
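The provided dissimilarity function is not shown on this page. For orientation only, here is a minimal sketch of how the Index of Dissimilarity, D = 0.5 * sum(|a_i / A - b_i / B|), could be computed per group with pandas. The name dissimilarity_sketch, its return shape (one row per group with a "D" column), and the toy data are assumptions and may differ from the course's function.

```python
import pandas as pd

def dissimilarity_sketch(df, col_a, col_b, group_col):
    """Hypothetical helper: Index of Dissimilarity per group.

    Assumes each group's totals for col_a and col_b are nonzero.
    """
    # Group totals (A and B), broadcast back to each row
    totals = df.groupby(group_col)[[col_a, col_b]].transform("sum")
    # |a_i / A - b_i / B| for each row (for example, each tract)
    diff = (df[col_a] / totals[col_a] - df[col_b] / totals[col_b]).abs()
    # Sum within each group and halve
    d = 0.5 * diff.groupby(df[group_col]).sum()
    return d.rename("D").reset_index()

# Toy data: metro "X" is fully separated, metro "Y" is evenly mixed
toy = pd.DataFrame({
    "msa": ["X", "X", "Y", "Y"],
    "white": [100, 0, 50, 50],
    "black": [0, 100, 50, 50],
})
result = dissimilarity_sketch(toy, "white", "black", "msa")
print(result)
```

D ranges from 0 (both groups distributed identically across tracts) to 1 (no tract contains members of both groups), which is why the plot above ranks metros by their largest D values.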