
Calculating D for One State

In this exercise you will compute the Index of Dissimilarity for the state of Georgia. Remember that the formula for the Index of Dissimilarity is:

$$D = \frac{1}{2}\sum{\left\lvert \frac{a}{A} - \frac{b}{B} \right\rvert}$$

In this case, Group A is Whites and Group B is Blacks. \(a\) and \(b\) are the White and Black populations of each small geography (a tract), while \(A\) and \(B\) are the White and Black populations of the larger, containing geography (Georgia, postal code = GA, FIPS code = 13). The sum runs over all tracts in the containing geography.
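To make the formula concrete, here is a minimal sketch on a made-up three-tract area. The DataFrame `toy` and its numbers are hypothetical and are not taken from the course data; only the column names "white" and "black" follow the exercise.

```python
import pandas as pd

# Hypothetical toy data: three tracts in one made-up area (not real Census counts)
toy = pd.DataFrame({
    "white": [100, 300, 600],
    "black": [400, 100, 0],
})

# Totals for the containing geography: A = 1000, B = 500
A = toy["white"].sum()
B = toy["black"].sum()

# Per-tract difference in population shares: a/A - b/B
share_diff = toy["white"] / A - toy["black"] / B

# Half the sum of absolute differences: 0.5 * (0.7 + 0.1 + 0.6)
D_toy = 0.5 * share_diff.abs().sum()

print("Dissimilarity (toy data):", round(D_toy, 3))
```

In this toy area, the third tract holds 60% of the White population and none of the Black population, which is what pushes the index up to 0.7.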

pandas has been imported as pd, and the tracts DataFrame, with population columns "white" and "black", has been loaded.

This exercise is part of the course Analyzing US Census Data in Python.

Exercise instructions

  • Create the new DataFrame ga_tracts with only the tracts in Georgia ("state" column should equal FIPS code "13")
  • Provide the column names in a list (use the variables w and b) to print the sums of non-Hispanic Whites and Blacks in Georgia
  • For each tract, divide its White population by Georgia's total White population, and subtract its Black population divided by Georgia's total Black population; use the w and b variables to keep the code readable
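The first two steps rely on boolean-mask filtering and on selecting several columns with a list. The sketch below shows both on a hypothetical DataFrame, `toy_tracts`, with made-up values. It assumes the "state" column stores FIPS codes as strings, as the quoted "13" in the instructions suggests; if the codes were stored as integers, the comparison would need `13` instead.

```python
import pandas as pd

# Hypothetical data: two tracts in state "13" and one in state "01"
toy_tracts = pd.DataFrame({
    "state": ["13", "13", "01"],
    "white": [100, 300, 50],
    "black": [400, 100, 25],
})

# Boolean mask: keep only rows whose state code equals the string "13"
subset = toy_tracts[toy_tracts["state"] == "13"]

# Passing a list of column names returns a DataFrame;
# .sum() then gives one total per column
cols = ["white", "black"]
totals = subset[cols].sum()

print(totals)
```

Selecting with a list (`subset[cols]`) rather than a single name is what makes `.sum()` return both totals at once as a Series indexed by column name.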

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define convenience variables to hold column names
w = "white"
b = "black"

# Extract Georgia tracts
ga_tracts = tracts[____]

# Print sums of Black and White residents of Georgia
print(ga_tracts[____].sum())

# Calculate Index of Dissimilarity and print rounded result
D = 0.5 * sum(abs(
  ____ / ____ - ____ / ____))

print("Dissimilarity (Georgia):", round(D, 3))    