Measuring Segregation: The Index of Dissimilarity

1. Measuring Segregation: The Index of Dissimilarity

Now that we have seen a variety of Census topics and geographies, let's investigate an important demographic topic: segregation.

2. What is Segregation?

Segregation is the geographic separation of subpopulations. While the term is often used as a synonym for "racial segregation", we may also be interested in segregation by occupational status, housing tenure, presence of children, or any other characteristic. This image is a dot density map of demographic groups in Chicago. Based on visual inspection, we might say that Chicago is segregated. But is it a lot or a little? That's where quantification comes in.

3. Index of Dissimilarity Formula

Let's begin with two groups, A and B, that are counted in small areas throughout a region.

4. Index of Dissimilarity Formula

a_i and b_i are the populations of group A and group B in each small area i.

5. Index of Dissimilarity Formula

Big A and Big B are each group's total population across the whole large area.

6. Index of Dissimilarity Formula

The Index of Dissimilarity, represented by capital D, is a widely used measure of segregation between two groups.

7. Index of Dissimilarity Formula

For each subarea, the count of each group is divided by that group's total. For example, a_1 is group A's population in subarea 1, in the top left of the diagram. This value, 10, is divided by Big A, 100, to yield 0.1.

8. Index of Dissimilarity Formula

b_1, 200, is divided by Big B, 400, to yield 0.5.

9. Index of Dissimilarity Formula

The absolute value of the difference between these two shares is 0.4.

10. Index of Dissimilarity Formula

This is done for each of the four subareas, then summed,

11. Index of Dissimilarity Formula

then divided by two. The final result varies from 0 to 1, with higher values indicating greater segregation.
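Written out, the steps above amount to the following formula. The symbol n for the number of subareas is introduced here only for notation (the diagram has four), and the second line repeats the subarea 1 arithmetic from the example.

```latex
D = \frac{1}{2} \sum_{i=1}^{n} \left| \frac{a_i}{A} - \frac{b_i}{B} \right|

\left| \frac{a_1}{A} - \frac{b_1}{B} \right|
  = \left| \frac{10}{100} - \frac{200}{400} \right|
  = \left| 0.1 - 0.5 \right|
  = 0.4
```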

12. Index of Dissimilarity Formula

A and B can represent races, or counts of any other categorical variable. The index has even been used in political science to measure unequal legislative representation, with A being the number of legislative seats and B being the population in each legislative district. The Index of Dissimilarity can only accommodate two groups at a time. As American society becomes more multiethnic, other measures such as entropy indexes are being employed. The Index of Dissimilarity is also only appropriate for categorical data. For continuous data such as household income, a measure such as the Theil Inequality Index is more appropriate.
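Because the index needs nothing more than two parallel lists of counts, it can be sketched as a small plain-Python function. This is a minimal illustration, not code from the lesson: the function name dissimilarity is hypothetical, and only the subarea 1 counts (10 and 200) and the totals (100 and 400) come from the diagram; the other three subareas are made-up values chosen to match those totals.

```python
def dissimilarity(a_counts, b_counts):
    """Index of Dissimilarity for two parallel sequences of subarea counts."""
    if len(a_counts) != len(b_counts):
        raise ValueError("both groups need one count per subarea")
    big_a = sum(a_counts)  # group A's total across the region
    big_b = sum(b_counts)  # group B's total across the region
    if big_a == 0 or big_b == 0:
        raise ValueError("each group needs a nonzero total")
    # Half the sum of absolute differences between the two groups' shares.
    return 0.5 * sum(
        abs(a / big_a - b / big_b) for a, b in zip(a_counts, b_counts)
    )


# Subarea 1 matches the diagram (10 and 200); the rest is hypothetical.
a = [10, 20, 30, 40]     # sums to Big A = 100
b = [200, 100, 50, 50]   # sums to Big B = 400
d = dissimilarity(a, b)
print(round(d, 2))
```

Two quick sanity checks follow from the definition: identical distributions such as [1, 2] and [2, 4] give 0, and complete separation such as [5, 0] and [0, 5] gives 1.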

13. Suitable Data

To calculate the Index of Dissimilarity, you need population counts for two groups. We will work with a DataFrame of Black and White population by Census tract. The DataFrame includes state and county identifiers. Using these, you could calculate segregation either by county or by state.

14. Calculating the Index of Dissimilarity (D)

Let's calculate D for one state, California, with FIPS code "06". Start by filtering the tracts to those in California, creating the DataFrame ca_tracts; the filter uses the postal code "CA" for California. We will use the column names frequently, so define short variables w and b to hold these names.

15. Calculating the Index of Dissimilarity (D)

We find the total White and Black populations of California using the sum method. We *could* also construct an API call to request the state populations directly, but instead we calculate them from the data at hand.

16. Calculating the Index of Dissimilarity (D)

Finally, apply the formula. Notice how everything in the code corresponds to the formula. a_i and b_i in the formula become ca_tracts[w] and ca_tracts[b] in the code, the White and Black population of each tract. Each is divided by its statewide total, Big A or Big B, the White and Black populations of California, which are the sums of the tract populations. These divisions generate two pandas Series, which are subtracted from each other, yielding another Series, which is fed into the absolute value function abs. The sum function is applied to this Series, and that sum is multiplied by 0.5. The result is approximately 0.6 on a scale of 0 to 1.
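The three steps just described (filter, total, apply the formula) can be sketched end to end as follows. This is an illustration under stated assumptions rather than the lesson's actual code: the column names "state", "county", "white" and "black" are hypothetical, the filter assumes the state column holds FIPS codes, and the tract counts are toy values, so the result will not match the figure of roughly 0.6 quoted for the real California data.

```python
import pandas as pd

# Toy stand-in for the tract-level DataFrame.
# Column names and all counts are hypothetical, not Census data.
tracts = pd.DataFrame({
    "state": ["06", "06", "06", "36"],
    "county": ["001", "001", "037", "061"],
    "white": [100, 300, 600, 500],
    "black": [400, 50, 50, 500],
})

# Step 1: keep only California tracts (assumes a FIPS-coded state column).
ca_tracts = tracts[tracts["state"] == "06"]

# Short variables holding the column names.
w = "white"
b = "black"

# Step 2: statewide totals (Big A and Big B), summed from the tracts.
w_total = ca_tracts[w].sum()
b_total = ca_tracts[b].sum()

# Step 3: the formula. Each division yields a Series of tract shares;
# abs() works elementwise on the difference, and sum() adds it up.
D = 0.5 * sum(abs(ca_tracts[w] / w_total - ca_tracts[b] / b_total))
print(round(D, 3))
```

The same expression would work on any subset of tracts, for example a single county, as long as the totals are computed from that same subset.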

17. Let's Practice!

In the exercises, you will use a loop to calculate D for all states. Then you will learn about the pandas groupby method. Let's practice!
