1. Segregation Impacts: Unemployment
We've seen how to calculate a measure of segregation, but why is that of interest? We study segregation because we are concerned about its impacts. In this lesson, we will explore the relationship between segregation and African-American unemployment.
2. Deciphering ACS Subject Table IDs
We'll be using data from the 2012 ACS. The ACS table names follow this pattern. Let's see where these names come from.
3. [B|C]ssnnn[A-I]
First, the table names begin with B or C, for Base or Collapsed. Base tables contain the most detailed breakdowns. In Collapsed tables, categories are combined into fewer, larger "buckets".
In this example, educational attainment is available with finer distinctions (more "buckets") for the whole population. But sometimes, for small geographies or for subpopulations, the table is only available in the collapsed version. Tables that end in the letters A through I are "racial iterations".
4. [B|C]ssnnn[A-I]
That indicates that the table universe is only the population of a given race or ethnicity. In this lesson, we will work with the B tables, with data on African-American employment and unemployment.
5. [B|C]ssnnn[A-I]
The five digits in the center represent a two-digit subject identifier and a three-digit serial number. This is a partial list of subjects in the ACS. Age and sex tables have breakdowns such as male 21-24 years old, or female 40-49 years old. Educational attainment will be broken down by years of schooling and diplomas or degrees earned. Income tables include data such as median income or per capita income by area, but also the count of households in income categories such as $10,000 - $14,999.
The full list of identifiers is available at this link.
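As a rough illustration of the naming pattern, the pieces of a table ID can be pulled apart with a regular expression. This is only a sketch of the [B|C]ssnnn[A-I] pattern described above: the function name parse_table_id and the sample ID "B23002B" are chosen here as a format example, and the sketch does not look up what any subject code means.

```python
import re

# Pattern from the slide: B or C, two-digit subject, three-digit serial,
# and an optional racial-iteration letter A through I.
TABLE_ID = re.compile(
    r"^(?P<kind>[BC])(?P<subject>\d{2})(?P<serial>\d{3})(?P<race>[A-I])?$"
)


def parse_table_id(table_id):
    """Split a table ID into its parts (hypothetical helper, not a Census API)."""
    match = TABLE_ID.match(table_id)
    if match is None:
        raise ValueError(f"{table_id!r} does not follow the [B|C]ssnnn[A-I] pattern")
    parts = match.groupdict()
    parts["kind"] = "base" if parts["kind"] == "B" else "collapsed"
    return parts


# "race" is None when the table is not a racial iteration.
print(parse_table_id("B23002B"))
print(parse_table_id("C15002"))
```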
6. Comparing Segregation Impacts
In this lesson, we'll compare segregation impacts by sex and race. We've already done scatterplots. Now we will plot unemployment against segregation "conditioned" on a third variable. This means that an additional visual feature, such as color in this plot, is used to indicate the value of an additional variable, such as sex.
7. Tidy Data
In order to do so, we will have to convert our DataFrame from a "wide" format, where variable names are column names, ...
to a "tidy" format, where the variable name distinguishes rows.
Tidy and wide are just different ways of storing the same data. Neither is "correct", but some functions require the data in a particular format. To convert from wide to tidy, we will use the pandas melt method.
To begin, if we want the values "male" and "female" to appear in the rows instead of "male_lf" and "female_lf", we have to rename the columns. Set msa_labor_force.columns to ["msa", "male", "female"].
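The renaming step might look like the following. The small DataFrame is a made-up stand-in for msa_labor_force (the MSA names and counts are invented for illustration); only the column names come from the lesson.

```python
import pandas as pd

# Toy stand-in for msa_labor_force; MSA names and counts are invented.
msa_labor_force = pd.DataFrame({
    "msa": ["Metro A", "Metro B"],
    "male_lf": [1000, 2000],
    "female_lf": [1100, 1900],
})

# Assigning to .columns renames by position, so the new names must be
# listed in the same order as the existing columns.
msa_labor_force.columns = ["msa", "male", "female"]
print(msa_labor_force.columns.tolist())
```

Because the assignment is positional, it is worth checking the existing column order before overwriting it.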
8. pandas.melt
Now call melt.
The id_vars parameter is set to a list of columns that will identify the entity, in this case the "msa" column. In a DataFrame with a multicolumn identifier, such as "state" and "county", the id_vars list will contain multiple column names.
value_vars gets set to the names of the columns that hold the values stored for each entity, in this case "male" and "female".
var_name is a new name chosen for the column that will hold the variable names. Since this column will hold the names "male" and "female", we name it "sex".
Finally, value_name is a new name chosen for the column that will hold the values. In this case, the values represent labor force counts. We name it "labor_force".
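Putting the four parameters together, the call could be sketched as below. The input data is invented for illustration, and the result name tidy_msa_lf is chosen here rather than taken from the lesson.

```python
import pandas as pd

# Toy wide-format data with already-renamed columns; values are invented.
msa_labor_force = pd.DataFrame({
    "msa": ["Metro A", "Metro B"],
    "male": [1000, 2000],
    "female": [1100, 1900],
})

tidy_msa_lf = msa_labor_force.melt(
    id_vars=["msa"],                # column(s) identifying each entity
    value_vars=["male", "female"],  # wide columns to stack into rows
    var_name="sex",                 # new column holding the old column names
    value_name="labor_force",       # new column holding the old cell values
)
print(tidy_msa_lf)
```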
9. pandas.melt
The result, which we've already seen, is shown here. It still has three columns, but it has twice as many rows. One column, "sex", is a categorical variable. Another, "labor_force", contains the values previously stored in two different columns.
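One way to check the row and column counts, and to confirm that tidy and wide hold the same data, is to pivot the melted frame back. This is a sketch on invented data; the names tidy and wide_again are chosen here for illustration.

```python
import pandas as pd

# Toy wide-format data; values are invented.
wide = pd.DataFrame({
    "msa": ["Metro A", "Metro B"],
    "male": [1000, 2000],
    "female": [1100, 1900],
})

tidy = wide.melt(
    id_vars=["msa"],
    value_vars=["male", "female"],
    var_name="sex",
    value_name="labor_force",
)

# Same number of columns, twice as many rows (two value columns were stacked).
print(wide.shape, tidy.shape)

# Pivoting spreads the "sex" values back out into columns, indexed by msa.
wide_again = tidy.pivot(index="msa", columns="sex", values="labor_force")
print(wide_again)
```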
10. Let's Practice
Let's put these new skills to good use!