
Calculating D Using Grouping in Pandas

Performing a calculation over subsets of a DataFrame is so common that pandas provides an alternative to looping: the groupby method. In the sample code, groupby first groups tracts by state, i.e. it collects the rows that share the same value in the "state" column. The sum() method is then applied to the selected columns within each group.
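
As a minimal sketch of the grouping step, the following uses a small, made-up tracts DataFrame (the state codes and population values are hypothetical, not course data) to show how groupby followed by sum() produces one row of totals per state:

```python
import pandas as pd

# Hypothetical toy data: four tracts in two states (values are made up)
tracts = pd.DataFrame({
    "state": ["01", "01", "02", "02"],
    "white": [100, 300, 50, 150],
    "black": [200, 200, 25, 75],
})

# Group rows sharing the same "state" value, then sum the selected columns per group
sums_by_state = tracts.groupby("state")[["white", "black"]].sum()
print(sums_by_state)
```

The result is indexed by state, which is why the later merge joins on the right-hand index rather than on a column.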

This exercise also makes use of merge, another useful pandas method, to join the grouped sums back to the individual tracts so that each tract row carries its state's totals. Don't worry about the syntax for now; merge will be explained in a later lesson.
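
To illustrate what the merge does, here is a small sketch on made-up data (the values and the name merged are assumptions for illustration only). Each tract row is matched to its state's totals, and the suffixes argument leaves the original column names unchanged while appending "_sum" to the totals:

```python
import pandas as pd

# Hypothetical toy data (values are made up)
tracts = pd.DataFrame({
    "state": ["01", "01", "02", "02"],
    "white": [100, 300, 50, 150],
    "black": [200, 200, 25, 75],
})
sums_by_state = tracts.groupby("state")[["white", "black"]].sum()

# Match the "state" column on the left to the index (state) on the right.
# Overlapping column names keep "" on the left and get "_sum" on the right.
merged = pd.merge(tracts, sums_by_state, left_on="state",
                  right_index=True, suffixes=("", "_sum"))
print(merged)
```

Every tract in the same state ends up with identical values in white_sum and black_sum.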

pandas has been imported under the usual alias pd, and the tracts DataFrame, with population columns white and black, has been loaded. The variables w and b have been defined as the column names "white" and "black".

This exercise is part of the course

Analyzing US Census Data in Python


Exercise instructions

  • Create sums_by_state using groupby and print the result.
  • Create tracts using merge and print the result.
  • Calculate \(\left\lvert\frac{a_i}{A} - \frac{b_i}{B}\right\rvert\) for each tract \(i\) and store it in a new column D. (Reminder: the sums of the White and Black populations, \(A\) and \(B\), were already calculated and are available in the tracts DataFrame in the columns suffixed with "_sum".)
  • Sum the column D by state using the groupby method, and multiply by 0.5.
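
Taken together, the last two steps compute the Index of Dissimilarity for each state. Written out in the notation of the instructions (assuming, as the reminder suggests, that \(a_i\) and \(b_i\) are the White and Black populations of tract \(i\), and \(A\) and \(B\) are the corresponding state totals):

```latex
D = \frac{1}{2} \sum_{i} \left\lvert \frac{a_i}{A} - \frac{b_i}{B} \right\rvert
```

The sum runs over all tracts \(i\) in a state. \(D\) is 0 when every tract holds the same share of both groups and approaches 1 as the two groups live in entirely separate tracts.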

Interactive exercise

Try this exercise by completing the sample code.

# Sum Black and White residents grouped by state
sums_by_state = tracts.groupby("state")[[w, b]].sum()
print(sums_by_state.head())

# Merge the sum with the original tract populations
tracts = pd.merge(tracts, sums_by_state, left_on="state",
                  right_index=True, suffixes=("", "_sum"))
print(tracts.head())

# Calculate inner expression of Index of Dissimilarity formula
tracts["D"] = abs(tracts[____] / tracts[____ + "_sum"] - ____ / ____)

# Calculate the Index of Dissimilarity
print(0.5 * tracts.____(____)["D"].sum())
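
For reference, one plausible way to fill in the blanks is sketched below on made-up data. This is not the course's official solution: the toy tracts values and the name dissimilarity are assumptions for illustration, and the real exercise uses the preloaded tracts DataFrame.

```python
import pandas as pd

# Hypothetical toy data standing in for the preloaded tracts DataFrame
tracts = pd.DataFrame({
    "state": ["01", "01", "02", "02"],
    "white": [100, 300, 50, 150],
    "black": [200, 200, 25, 75],
})
w, b = "white", "black"

# State totals, joined back onto each tract with a "_sum" suffix
sums_by_state = tracts.groupby("state")[[w, b]].sum()
tracts = pd.merge(tracts, sums_by_state, left_on="state",
                  right_index=True, suffixes=("", "_sum"))

# Inner expression: |a_i / A - b_i / B| for each tract
tracts["D"] = abs(tracts[w] / tracts[w + "_sum"] - tracts[b] / tracts[b + "_sum"])

# Index of Dissimilarity: half the per-state sum of D
dissimilarity = 0.5 * tracts.groupby("state")["D"].sum()
print(dissimilarity)
```

In this toy data, state "01" has tracts with unequal shares of the two groups (index 0.25), while in state "02" every tract holds the same share of each group, so its index is 0.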