Calculating D Using Grouping in Pandas
Performing a calculation over subsets of a DataFrame is so common that pandas gives us an alternative to doing it in a loop: the groupby method. In the sample code, groupby is first used to group tracts by state, i.e., to collect the rows that share the same value in the "state" column. The sum() method is then applied to the columns within each group.
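To make the loop-versus-groupby comparison concrete, here is a minimal sketch on hypothetical toy data (the state codes and population counts are invented, not taken from the course). Both approaches produce the same per-state totals; groupby simply does the filtering and summing in a single call.

```python
import pandas as pd

# Hypothetical toy data standing in for the preloaded tracts DataFrame
tracts = pd.DataFrame({
    "state": ["01", "01", "02", "02", "02"],
    "white": [800, 200, 500, 300, 200],
    "black": [200, 800, 100, 100, 300],
})
w, b = "white", "black"

# Loop version: filter each state's rows and sum them by hand
loop_sums = {}
for state in tracts["state"].unique():
    rows = tracts[tracts["state"] == state]
    loop_sums[state] = (rows[w].sum(), rows[b].sum())

# groupby version: one row per state, each column summed within its group
sums_by_state = tracts.groupby("state")[[w, b]].sum()
print(sums_by_state)
```

The result of groupby is indexed by the "state" values, which is what lets it be matched back to individual tracts later.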
This exercise also makes use of merge, another useful pandas method, to join the grouped sums to the individual tracts. Don't worry about the syntax for now; the merge method will be explained in a later lesson.
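The merge call used in the sample code can be tried on its own with a small made-up DataFrame. This is a sketch on hypothetical toy data: left_on="state" matches each tract's "state" column against the index of sums_by_state (right_index=True), and suffixes=("", "_sum") leaves the original column names unchanged while appending "_sum" to the overlapping columns that come from sums_by_state.

```python
import pandas as pd

# Hypothetical toy data standing in for the preloaded tracts DataFrame
tracts = pd.DataFrame({
    "state": ["01", "01", "02"],
    "white": [800, 200, 500],
    "black": [200, 800, 100],
})
w, b = "white", "black"
sums_by_state = tracts.groupby("state")[[w, b]].sum()

# Match each tract's "state" value to the state index of sums_by_state.
# Clashing column names get no suffix on the left and "_sum" on the right.
merged = pd.merge(tracts, sums_by_state, left_on="state",
                  right_index=True, suffixes=("", "_sum"))
print(merged)
```

Every tract row now carries its own state's totals in white_sum and black_sum, so tract-level shares can be computed column by column.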
pandas has been imported using the usual alias, pd, and the tracts DataFrame with population columns white and black has been loaded. The variables w and b hold the column names "white" and "black", respectively.
This exercise is part of the course Analyzing US Census Data in Python.
Exercise instructions
- Create sums_by_state using groupby and print the result.
- Create tracts using merge and print the result.
- Calculate \(\left\lvert\frac{a_i}{A} - \frac{b_i}{B}\right\rvert\) and store it in a new column D. (Reminder: the sums of the White and Black populations, \(A\) and \(B\), were already calculated and are available in the tracts DataFrame in the columns suffixed with "_sum".)
- Sum the column D by state using the groupby method, and multiply by 0.5.
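For reference, the steps above assemble the Index of Dissimilarity for a state with \(N\) tracts, using the same symbols as the instructions: \(a_i\) and \(b_i\) are the White and Black populations of tract \(i\), and \(A\) and \(B\) are the state totals. The two-tract figures in the second line are made up purely to illustrate the arithmetic (tract 1: 80 White, 20 Black; tract 2: 20 White, 80 Black).

\[
D = \frac{1}{2}\sum_{i=1}^{N}\left\lvert\frac{a_i}{A} - \frac{b_i}{B}\right\rvert,
\qquad A = \sum_{i=1}^{N} a_i,\quad B = \sum_{i=1}^{N} b_i
\]

\[
D = \frac{1}{2}\left(\left\lvert\frac{80}{100} - \frac{20}{100}\right\rvert + \left\lvert\frac{20}{100} - \frac{80}{100}\right\rvert\right) = \frac{1}{2}(0.6 + 0.6) = 0.6
\]

A value of 0 means every tract has the same mix of the two groups as the state as a whole; a value of 1 means the groups never share a tract.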
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Sum Black and White residents grouped by state
sums_by_state = tracts.groupby("state")[[w, b]].sum()
print(sums_by_state.head())
# Merge the sum with the original tract populations
tracts = pd.merge(tracts, sums_by_state, left_on="state",
                  right_index=True, suffixes=("", "_sum"))
print(tracts.head())
# Calculate inner expression of Index of Dissimilarity formula
tracts["D"] = abs(tracts[____] / tracts[____ + "_sum"] - ____ / ____)
# Calculate the Index of Dissimilarity
print(0.5 * tracts.____(____)["D"].sum())
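One way to fill in the blanks is sketched below, run end to end on hypothetical toy data so it is self-contained; in the exercise itself, tracts, w, and b are preloaded. It follows the instructions literally: the inner term uses each tract's population divided by the matching "_sum" column, and the final step groups D by state, sums it, and multiplies by 0.5.

```python
import pandas as pd

# Hypothetical toy data; the course preloads a real tracts DataFrame instead
tracts = pd.DataFrame({
    "state": ["01", "01", "02", "02"],
    "white": [80, 20, 50, 50],
    "black": [20, 80, 50, 50],
})
w, b = "white", "black"

# Sum Black and White residents grouped by state
sums_by_state = tracts.groupby("state")[[w, b]].sum()

# Attach the state totals to every tract as white_sum and black_sum
tracts = pd.merge(tracts, sums_by_state, left_on="state",
                  right_index=True, suffixes=("", "_sum"))

# Inner expression |a_i / A - b_i / B| for each tract
tracts["D"] = abs(tracts[w] / tracts[w + "_sum"] - tracts[b] / tracts[b + "_sum"])

# Index of Dissimilarity per state: half the sum of the tract-level terms
dissimilarity = 0.5 * tracts.groupby("state")["D"].sum()
print(dissimilarity)
```

With this toy data, state "01", where the two groups live mostly in different tracts, comes out at about 0.6, while state "02", where both tracts have the same composition, comes out at 0.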