Another iterator method: .itertuples()

1. Another iterator method: .itertuples()

In the previous lesson, we covered how to iterate over a pandas DataFrame row by row using the dot-iterrows method. pandas also comes with a similar iteration method called dot-itertuples that is often more efficient than dot-iterrows. Let's continue using our baseball dataset to compare these two methods.

2. Team wins data

Suppose we have a pandas DataFrame called team_wins_df that contains each team's total wins in a season.

3. Iterating with .iterrows()

If we use dot-iterrows to loop over our team_wins_df DataFrame and print each row's tuple, we see that each row's values are stored as a pandas Series. Remember, dot-iterrows returns each DataFrame row as a tuple containing the row's index and a pandas Series of the row's values, so we have to access the row's values with square bracket indexing.
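As a rough sketch of what this looks like in code, we can build a small stand-in for team_wins_df and loop over it with dot-iterrows. The column names Team, Year and W, and all of the values, are assumptions made up for illustration and are not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for team_wins_df; column names and values are made up
team_wins_df = pd.DataFrame({
    "Team": ["ARI", "ATL", "BAL"],
    "Year": [2012, 2012, 2012],
    "W": [81, 94, 93],
})

iterrows_teams = []
for row_tuple in team_wins_df.iterrows():
    # Each item is a tuple: (index label, pandas Series of the row's values)
    index, row = row_tuple
    print(type(row_tuple), type(row))
    # Values inside the Series are reached with square bracket indexing
    iterrows_teams.append(row["Team"])
```

Unpacking the tuple directly in the for statement (for index, row in ...) is a common shorthand for the same thing.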

4. Iterating with .itertuples()

But we could use dot-itertuples to loop over our DataFrame rows instead. The dot-itertuples method returns each DataFrame row as a special data type called a namedtuple. A namedtuple is one of the specialized data types that exist within the collections module we've discussed previously. These data types behave just like a Python tuple but have fields accessible using attribute lookup. What does this mean? Notice in the output that each printed row_namedtuple has an Index attribute, plus one attribute for each column in our team_wins_df. That means we can access each of these attributes with a lookup using dot notation. Here, we can print the last row_namedtuple's Index using row_namedtuple-dot-Index. We can print this row_namedtuple's Team with row_namedtuple-dot-Team, Year with row_namedtuple-dot-Year and so on.
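Here is a minimal sketch of the same loop written with dot-itertuples. Again, the DataFrame is a hypothetical stand-in for team_wins_df, with assumed column names (Team, Year, W) and made-up values.

```python
import pandas as pd

# Hypothetical stand-in for team_wins_df; column names and values are made up
team_wins_df = pd.DataFrame({
    "Team": ["ARI", "ATL", "BAL"],
    "Year": [2012, 2012, 2012],
    "W": [81, 94, 93],
})

for row_namedtuple in team_wins_df.itertuples():
    # Each row is a namedtuple with an Index field plus one field per column
    print(row_namedtuple)

# After the loop, row_namedtuple still refers to the last row
last_index = row_namedtuple.Index
last_team = row_namedtuple.Team
last_year = row_namedtuple.Year
print(last_index, last_team, last_year)
```

Because the loop variable keeps its final value once the loop ends, the attribute lookups at the bottom refer to the last row of the DataFrame.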

5. Comparing methods

When we compare the runtime of dot-iterrows to dot-itertuples, we see that dot-itertuples is quite a bit faster! The difference comes from the way each method stores its output. Since dot-iterrows builds a new pandas Series for every row, it carries more overhead than dot-itertuples, which only has to build a lightweight namedtuple.
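One way we might check this ourselves is with the timeit module from the standard library. This is only a sketch: the DataFrame is a made-up stand-in for team_wins_df, the two helper functions are hypothetical names introduced for the comparison, and the actual timings will depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# Hypothetical stand-in for team_wins_df, repeated to get 1,500 rows
team_wins_df = pd.DataFrame({
    "Team": ["ARI", "ATL", "BAL"] * 500,
    "Year": [2012] * 1500,
    "W": [81, 94, 93] * 500,
})


def total_wins_iterrows(df):
    """Sum the W column by looping with .iterrows() (illustrative helper)."""
    total = 0
    for _, row in df.iterrows():
        total += row["W"]
    return total


def total_wins_itertuples(df):
    """Sum the W column by looping with .itertuples() (illustrative helper)."""
    total = 0
    for row in df.itertuples():
        total += row.W
    return total


# Time three full passes over the DataFrame with each method
iterrows_seconds = timeit.timeit(lambda: total_wins_iterrows(team_wins_df), number=3)
itertuples_seconds = timeit.timeit(lambda: total_wins_itertuples(team_wins_df), number=3)

print(f".iterrows():   {iterrows_seconds:.4f} seconds")
print(f".itertuples(): {itertuples_seconds:.4f} seconds")
```

Both helpers compute the same total, so any gap between the two timings comes from the iteration method rather than the work done inside the loop.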

6. Attribute lookup caveat

One more quick note about the differences between these methods. When using dot-iterrows, we can use square brackets to reference a column within our team_wins_df DataFrame. Here, we are printing the Team column for each row in our DataFrame. If we use the same syntax with dot-itertuples, we get a TypeError. This is because a namedtuple, like any Python tuple, only accepts integer positions or slices inside square brackets, not column names the way a pandas Series does. When looking up an attribute by name within a namedtuple, we must use a dot to reference the attribute. So anytime we use dot-itertuples, we have to use a dot when referring to a column within our DataFrame. If we replace our square bracket notation with a dot, we see that the Teams are correctly printed out.
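The caveat can be reproduced with a short sketch. The DataFrame is once more a hypothetical stand-in for team_wins_df with made-up values, and the try/except is only there so the example keeps running after the error.

```python
import pandas as pd

# Hypothetical stand-in for team_wins_df; column names and values are made up
team_wins_df = pd.DataFrame({
    "Team": ["ARI", "ATL", "BAL"],
    "Year": [2012, 2012, 2012],
    "W": [81, 94, 93],
})

first_row = next(team_wins_df.itertuples())

# Square brackets with a column name fail on a namedtuple
try:
    first_row["Team"]
    bracket_lookup_failed = False
except TypeError as err:
    bracket_lookup_failed = True
    print("TypeError:", err)

# Dot notation works for every row
teams = [row.Team for row in team_wins_df.itertuples()]
print(teams)

# Square brackets do still work with an integer position (Index is position 0)
first_team_by_position = first_row[1]
```

Looking values up by position is possible, but dot notation keeps the column name visible in the code, which is usually easier to read.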

7. Let's keep iterating!

Now, let's put our new skill to the test and practice efficiently looping over rows of a DataFrame using dot-itertuples.