
Looping using the .iterrows() function

1. Looping using the .iterrows() function

Welcome to the third chapter! You now know how to optimize your row and column selections and your value replacements. It's time to learn how to iterate through a pandas DataFrame efficiently. The method we will introduce in this lesson is the .iterrows() function.

2. The poker dataset

We will use the poker dataset, which you already encountered in the first chapter. Let's refresh your memory. Each row represents a player's hand in a poker game, which consists of five cards. The columns hold the Symbol and the Rank of each of the five cards, denoted by S and R followed by the number of the respective card. For example, in the first hand, we have a 10 of diamonds, a Jack and a King of clubs, a 4 of spades and an Ace of hearts.

3. Generators in Python

Before we talk about how to use the .iterrows() function, let's refresh the notion of a generator function. Generators are a simple tool to create iterators. Inside the body of a generator, values are produced with yield statements rather than return statements. There can be just one yield statement, or several. Here, we can see a generator function, city_name_generator(), that produces four city names. We call it and assign the resulting generator to the variable city_names for simplicity.
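The slide's code is not part of this transcript, so here is a minimal sketch of what such a generator could look like. The names city_name_generator and city_names and the cities 'New York', 'London' and 'Sao Paolo' come from the lesson; the third city, 'Tokyo', is an assumption added only to reach four values.

```python
def city_name_generator():
    # Each yield hands one value back to the caller and pauses here
    yield 'New York'
    yield 'London'
    yield 'Tokyo'      # assumed placeholder; the transcript does not name the third city
    yield 'Sao Paolo'  # spelled as in the lesson


# Calling the generator function does not run its body yet;
# it returns a generator object that produces values on demand
city_names = city_name_generator()
```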

4. Generators in Python

To access the elements that a generator yields, we use Python's built-in next() function. Each time next() is called, the generator produces its next value, until there are no more values to yield. In our example, we can call next() once to print 'New York' to our screen, then a second time to print 'London', and so on until we reach 'Sao Paolo'. Then, if we call next() once more, Python raises a StopIteration exception, as expected, because the generator is exhausted.
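As a rough illustration of this behavior, the sketch below steps through a generator with next() and catches the exception raised at the end. The generator is redefined so the snippet runs on its own; 'Tokyo' is an assumed third city that the transcript does not name.

```python
def city_name_generator():
    yield 'New York'
    yield 'London'
    yield 'Tokyo'      # assumed placeholder for the unnamed third city
    yield 'Sao Paolo'


city_names = city_name_generator()

first = next(city_names)      # first yielded value
second = next(city_names)     # second yielded value
remaining = list(city_names)  # consumes whatever is left

# The generator is now exhausted, so one more next() raises StopIteration
try:
    next(city_names)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, remaining, exhausted)
```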

5. Looping using the .iterrows() function

The .iterrows() function is a method of every pandas DataFrame. When called, it returns a generator that yields one pair of elements per row. We will use this generator to iterate through each row of our poker DataFrame. The first element of each pair is the index of the row, while the second is a pandas Series holding every feature of the row: the Symbol and the Rank for each of the five cards. It is very similar to the enumerate() function, which, when applied to a list, returns each element along with its index.
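A small sketch can make the shape of these pairs concrete. The DataFrame below is a hypothetical two-row stand-in for the poker dataset, with made-up values and only two of the five cards, since the real data is not included in this transcript.

```python
import pandas as pd

# Hypothetical stand-in for the poker dataset (values are invented)
poker = pd.DataFrame({
    'S1': [2, 3], 'R1': [10, 12],
    'S2': [1, 4], 'R2': [11, 5],
})

# .iterrows() returns a generator; next() pulls the first (index, Series) pair
gen = poker.iterrows()
index, row = next(gen)

print(index)      # label of the first row
print(type(row))  # a pandas Series
print(row['R1'])  # value of column R1 in that row

# enumerate() pairs each list element with its position in a similar way
pairs = list(enumerate(['a', 'b']))
```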

6. Using the .iterrows() function

The most intuitive way to iterate through a pandas DataFrame is to loop over row positions with the range() function, an approach often called crude looping. A neater way is to use the .iterrows() function, which is designed for this task. We simply define the 'for' loop with two loop variables, one for the index of each row and the other for all of its values. Inside the loop, the next() command only indicates that the loop moves on to the next value of the iterator, without doing any actual work. Notice that using .iterrows() does not improve the speed of iterating through a pandas DataFrame. It is very useful, though, when we need a cleaner way to use the values of each row while iterating through the dataset.
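The two looping styles can be compared with a sketch like the one below. It assumes a small synthetic DataFrame in place of the poker data and sums one column so that both loops do comparable work, which differs from the empty loop body described in the lesson. Timings depend on the machine and the pandas version, so no figures are quoted here.

```python
import time

import pandas as pd

# Synthetic stand-in for the poker dataset (values are invented)
poker = pd.DataFrame({
    'S1': [1, 2, 3, 4] * 250,
    'R1': [10, 11, 12, 13] * 250,
})

# Crude looping: walk over row positions with range()
start = time.perf_counter()
total_range = 0
for i in range(poker.shape[0]):
    total_range += poker.iloc[i]['R1']
range_seconds = time.perf_counter() - start

# .iterrows(): unpack each (index, Series) pair directly in the loop header
start = time.perf_counter()
total_iterrows = 0
for index, values in poker.iterrows():
    total_iterrows += values['R1']
iterrows_seconds = time.perf_counter() - start

print(f"range():     {range_seconds:.4f} s")
print(f".iterrows(): {iterrows_seconds:.4f} s")
```

The second loop needs no positional lookup to reach the row's values, which is the readability gain the lesson refers to.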

7. Let's do it!

We just learned a way to iterate through a pandas DataFrame that has the advantage of producing clean code and the disadvantage of not being very time efficient. Why not try it out? Let's do it!
