Vectorization over pandas Series

1. Vectorization over pandas Series

Welcome back! In this part of the course we will discuss another efficient way to apply a function to a pandas DataFrame without looping over its rows one by one: vectorization over pandas Series.

2. DataFrames as arrays

To understand how we can reduce the amount of iteration performed by the function, recall that the fundamental units of pandas, DataFrames and Series, are both based on arrays. pandas is more efficient when an operation is applied to a whole array at once than when it is applied to each value separately or sequentially. Vectorization is the process of executing operations on entire arrays.
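To make the idea concrete, here is a minimal sketch that contrasts a value-by-value loop with a single whole-array operation. The Series `ranks` and its values are made up for illustration and are not taken from the course dataset.

```python
import pandas as pd

# Hypothetical Series of card ranks (illustrative values only)
ranks = pd.Series([10, 11, 13, 12, 1])

# Sequential: a Python-level loop handles each value separately
doubled_loop = pd.Series([rank * 2 for rank in ranks])

# Vectorized: one operation applied to the entire underlying array
doubled_vec = ranks * 2

# Both approaches produce the same values
print(doubled_vec.equals(doubled_loop))
```

The two results are identical; the difference is that the vectorized version hands the whole array to pandas in one call instead of visiting the values one at a time in Python.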

3. How to perform pandas vectorization

We will show how to perform pandas vectorization using the poker dataset. Again, we want to calculate the sum of the ranks of all the cards in each hand. To do that, we slice the poker dataset, keeping only the columns that contain the rank of each card, as we showed in previous lessons. Then we call the DataFrame's built-in .sum() method with the parameter axis=1 to indicate that we want the sum for each row. Finally, we print the sums for the first five rows of the data.
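The steps just described could look roughly like the sketch below. The small DataFrame is a stand-in for the poker dataset, and the column names (S1 to S5 for suits, R1 to R5 for ranks) and all values are assumptions made for illustration.

```python
import pandas as pd

# Tiny stand-in for the poker dataset; column names and values are assumed
poker = pd.DataFrame({
    "S1": [1, 2, 3], "R1": [10, 2, 1],
    "S2": [1, 2, 3], "R2": [11, 4, 1],
    "S3": [1, 2, 3], "R3": [13, 6, 1],
    "S4": [1, 2, 3], "R4": [12, 8, 1],
    "S5": [1, 2, 3], "R5": [1, 10, 2],
})

# Keep only the columns that hold the rank of each card
rank_cols = ["R1", "R2", "R3", "R4", "R5"]
ranks = poker[rank_cols]

# axis=1 sums across the columns, giving one total per row (per hand)
rank_sums = ranks.sum(axis=1)

# Show the totals for the first five hands (only three exist in this toy data)
print(rank_sums.head())
```

Note that no explicit loop appears anywhere: the row-wise sum is a single call on the sliced DataFrame.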

4. Comparison to the previous methods

In the previous lessons, we saw various methods that apply a function to a DataFrame faster than simply iterating through all the rows of the DataFrame. Our goal is to find the most efficient method to perform this task. Comparing the time it takes to sum the ranks of all the cards in each hand using vectorization, the .iterrows() method and the .apply() method, we can see that vectorization is by far the fastest of the three.
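One way to run such a comparison yourself is sketched below. It uses randomly generated ranks rather than the real poker data, and the column names, the row count, and the helper `timed` are all hypothetical choices made for this example. Exact timings depend on the machine, so none are shown here.

```python
import time

import numpy as np
import pandas as pd

# Synthetic ranks between 1 and 13; column names and size are assumptions
rng = np.random.default_rng(0)
rank_cols = ["R1", "R2", "R3", "R4", "R5"]
poker = pd.DataFrame(rng.integers(1, 14, size=(2000, 5)), columns=rank_cols)


def timed(fn):
    """Hypothetical helper: run fn once, return its result and elapsed seconds."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def with_iterrows():
    # Explicit loop: each row is materialized as a Series, then summed
    return [row.sum() for _, row in poker.iterrows()]


def with_apply():
    # .apply() calls the function once per row
    return poker.apply(lambda row: row.sum(), axis=1)


def with_vectorization():
    # A single call that operates on the whole array
    return poker.sum(axis=1)


for name, fn in [("iterrows", with_iterrows),
                 ("apply", with_apply),
                 ("vectorization", with_vectorization)]:
    _, seconds = timed(fn)
    print(f"{name}: {seconds:.4f} s")
```

A single run like this gives only a rough impression; repeating each measurement several times gives a steadier estimate.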

5. Let's do it!

But don't take our word for it. Test vectorization's performance yourself!