How to use the .apply() method on a DataFrame?

1. How to use the .apply() method on a DataFrame?

Good work on NumPy arrays! Let's move to DataFrames! We'll cover one of the most frequently used methods, .apply().

2. Dataset

First, let's pick a dataset. We'll work with data on 100 students and their performance in different subjects. Each score ranges from 0 to 100.

3. Default .apply()

Let's use the .apply() method. It requires one argument: a function that, by default, is applied to each column of a DataFrame. However, the type of the output depends on the function. For example, applying the sqrt() function results in a DataFrame with the square roots of the original values.

4. Default .apply()

However, using the mean() function returns a Series. Why?

5. Default .apply()

The columns we apply the function to are passed as pandas Series. When we use sqrt(), we simply modify each value in a column and return an object of the same size. When we use mean(), we summarize the Series with a single value.
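The difference between the two return types can be sketched as follows. The small `scores` DataFrame and its column names are assumptions standing in for the student dataset, and `np.sqrt` and `np.mean` are used as the sqrt() and mean() functions mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the student scores (column names are assumptions).
scores = pd.DataFrame({
    "math": [64.0, 81.0, 49.0],
    "reading": [100.0, 36.0, 25.0],
})

# Element-wise function: every column keeps its length,
# so the combined result is a DataFrame of the original shape.
roots = scores.apply(np.sqrt)

# Reducing function: every column collapses to one number,
# so the combined result is a Series indexed by column name.
means = scores.apply(np.mean)

print(type(roots).__name__, roots.shape)
print(type(means).__name__, means.shape)
```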

6. Default .apply(): own functions

For example, let's define a function halving our scores. We get a modified DataFrame because passing columns to our defined function results in an object of the same size.

7. Default .apply(): own functions

In contrast, if we return only one value, for example a perfect score, we summarize each column with a single value. Therefore, we get a pandas Series.
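A minimal sketch of both cases, assuming hypothetical function names `halve` and `perfect_score` and a small made-up `scores` DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the student scores.
scores = pd.DataFrame({
    "math": [60.0, 80.0, 40.0],
    "reading": [100.0, 30.0, 20.0],
})

def halve(column):
    # Returns a Series of the same length as the input column.
    return column / 2

def perfect_score(column):
    # Ignores the values and returns a single number per column.
    return 100

halved = scores.apply(halve)           # DataFrame, same shape as scores
perfect = scores.apply(perfect_score)  # Series, one entry per column

print(halved)
print(perfect)
```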

8. Lambda expressions

Of course, our functions can be substituted with lambda expressions!

9. Lambda expressions

This simplifies our code without changing the output.
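As an illustration, the sketch below compares a named function with its lambda equivalent. The `scores` data and the name `halve` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in for the student scores.
scores = pd.DataFrame({
    "math": [60.0, 80.0, 40.0],
    "reading": [100.0, 30.0, 20.0],
})

def halve(column):
    return column / 2

# Named function and lambda expression side by side.
halved_named = scores.apply(halve)
halved_lambda = scores.apply(lambda column: column / 2)

# A lambda returning one value per column still yields a Series.
perfect_lambda = scores.apply(lambda column: 100)

print(halved_named.equals(halved_lambda))
```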

10. Additional arguments: axis

Let's have a look at additional arguments we can pass to the .apply() method. We'll start with the axis argument.

11. Additional arguments: axis

It can be either 0, which is the default,

12. Additional arguments: axis

or 1.

13. Additional arguments: axis

0 means that the function is applied to each column of the DataFrame; 1 means it is applied to each row. Specifying this argument is useful for functions that return a single value, like mean().

14. Additional arguments: axis

Zero implies no difference from the default behavior: we get the mean of each column.

15. Additional arguments: axis

1 implies averaging values in each row instead.
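The two settings can be compared on a small example. The `scores` DataFrame is a made-up stand-in for the student data, and `np.mean` plays the role of mean().

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the student scores.
scores = pd.DataFrame({
    "math": [60.0, 80.0, 40.0],
    "reading": [100.0, 30.0, 20.0],
})

# axis=0 (the default): the function receives each column,
# so we get one mean per subject.
column_means = scores.apply(np.mean, axis=0)

# axis=1: the function receives each row,
# so we get one mean per student.
row_means = scores.apply(np.mean, axis=1)

print(column_means)
print(row_means)
```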

16. Additional arguments: result_type

The next argument we'll discuss is result_type. We'll consider only some of the values it can take. The first one is 'expand'. To understand it, let's define a function, span(), that returns a list with the minimum and the maximum value of its input. When we apply the function to the DataFrame, we get a pandas Series with the corresponding summary for each column. Notice that the list returned by span() is treated as a single value summarizing our input, even though its size is 2. Therefore, the .apply() method results in a pandas Series.

17. Additional arguments: result_type

Setting result_type to 'expand' unwraps our list, resulting in the following DataFrame.

18. Additional arguments: result_type

Adding the axis argument and setting it to 1 applies the span() function row-wise and unfolds the list for each row.
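The row-wise case can be sketched as below. The `scores` data is made up, and the body of `span()` is an assumption based on its description above. The pandas documentation states that result_type only acts when axis=1, so the sketch is limited to that case.

```python
import pandas as pd

# Hypothetical stand-in for the student scores.
scores = pd.DataFrame({
    "math": [60.0, 80.0, 40.0],
    "reading": [100.0, 30.0, 20.0],
})

def span(values):
    # Returns a two-element list: [minimum, maximum].
    return [values.min(), values.max()]

# Without result_type, each row's list is kept as one value,
# so the result is a Series of lists.
as_lists = scores.apply(span, axis=1)

# With result_type="expand", each list is unfolded into columns,
# so the result is a DataFrame with columns 0 (min) and 1 (max).
expanded = scores.apply(span, axis=1, result_type="expand")

print(as_lists)
print(expanded)
```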

19. Additional arguments: result_type

The second useful value for result_type is broadcast. To understand it, let's consider applying the mean() function again.

20. Additional arguments: result_type

Specifying broadcasting results in a DataFrame of the original size where each column is filled with the corresponding output from the mean() function.
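A small sketch of broadcasting, again with a made-up `scores` DataFrame. Float scores are used here so that the means keep their fractional part whatever the dtype of the result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the student scores (floats on purpose).
scores = pd.DataFrame({
    "math": [60.0, 80.0, 40.0],
    "reading": [100.0, 30.0, 20.0],
})

# Plain reduction: one mean per column, returned as a Series.
means = scores.apply(np.mean)

# Broadcasting: the same means, repeated to fill a DataFrame
# with the original index and columns.
broadcasted = scores.apply(np.mean, result_type="broadcast")

print(means)
print(broadcasted)
```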

21. More than one argument in a function

So far, the functions we used with .apply() took only one argument.

22. More than one argument in a function

But what if we have more arguments, including keyword arguments? For example, let's have a function that by default checks whether the calculated mean is within a certain interval. If the keyword argument is set to False, we check the opposite scenario.

23. Applying the function

Let's use .apply() with our function. We get a TypeError because we didn't specify its arguments!

24. Additional arguments: args

They can be specified in the args argument of the .apply() method. It's a tuple containing the positional arguments for our function. Let's try it now. It works! Notice that the values in args must be in the same order as the function's parameters. We didn't specify the 'inside' keyword argument, so the function executes with its default value. What if we want to pass another value?

25. Additional arguments: args

We can simply pass it as a keyword argument after args. As expected, setting it to False produces the inverted result.
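The whole sequence, from the TypeError to the inverted result, might look like the sketch below. The function name `check_mean`, its parameters `low` and `high`, the interval bounds, and the `scores` data are all assumptions; only the 'inside' keyword comes from the lesson.

```python
import pandas as pd

# Hypothetical stand-in for the student scores.
scores = pd.DataFrame({
    "math": [60.0, 80.0, 40.0],
    "reading": [100.0, 30.0, 20.0],
})

def check_mean(column, low, high, inside=True):
    # True if the column mean lies in [low, high];
    # with inside=False the check is inverted.
    within = low <= column.mean() <= high
    return within if inside else not within

# Without args, the function is missing low and high.
try:
    scores.apply(check_mean)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Positional arguments go into args, in the function's parameter order.
default_check = scores.apply(check_mean, args=(55, 70))

# Extra keyword arguments are forwarded to the function.
inverted_check = scores.apply(check_mean, args=(55, 70), inside=False)

print(default_check)
print(inverted_check)
```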

26. Let's practice!

We covered quite a lot on pandas' .apply() method. Let's practice now!