1. How to use the .apply() method on a DataFrame?
Good work on NumPy arrays! Now let's move on to DataFrames. We'll cover one of their most frequently used methods, .apply().
2. Dataset
First, let's pick a dataset. We'll work with data on 100 students and their performance in different subjects. Each score ranges from 0 to 100.
3. Default .apply()
Let's use the .apply() method.
It requires one argument: a function that, by default, is applied to each column of the DataFrame. However, the form of the output of .apply() depends on what that function returns.
For example, applying the sqrt() function results in a DataFrame with the square roots of the original values.
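Here is a minimal sketch of this step. It assumes a small, made-up DataFrame named scores (the column names and values are hypothetical; the lesson's dataset has 100 students) and uses NumPy's np.sqrt() as the square-root function.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

# np.sqrt returns one value per input value, so each column keeps its length
roots = scores.apply(np.sqrt)
print(roots)
```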
4. Default .apply()
However, using the mean() function returns a Series. Why?
5. Default .apply()
Each column we apply the function to is passed to it as a pandas Series.
When we use sqrt(), we simply modify each value in a column and return an object of the same size.
When we use mean(), we summarize the Series with a single value.
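To see both points at once, this sketch uses a hypothetical helper, column_mean(), that records the type of whatever .apply() passes in and then averages it. The scores DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

received_types = set()

def column_mean(column):
    # Hypothetical helper: remember what kind of object arrives, then summarize it
    received_types.add(type(column))
    return np.mean(column)  # a single number per column

means = scores.apply(column_mean)
print(means)           # one mean per column, as a Series
print(received_types)  # each column arrived as a pandas Series
```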
6. Default .apply(): own functions
For example, let's define our own function that halves the scores.
We get a modified DataFrame because, for each column it receives, our function returns an object of the same size.
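A minimal sketch of such a function, here called halve() (a hypothetical name), applied to a small made-up scores DataFrame:

```python
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

def halve(column):
    # Dividing a Series by 2 gives a Series of the same length
    return column / 2

halved = scores.apply(halve)
print(halved)
```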
7. Default .apply(): own functions
In contrast, if our function returns only one value, for example a perfect score of 100, each column is summarized by that single value.
Therefore, we get a pandas Series.
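As a sketch, assuming a hypothetical perfect_score() function that ignores its input and always returns 100, and the same kind of made-up scores DataFrame:

```python
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

def perfect_score(column):
    # Ignores the column's values and returns one number
    return 100

summary = scores.apply(perfect_score)
print(summary)  # a Series indexed by column name
```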
8. Lambda expressions
Of course, we can replace our functions with lambda expressions!
9. Lambda expressions
This simplifies our code without changing the output.
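A sketch of the two previous functions rewritten as lambdas, again on a small made-up scores DataFrame:

```python
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

# Inline versions of "halve the scores" and "return a perfect score"
halved = scores.apply(lambda column: column / 2)
summary = scores.apply(lambda column: 100)
print(halved)
print(summary)
```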
10. Additional arguments: axis
Let's have a look at additional arguments we can pass to the .apply() method.
We'll start with the axis argument.
11. Additional arguments: axis
It can be either 0, which is the default,
12. Additional arguments: axis
or 1.
13. Additional arguments: axis
0 means that the function is applied to each column of the DataFrame,
1 means that it is applied to each row.
Specifying this argument is especially useful for functions that return a single value, such as mean().
14. Additional arguments: axis
Setting axis to 0 gives the same result as the default behavior: we get the mean of each column.
15. Additional arguments: axis
Setting axis to 1 averages the values in each row instead.
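A minimal sketch comparing both settings with np.mean(), assuming a made-up scores DataFrame where each row is one student:

```python
import numpy as np
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

column_means = scores.apply(np.mean, axis=0)  # same as the default: one mean per subject
row_means = scores.apply(np.mean, axis=1)     # one mean per student (row)
print(column_means)
print(row_means)
```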
16. Additional arguments: result_type
The next argument we'll discuss is result_type. We'll consider only some of the values it can take.
The first one is 'expand'.
To understand it, let's define a function, span(), that returns a list with the minimum and maximum values of its input.
When we apply span() to the DataFrame, we get a pandas Series with the corresponding summary for each column. Notice that the list returned by span() is treated as a single value summarizing its input, even though it contains two elements. Therefore, .apply() returns a pandas Series. (Note that this default depends on the pandas version: recent releases may already expand such lists into a DataFrame.)
17. Additional arguments: result_type
Setting result_type to 'expand' unwraps each list into separate rows, resulting in the following DataFrame.
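Here is a sketch of this step. The body of span() is our own guess at the function described above, and the scores DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

def span(values):
    # Two-element list: [minimum, maximum] of the input Series
    return [values.min(), values.max()]

expanded = scores.apply(span, result_type="expand")
print(expanded)  # row 0 holds the minimums, row 1 the maximums
```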
18. Additional arguments: result_type
Adding the axis argument and setting it to 1 applies span() row-wise and expands each row's list into separate columns.
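A sketch of the row-wise version, using our own version of span() and a made-up scores DataFrame. The two new columns get the default labels 0 and 1.

```python
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

def span(values):
    # Two-element list: [minimum, maximum] of the input Series
    return [values.min(), values.max()]

by_student = scores.apply(span, axis=1, result_type="expand")
print(by_student)  # columns 0 and 1: each student's minimum and maximum
```

If descriptive labels are preferred, the result can be renamed afterwards, for example with by_student.rename(columns={0: "min", 1: "max"}).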
19. Additional arguments: result_type
The second useful value for result_type is 'broadcast'.
To understand it, let's consider applying the mean() function again.
20. Additional arguments: result_type
Setting result_type to 'broadcast' results in a DataFrame of the original shape, where each column is filled with the corresponding output of mean().
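A minimal sketch, assuming a made-up scores DataFrame whose column means happen to be whole numbers:

```python
import numpy as np
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

# Same index and columns as scores; every cell holds its column's mean
filled = scores.apply(np.mean, result_type="broadcast")
print(filled)
```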
21. More than one argument in a function
So far, the functions we used with .apply() took only one argument.
22. More than one argument in a function
But what if a function takes more arguments, including keyword arguments?
For example, let's define a function that, by default, checks whether the mean of its input lies within a certain interval. If its keyword argument 'inside' is set to False, it checks the opposite: whether the mean lies outside the interval.
23. Applying the function
Let's use .apply() with our function.
We get a TypeError because we didn't pass the function's remaining arguments!
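A sketch of what happens. check_mean() and its low and high parameters are hypothetical names for the interval-checking function described above; only the 'inside' keyword comes from the lesson. The scores DataFrame is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

def check_mean(column, low, high, inside=True):
    # Hypothetical helper: is the column mean within [low, high]?
    mean = np.mean(column)
    within = low <= mean <= high
    return within if inside else not within

message = ""
try:
    scores.apply(check_mean)  # low and high are never supplied
except TypeError as error:
    message = str(error)
print(message)
```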
24. Additional arguments: args
They can be specified in the args argument of the .apply() method: a tuple of positional arguments passed to our function after the column itself.
Let's try it now. It works! Notice that the values in args must follow the same order as the function's parameters. We didn't specify the 'inside' keyword argument, so the function uses its default value. What if we want to pass another value?
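A sketch using args, with the hypothetical check_mean() helper and a made-up interval from 65 to 85:

```python
import numpy as np
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

def check_mean(column, low, high, inside=True):
    # Hypothetical helper: is the column mean within [low, high]?
    mean = np.mean(column)
    within = low <= mean <= high
    return within if inside else not within

# args fills low=65 and high=85, in the order of the parameters
in_range = scores.apply(check_mean, args=(65, 85))
print(in_range)  # math's mean (60) is outside; reading (70) and writing (80) are inside
```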
25. Additional arguments: args
We can simply pass it afterwards as a keyword argument, which .apply() forwards to our function. As expected, setting it to False produces an inverted result.
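A sketch with the hypothetical check_mean() helper and a made-up interval. Extra keyword arguments given to .apply() are forwarded to the function; passing the flag as a third value in args is shown for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical scores for four students; column names are assumptions
scores = pd.DataFrame({
    "math": [90, 70, 50, 30],
    "reading": [80, 60, 100, 40],
    "writing": [70, 80, 90, 80],
})

def check_mean(column, low, high, inside=True):
    # Hypothetical helper: is the column mean within [low, high]?
    mean = np.mean(column)
    within = low <= mean <= high
    return within if inside else not within

# The keyword is forwarded by .apply() to check_mean
outside = scores.apply(check_mean, args=(65, 85), inside=False)
# Supplying the flag positionally through args gives the same result
outside_positional = scores.apply(check_mean, args=(65, 85, False))
print(outside)
```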
26. Let's practice!
We covered quite a lot about pandas' .apply() method. Let's practice now!