1. Handling missing data
In this video, we'll learn how to handle missing data when we stack or unstack DataFrames.
2. Review
So far, we have learned to reshape DataFrames by stacking column levels, by unstacking a row index level, or by choosing which level or levels we want to stack or unstack.
3. Unstacking leads to missing values
We have also seen that these operations can introduce new missing values. When unstacking, this happens when subgroups do not share the same set of labels. Let's look at an example: the following DataFrame contains data about animals.
4. Unstacking leads to missing values
For example, the subgroup Mammalia Carnivora exists, but the subgroup Aves Carnivora is not present in the DataFrame.
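Here is a minimal sketch that makes this concrete. It uses a small hypothetical `animals` DataFrame: the index levels `class`, `order` and `name`, the `domestic` column and all of its values are assumptions made for illustration, not the lecture's data. Listing the `(class, order)` pairs that actually occur shows that Mammalia Carnivora exists while Aves Carnivora does not.

```python
import pandas as pd

# Hypothetical stand-in for the animals DataFrame (not the lecture's data).
index = pd.MultiIndex.from_tuples(
    [
        ("Mammalia", "Carnivora", "Cat"),
        ("Mammalia", "Carnivora", "Lion"),
        ("Mammalia", "Primates", "Gorilla"),
        ("Aves", "Galliformes", "Chicken"),
    ],
    names=["class", "order", "name"],
)
animals = pd.DataFrame({"domestic": ["Yes", "No", "No", "Yes"]}, index=index)

# Drop the innermost level to see which (class, order) subgroups exist.
subgroups = animals.index.droplevel("name").unique()
print(subgroups.tolist())
# [('Mammalia', 'Carnivora'), ('Mammalia', 'Primates'), ('Aves', 'Galliformes')]
```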
5. Unstacking leads to missing values
We now unstack the level "class" in the animals DataFrame.
6. Unstacking leads to missing values
In the reshaped data, we can see that the subgroup Aves Carnivora shows a missing value. This happens because, as we said, it was not present in the original DataFrame.
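The following minimal sketch shows the unstacking step. It rebuilds a hypothetical `animals` DataFrame so that the snippet runs on its own. The levels `class`, `order` and `name` and the column `domestic` are illustrative assumptions. After `unstack(level="class")`, the classes become column labels, and any row whose subgroup does not exist for a class gets NaN in that column.

```python
import pandas as pd

# Hypothetical animals data (illustrative labels, not the lecture's data).
index = pd.MultiIndex.from_tuples(
    [
        ("Mammalia", "Carnivora", "Cat"),
        ("Mammalia", "Carnivora", "Lion"),
        ("Mammalia", "Primates", "Gorilla"),
        ("Aves", "Galliformes", "Chicken"),
    ],
    names=["class", "order", "name"],
)
animals = pd.DataFrame({"domestic": ["Yes", "No", "No", "Yes"]}, index=index)

# Move the "class" level from the row index into the columns.
wide = animals.unstack(level="class")
print(wide)
# The ("domestic", "Aves") column is NaN for the Carnivora rows,
# because no (Aves, Carnivora) rows exist in the original data.
```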
7. Handling NaN with unstack
Luckily, the fill_value parameter of the unstack() method lets us replace those missing values with a value of our choice.
8. Handling NaN with unstack
In our case, we set fill_value to the word No.
9. Handling NaN with unstack
Additionally, we sort the index with the order level ascending and the name level descending. The output shows that the missing values were replaced by the word No.
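Here is a sketch that puts both steps together on hypothetical `animals` data. All labels and values are illustrative assumptions. `fill_value` is a parameter of `DataFrame.unstack()`, and `sort_index()` accepts a list of levels together with a matching list of `ascending` flags.

```python
import pandas as pd

# Hypothetical animals data (illustrative labels, not the lecture's data).
index = pd.MultiIndex.from_tuples(
    [
        ("Mammalia", "Carnivora", "Cat"),
        ("Mammalia", "Carnivora", "Lion"),
        ("Mammalia", "Primates", "Gorilla"),
        ("Aves", "Galliformes", "Chicken"),
    ],
    names=["class", "order", "name"],
)
animals = pd.DataFrame({"domestic": ["Yes", "No", "No", "Yes"]}, index=index)

# Unstack "class" and fill the cells for absent subgroups with "No".
filled = animals.unstack(level="class", fill_value="No")

# Sort rows by order (ascending), then by name (descending).
result = filled.sort_index(level=["order", "name"], ascending=[True, False])
print(result)
```

Because fill_value is applied during the reshape itself, no separate fillna() call is needed here.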
10. Stack and missing values
The stack() method behaves a little differently. Here, missing values appear when a combination of index and column labels has no values in the original DataFrame. Let's work with the following DataFrame, which contains data about flowers.
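Here is a minimal sketch of such a DataFrame. The data is hypothetical: the flower names, the `season` and `measure` column levels and all numbers are illustrative assumptions. Rose has no size value in any season, while lily is missing only its summer weight.

```python
import numpy as np
import pandas as pd

# Hypothetical flowers data: each column is a (season, measure) pair.
columns = pd.MultiIndex.from_product(
    [["spring", "summer"], ["size", "weight"]], names=["season", "measure"]
)
flowers = pd.DataFrame(
    [
        [np.nan, 10.0, np.nan, 12.0],  # rose: no size recorded in any season
        [3.0, 8.0, 4.0, np.nan],       # lily: only the summer weight is missing
    ],
    index=pd.Index(["rose", "lily"], name="flower"),
    columns=columns,
)
print(flowers)
```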
11. Stack and missing values
We apply the stack() method to the DataFrame. In the result, the row for the combination of rose and size is absent.
12. Stack and missing values
This happens because the dropna argument of stack() is set to True by default, which drops every row that contains only missing values.
13. Stack and missing values
If we want to keep that information, we need to set the dropna argument to False. The resulting DataFrame now contains the row with index labels rose and size, and all of its values are missing.
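Here is a version-tolerant sketch of keeping the all-NaN row. It uses the same hypothetical flowers data as before, so every label and number is illustrative. One assumption needs flagging: `dropna` belongs to the original `stack()` implementation. pandas 2.1 added `future_stack=True`, whose new implementation keeps such rows and does not accept `dropna`. For that reason, the hypothetical helper `stack_keeping_nan` checks which form the installed pandas supports. Calling `dropna(how="all")` on the result mimics what the old default `dropna=True` removes.

```python
import inspect

import numpy as np
import pandas as pd

# Hypothetical flowers data: each column is a (season, measure) pair.
columns = pd.MultiIndex.from_product(
    [["spring", "summer"], ["size", "weight"]], names=["season", "measure"]
)
flowers = pd.DataFrame(
    [[np.nan, 10.0, np.nan, 12.0], [3.0, 8.0, 4.0, np.nan]],
    index=pd.Index(["rose", "lily"], name="flower"),
    columns=columns,
)


def stack_keeping_nan(frame):
    """Hypothetical helper: stack the innermost column level, keeping all-NaN rows."""
    if "future_stack" in inspect.signature(frame.stack).parameters:
        # pandas >= 2.1: the new implementation does not drop all-NaN rows.
        return frame.stack(future_stack=True)
    # Older pandas: explicitly turn off the default dropna=True.
    return frame.stack(dropna=False)


kept = stack_keeping_nan(flowers)
print(kept.loc[("rose", "size")])  # both seasons are NaN

# Dropping rows that are entirely NaN mimics the old default stack(dropna=True).
legacy_like = kept.dropna(how="all")
```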
14. Handling NaN with stack
We can then fill the missing values with the fillna() method, passing the value that should replace them. Here that value is 0, so the resulting DataFrame has zeros instead of NaNs.
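Here is a minimal sketch of this last fillna() step. To keep it independent of the pandas version, it builds a hypothetical stacked result directly instead of calling stack(). That result includes an all-NaN `(rose, size)` row like the one `stack(dropna=False)` keeps. The labels and numbers are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stacked flowers data, including the all-NaN (rose, size) row.
index = pd.MultiIndex.from_tuples(
    [("rose", "size"), ("rose", "weight"), ("lily", "size"), ("lily", "weight")],
    names=["flower", "measure"],
)
stacked = pd.DataFrame(
    {"spring": [np.nan, 10.0, 3.0, 8.0], "summer": [np.nan, 12.0, 4.0, np.nan]},
    index=index,
)

# Replace every remaining NaN with 0.
filled = stacked.fillna(0)
print(filled)
```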
15. Let's practice!
Now you know how to handle missing values. Let's stack and unstack some DataFrames!