Replacing missing values
Another way of handling missing values is to replace them all with the same value. For numerical variables, one option is to replace missing values with 0, which is what you'll do here. However, when you replace missing values, you make assumptions about what a missing value means. In this case, you will assume that a missing number sold means that no sales for that avocado type were made that week.
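As a rough illustration of that assumption, here is a minimal sketch using `fillna(0)` on a small made-up DataFrame. The name `toy_sales` and its values are hypothetical and are not taken from the real `avocados_2016` dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT the real avocados_2016 dataset
toy_sales = pd.DataFrame({
    "small_sold": [120.0, np.nan, 95.0, np.nan],
    "large_sold": [300.0, 280.0, np.nan, 310.0],
})

# Count missing values per column before filling
missing_before = toy_sales.isna().sum()

# Replace every missing value with 0; fillna returns a new DataFrame
toy_filled = toy_sales.fillna(0)
missing_after = toy_filled.isna().sum()

# mean() skips NaN by default, so filling with 0 pulls the mean down
mean_before = toy_sales["small_sold"].mean()
mean_after = toy_filled["small_sold"].mean()

print(missing_before)
print(missing_after)
print(mean_before, mean_after)
```

In this toy example the mean of `small_sold` drops once the two missing weeks are counted as zero sales, which shows how much the choice of fill value can change summary statistics.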
In this exercise, you'll see how replacing missing values can affect the distribution of a variable using histograms. You can plot histograms for multiple variables at a time as follows:
dogs[["height_cm", "weight_kg"]].hist()
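For a self-contained version of the line above, the sketch below builds a tiny stand-in `dogs` DataFrame with invented values; the real `dogs` data from the course is not reproduced here. The `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the course's dogs DataFrame (values are invented)
dogs = pd.DataFrame({
    "height_cm": [56, 43, 46, 49, 59, 18, 77],
    "weight_kg": [24, 24, 24, 17, 29, 2, 74],
})

# Selecting with a list of column names returns a DataFrame;
# .hist() then draws one histogram per selected column
axes = dogs[["height_cm", "weight_kg"]].hist()

# Each subplot is titled with the name of the column it shows
titles = sorted(ax.get_title() for ax in axes.ravel())
print(titles)
```

In an interactive session you would normally drop the backend line and call `plt.show()` afterwards to display the figure.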
pandas has been imported as pd and matplotlib.pyplot has been imported as plt. The avocados_2016 dataset is available.
This exercise is part of the course Data Manipulation with pandas.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# List the columns with missing values
cols_with_missing = ["small_sold", "large_sold", "xl_sold"]
# Create histograms showing the distributions of cols_with_missing
avocados_2016[____].____
# Show the plot
____