Replacing rating with group median
In the last exercise, you replaced the missing values in the rating column with the column median. But could you do better? Yes! You can replace the missing values with the median rating of chocolates from the same company. Let's do it!
There is a predefined replace_missing() function that takes two arguments: a DataFrame group and a column col. It tries to compute the median of the column col and returns it if successful. If calculating the median fails, for example because the column has no non-missing values, it returns a predefined value instead.
The chocolates dataset and the DataFrames and Statistics packages have been loaded for you.
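The exercise does not show how replace_missing() is defined, so the following is only a minimal sketch of a function with the behavior described above. The name replace_missing_sketch, the fallback keyword, and its default of 3.0 are assumptions made for illustration, as is the small df table.

```julia
using DataFrames, Statistics

# Hypothetical stand-in for the course's predefined replace_missing().
# The real definition and its predefined fallback value are not shown in the exercise.
function replace_missing_sketch(group, col; fallback = 3.0)
    try
        # median() throws an ArgumentError when no non-missing values remain
        median(skipmissing(group[!, col]))
    catch
        # Assumed fallback value, used when the median cannot be computed
        fallback
    end
end

# Made-up data: company "B" has only missing ratings
df = DataFrame(company = ["A", "A", "A", "B"],
               rating  = [3.0, missing, 4.0, missing])

median_a = replace_missing_sketch(df[df.company .== "A", :], :rating)  # median of 3.0 and 4.0
median_b = replace_missing_sketch(df[df.company .== "B", :], :rating)  # nothing to average, so the fallback
```

Wrapping the call in try/catch mirrors the "tries to compute" wording; checking for an empty collection before calling median() would work just as well.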
This exercise is part of the course Data Manipulation in Julia.
Exercise instructions
- Group chocolates by company and iterate over the GroupedDataFrame.
- Subset each group using ismissing() and the rating column, replacing the missing values with the value returned by the replace_missing() function.
Interactive exercise
Complete the sample code to finish this exercise successfully.
# Group by company and iterate
for group in ____(____)
    # Subset each group using ismissing() and the rating column, assign a new value
    group[____, ____] .= replace_missing(group, :rating)
end
println(describe(chocolates, :nmissing))
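
As a rough illustration of the same group-wise pattern outside the course environment, here is a small sketch on made-up data. The toy table and its values are assumptions, and the median is computed inline instead of through the course's replace_missing() helper.

```julia
using DataFrames, Statistics

# Made-up data, not the course's chocolates dataset
toy = DataFrame(company = ["A", "A", "A", "B", "B"],
                rating  = [3.0, missing, 4.0, 2.5, missing])

for group in groupby(toy, :company)
    # Median of the non-missing ratings within this company
    group_median = median(skipmissing(group.rating))
    # Each group is a view into toy, so this assignment updates the parent table
    group[ismissing.(group.rating), :rating] .= group_median
end

println(count(ismissing, toy.rating))  # 0
```

Because each group is a view of the parent table, no separate step is needed to combine the groups again after the loop.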