Aan de slagGa gratis aan de slag

Filter out corrupt data

One recurrent step you can expect in the transformation phase would be to clean up some incomplete data. In this exercise, you're going to look at course data, which has the following format:

course_id title description programming_language
1 Some Course r

You're going to inspect this DataFrame and make sure there are no missing values by using the pandas DataFrame's .isnull().sum() methods. You will find that the programming_language column has some missing values.

As such, you will complete the transform_fill_programming_language() function by using the .fillna() method to fill missing values.

Deze oefening maakt deel uit van de cursus

Introduction to Data Engineering

Cursus bekijken

Oefeninstructies

  • Print the number of missing values in course_data.
  • Missing values of the programming_language should be the language "R".
  • Print out the number of missing values per column once more, this time for transformed.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

course_data = extract_course_data(db_engines)

# Print out the number of missing values per column
print(____.____().____())

# The transformation should fill in the missing values
def transform_fill_programming_language(course_data):
    imputed = course_data.____({"programming_language": "____"})
    return imputed

transformed = transform_fill_programming_language(course_data)

# Print out the number of missing values per column of transformed
print(____)
Code bewerken en uitvoeren