Verifying DataTypes

In the age of data, we have access to more attributes than ever before. Handling all of them takes a lot of automation, and at a minimum that automation requires their datatypes to be correct. In this exercise we will check the columns of df against a dictionary of attributes and their expected datatypes to see if they are correct. This dictionary is stored in the variable validation_dict and is available in your workspace.

This exercise is part of the course

Feature Engineering with PySpark

Exercise instructions

Using df, create a list of (attribute, datatype) tuples with dtypes and call it actual_dtypes_list.
Iterate through actual_dtypes_list, checking if the column names exist in the dictionary of expected dtypes validation_dict.
For the keys that exist in the dictionary, check their dtypes and print those that match.
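
The steps above can be sketched in plain Python without a Spark session. In PySpark, df.dtypes returns a list of (column name, type string) tuples, so a hand-written list stands in for it here. The column names and types in both actual_dtypes_list and validation_dict are made up for illustration, and the matches list is an extra added only to make the result easy to inspect; the real dictionary lives in your workspace.

```python
# Hypothetical stand-in for df.dtypes: a list of
# (column name, type string) tuples. These columns are invented.
actual_dtypes_list = [
    ("price", "int"),
    ("sqft", "double"),
    ("city", "string"),
]

# Hypothetical expected dtypes; not the workspace's validation_dict.
validation_dict = {"price": "int", "sqft": "int", "zipcode": "string"}

matches = []
for col_name, col_type in actual_dtypes_list:
    # Only compare columns that have an expected dtype recorded
    if col_name in validation_dict:
        # Report the column when actual and expected dtypes agree
        if col_type == validation_dict[col_name]:
            matches.append(col_name)
            print(col_name + " has expected dtype.")
```

With this made-up data only price is reported: sqft is in the dictionary but its type differs, and city has no expected type at all.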

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create list of actual dtypes to check
actual_dtypes_list = df.____
print(actual_dtypes_list)

# Iterate through the list of actual dtypes tuples
for attribute_tuple in ____:
  
  # Check if column name is in the dictionary of expected dtypes
  col_name = attribute_tuple[____]
  if col_name in ____:

    # Compare attribute types
    col_type = attribute_tuple[____]
    if col_type == validation_dict[____]:
      print(col_name + ' has expected dtype.')
