Verifying DataTypes
In the age of data, we have access to more attributes than ever before. To handle all of them we will build a lot of automation, but at a minimum we need their datatypes to be correct. In this exercise we will validate a dictionary of attributes and their datatypes to check that they are correct. This dictionary is stored in the variable validation_dict
and is available in your workspace.
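The exact contents of validation_dict are defined in the workspace; as a rough sketch with hypothetical column names, it simply maps each column name to the Spark dtype string we expect for it:

# hypothetical example of the shape of validation_dict (not the workspace values)
validation_dict = {'LISTPRICE': 'int',
                   'LISTDATE': 'string',
                   'DAYSONMARKET': 'int'}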
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Using df, create a list of attribute and datatype tuples with dtypes, called actual_dtypes_list.
- Iterate through actual_dtypes_list, checking if the column names exist in the dictionary of expected dtypes, validation_dict.
- For the keys that exist in the dictionary, check their dtypes and print those that match.
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# create list of actual dtypes to check
actual_dtypes_list = df.____
print(actual_dtypes_list)

# Iterate through the list of actual dtypes tuples
for attribute_tuple in ____:
    # Check if the column name is in the dictionary of expected dtypes
    col_name = attribute_tuple[____]
    if col_name in ____:
        # Compare attribute types
        col_type = attribute_tuple[____]
        if col_type == validation_dict[____]:
            print(col_name + ' has expected dtype.')
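A filled-in version of the scaffold, assuming df is a Spark DataFrame and validation_dict maps column names to expected dtype strings as described above, could look like this:

# create list of actual dtypes to check
actual_dtypes_list = df.dtypes
print(actual_dtypes_list)

# Iterate through the list of actual dtypes tuples
for attribute_tuple in actual_dtypes_list:
    # Check if the column name is in the dictionary of expected dtypes
    col_name = attribute_tuple[0]
    if col_name in validation_dict:
        # Compare the actual dtype with the expected dtype
        col_type = attribute_tuple[1]
        if col_type == validation_dict[col_name]:
            print(col_name + ' has expected dtype.')

In PySpark, df.dtypes returns a list of (column name, dtype string) tuples, which is why index 0 picks out the column name and index 1 its type.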