Verifying Data Load
Suppose you receive a new file each month and know how many records and columns to expect. In this exercise, we will create a function that validates the loaded file against those expected counts.
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Create a data validation function check_load() with parameters df (a dataframe), num_records (the expected number of records) and num_columns (the expected number of columns).
- Using num_records, create a check to see whether the input dataframe df has the same number of records, obtained with count().
- Compare the number of columns in the input dataframe with num_columns by using len() on columns.
- If both of these checks return True, set the returned message to Validation Passed.
Hands-on interactive exercise
Try this exercise by completing this sample code.
def ____(____, ____, ____):
    # Takes a dataframe and compares record and column counts to input
    # Message to return if the criteria below aren't met
    message = 'Validation Failed'
    # Check number of records
    if num_records == df.____():
        # Check number of columns
        if num_columns == ____(df.____):
            # Success message
            message = ____
    return message

# Print the data validation message
print(check_load(df, 5000, 74))
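
For reference, here is one possible completion of the template, written as a minimal sketch rather than the official solution. So that it runs without a Spark session, it uses a hypothetical stand-in class, StubFrame, which is not part of PySpark. StubFrame exposes only the two members the check relies on: a count() method and a columns list, both of which a PySpark DataFrame also provides. The sample rows and column names are made up for illustration.

```python
class StubFrame:
    """Hypothetical stand-in for a PySpark DataFrame (assumption, not a real API).

    It only mimics count() and the columns attribute, which is all
    check_load() needs.
    """

    def __init__(self, rows, columns):
        self._rows = rows
        self.columns = columns

    def count(self):
        # A real PySpark DataFrame.count() also returns the row count as an int
        return len(self._rows)


def check_load(df, num_records, num_columns):
    # Takes a dataframe and compares record and column counts to input
    # Message to return if the criteria below aren't met
    message = 'Validation Failed'
    # Check number of records
    if num_records == df.count():
        # Check number of columns
        if num_columns == len(df.columns):
            # Success message
            message = 'Validation Passed'
    return message


# Illustrative data: 3 records, 2 columns
df = StubFrame(rows=[(1, 'a'), (2, 'b'), (3, 'c')], columns=['id', 'label'])

print(check_load(df, 3, 2))      # counts match
print(check_load(df, 5000, 74))  # counts do not match this small stub
```

With a real PySpark DataFrame, the same check_load() body should work unchanged, since it only calls df.count() and reads df.columns. Note that count() triggers a Spark job, so on large files the check is not free.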