Verifying Data Load
Suppose you receive a new file each month and know how many records and columns to expect. In this exercise, you will create a function that validates the loaded file.
This exercise is part of the course
Feature Engineering with PySpark
Instructions
- Create a data validation function check_load() with the parameters df (a dataframe), num_records (the expected number of records) and num_columns (the expected number of columns).
- Using num_records, create a check to see whether the input dataframe df has the same number of records, as reported by count().
- Compare the number of columns in the input dataframe with num_columns by using len() on columns.
- If both of these checks are True, return the message Validation Passed so that it gets printed.
Hands-on interactive exercise
Try this exercise by completing this sample code.
def ____(____, ____, ____):
  # Takes a dataframe and compares record and column counts to input
  # Message to return if the criteria below aren't met
  message = 'Validation Failed'
  # Check number of records
  if num_records == df.____():
    # Check number of columns
    if num_columns == ____(df.____):
      # Success message
      message = ____
  return message
# Print the data validation message
print(check_load(df, 5000, 74))
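
For reference, one way the completed template might look is sketched below. The function follows the instructions above. The FakeFrame class is a hypothetical stand-in, not part of PySpark or of the exercise: it exposes only count() and columns, so the sketch runs without a Spark session. A real PySpark DataFrame provides the same two members, so check_load() should work on it unchanged.

```python
def check_load(df, num_records, num_columns):
    # Takes a dataframe and compares record and column counts to input
    # Message to return if the criteria below aren't met
    message = 'Validation Failed'
    # Check number of records
    if num_records == df.count():
        # Check number of columns
        if num_columns == len(df.columns):
            # Success message
            message = 'Validation Passed'
    return message


class FakeFrame:
    """Hypothetical stand-in for a dataframe (assumption, for illustration only)."""

    def __init__(self, rows, columns):
        self._rows = rows
        self.columns = columns  # list of column names, like DataFrame.columns

    def count(self):
        # Number of records, mirroring DataFrame.count()
        return len(self._rows)


# Three records, two columns
demo = FakeFrame(rows=[(1, 'a'), (2, 'b'), (3, 'c')], columns=['id', 'label'])

print(check_load(demo, 3, 2))      # counts match
print(check_load(demo, 5000, 74))  # counts do not match
```

On a real Spark dataframe, count() runs a job over the whole dataset, so on large files the record check is the expensive part of the validation.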