
Verifying Data Load

Suppose you receive a new file each month and know how many records and columns to expect. In this exercise, you will create a function that validates the loaded file against those expected counts.

This exercise is part of the course

Feature Engineering with PySpark


Exercise instructions

  • Create a data validation function check_load() with parameters df (a dataframe), num_records (the expected number of records), and num_columns (the expected number of columns).
  • Using num_records, check whether the input dataframe df has the same number of records by calling count().
  • Compare the number of columns in the input dataframe with num_columns by using len() on columns.
  • If both of these checks are True, then print Validation Passed.

Hands-on interactive exercise

Try this exercise by completing this sample code.

def ____(____, ____, ____):
  # Takes a dataframe and compares record and column counts to input
  # Message to return if the criteria below aren't met
  message = 'Validation Failed'
  # Check number of records
  if num_records == df.____():
    # Check number of columns
    if num_columns == ____(df.____):
      # Success message
      message = ____
  return message

# Print the data validation message
print(check_load(df, 5000, 74))
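
One possible way to fill in the blanks is sketched below. It follows the instructions above (count() for records, len() on columns for columns). So that the sketch runs without a Spark session, it uses StubFrame, a hypothetical stand-in class that is not part of PySpark and only mimics the two members the function touches; with a real PySpark dataframe you would pass df directly.

```python
def check_load(df, num_records, num_columns):
    # Takes a dataframe and compares record and column counts to input
    # Message to return if the criteria below aren't met
    message = 'Validation Failed'
    # Check number of records
    if num_records == df.count():
        # Check number of columns
        if num_columns == len(df.columns):
            # Success message
            message = 'Validation Passed'
    return message


class StubFrame:
    """Hypothetical stand-in (an assumption for illustration only).

    It exposes just count() and columns, the two members check_load() uses.
    """

    def __init__(self, rows, columns):
        self._rows = rows
        self.columns = columns

    def count(self):
        # Number of records held by the stand-in
        return len(self._rows)


# Three records, two columns
stub = StubFrame(rows=[(1, 'a'), (2, 'b'), (3, 'c')], columns=['id', 'label'])

print(check_load(stub, 3, 2))      # counts match
print(check_load(stub, 5000, 74))  # counts do not match
```

The nested if statements mean the column check only runs once the record count has matched, so any mismatch leaves the default failure message in place.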