
Integers in PySpark UDFs

This exercise covers user-defined functions (UDFs), which let you apply your own Python functions to DataFrame columns in PySpark. As you work through it, think about which steps of a data cleaning workflow a UDF could replace.

Remember, there's already a SparkSession called spark in your workspace!

This exercise is part of the course

Introduction to PySpark


Exercise instructions

  • Register the function age_category as a UDF called age_category_udf.
  • Add a new column called "category" to the DataFrame age_category_df that applies the UDF to categorize people based on their age. The argument for age_category_udf() is provided for you.
  • Show the resulting DataFrame.
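
The general pattern behind these steps can be sketched as follows. This is only an illustration under stated assumptions: the exercise does not show the body of age_category, so the thresholds and labels here are hypothetical, and add_category_column is a helper name invented for this sketch. The calls udf(), StringType(), and withColumn() are standard PySpark.

```python
def age_category(age):
    """Hypothetical categorizer; the thresholds and labels are assumptions."""
    if age is None:
        # Spark passes nulls to Python UDFs as None, so guard before comparing.
        return None
    if age < 18:
        return "child"
    if age < 65:
        return "adult"
    return "senior"


def add_category_column(df):
    """Return df with a "category" column derived from its "age" column."""
    # Imported here so the plain function above can be tried without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Wrap the Python function as a UDF and declare its return type.
    age_category_udf = udf(age_category, StringType())

    # Apply the UDF column-wise; the input DataFrame is left unchanged.
    return df.withColumn("category", age_category_udf(df["age"]))
```

With the workspace's existing spark session, something like add_category_column(spark.createDataFrame([("Ana", 12), ("Bo", 40)], ["name", "age"])).show() would display the new column. Declaring the return type explicitly matters because Spark cannot infer it from the Python function.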

Hands-on interactive exercise

Try to solve this exercise by completing the sample code.

# Register the function age_category as a UDF
age_category_udf = ____(age_category, StringType())

# Apply your udf to the DataFrame
age_category_df_2 = age_category_df.withColumn("category", ____(age_category_df["age"]))

# Show df
age_category_df_2.____