Integers in PySpark UDFs
This exercise covers user-defined functions (UDFs), which let you apply your own Python functions to PySpark DataFrame columns. As you work through it, think about which steps of a data cleaning workflow a UDF like this could replace.
Remember, there's already a SparkSession called spark in your workspace!
This exercise is part of the course Introduction to PySpark.
Exercise instructions
- Register the function age_category as a UDF called age_category_udf.
- Add a new column called "category" to the DataFrame age_category_df that applies the UDF to categorize people based on their age. The argument for age_category_udf() is provided for you.
- Show the resulting DataFrame.
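The instructions assume a Python function named age_category already exists in the workspace, but its body is not shown. A minimal sketch, assuming hypothetical thresholds (under 18, 18 to 59, 60 and over) and hypothetical labels, could look like this. Because Spark hands SQL NULL values to a Python UDF as None, the sketch returns None for a missing age instead of raising a TypeError on the comparison.

```python
from typing import Optional


def age_category(age: Optional[int]) -> Optional[str]:
    """Map an integer age to a label (hypothetical thresholds)."""
    # Spark passes SQL NULL values into a Python UDF as None
    if age is None:
        return None
    if age < 18:
        return "Young"
    if age < 60:
        return "Adult"
    return "Senior"


# Plain Python check before wrapping the function in a UDF
print([age_category(a) for a in [12, 35, 72, None]])
# → ['Young', 'Adult', 'Senior', None]
```

Testing the function as ordinary Python first makes mistakes easier to spot than debugging them inside a Spark job.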
Have a go at this exercise by filling in the blanks (____) in the sample code below.
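from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Register the function age_category as a UDF that returns strings
age_category_udf = ____(age_category, StringType())

# Apply your UDF to the "age" column and store the result in a new "category" column
age_category_df_2 = age_category_df.withColumn("category", ____(age_category_df["age"]))

# Show the resulting DataFrame
age_category_df_2.____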