
Integers in PySpark UDFs

This exercise covers user-defined functions (UDFs), which let you apply your own Python functions to the columns of a PySpark DataFrame. As you work through it, think about which steps of a data cleaning workflow a UDF like this could replace.

Remember, there's already a SparkSession called spark in your workspace!

This exercise is part of the course

Introduction to PySpark


Exercise instructions

  • Register the function age_category as a UDF called age_category_udf.
  • Add a new column called "category" to the DataFrame age_category_df that applies the UDF to categorize people by age. The argument for age_category_udf() is provided for you.
  • Show the resulting DataFrame.
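The exercise never shows age_category itself. As a minimal sketch, assuming hypothetical thresholds and labels ("Young" under 18, "Adult" under 60, "Senior" otherwise), it could be an ordinary Python function that takes an integer age and returns a string, which is why the UDF is registered with StringType():

```python
def age_category(age):
    """Map an integer age to a category label.

    The thresholds and labels here are hypothetical; the course's
    own age_category function may use different ones.
    """
    # Spark passes None to a Python UDF when the column value is null
    if age is None:
        return None
    if age < 18:
        return "Young"
    elif age < 60:
        return "Adult"
    return "Senior"


# Plain Python check before wrapping the function as a UDF
print([age_category(a) for a in [12, 35, 70, None]])
```

Testing the plain function first makes it easier to spot logic errors before Spark is involved.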

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

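# Imports needed for registering the UDF and declaring its return type
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# age_category and age_category_df are already defined in the workspace

# Register the function age_category as a UDF
age_category_udf = ____(age_category, StringType())

# Apply your udf to the DataFrame
age_category_df_2 = age_category_df.withColumn("category", ____(age_category_df["age"]))

# Show the resulting DataFrame
age_category_df_2.____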