Integers in PySpark UDFs
This exercise covers user-defined functions (UDFs) and shows how to turn an ordinary Python function into one that PySpark can apply to a DataFrame column. As you work through it, think about which steps of a data cleaning workflow a UDF like this could replace.
Remember, there's already a SparkSession called spark in your workspace!
This exercise is part of the course Introduction to PySpark.
Exercise instructions
- Register the function age_category as a UDF called age_category_udf.
- Add a new column called "category" to the DataFrame (age_category_df in the sample code) that applies the UDF to categorize people based on their age. The argument for age_category_udf() is provided for you.
- Show the resulting DataFrame.
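Because a UDF simply wraps a regular Python function, it helps to see what age_category might look like. The function is defined in the workspace and not shown here, so the version below is a hypothetical sketch: the category names and the 18/60 age thresholds are assumptions. It also returns None for a missing age, since Spark passes SQL NULL values to a Python UDF as None.

```python
from typing import Optional


def age_category(age: Optional[int]) -> Optional[str]:
    """Map an integer age to a category label (hypothetical thresholds)."""
    # Spark hands SQL NULL to a Python UDF as None, so handle it explicitly
    if age is None:
        return None
    if age < 18:
        return "Young"
    elif age < 60:
        return "Adult"
    else:
        return "Senior"


# Plain Python calls, before any Spark is involved
print(age_category(15), age_category(45), age_category(72), age_category(None))
```

Testing the function on plain Python values like this is a cheap way to catch logic errors before wrapping it as a UDF, where failures surface as longer Spark stack traces.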
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Register the function age_category as a UDF
age_category_udf = ____(age_category, StringType())
# Apply your udf to the DataFrame
age_category_df_2 = age_category_df.withColumn("category", ____(age_category_df["age"]))
# Show the resulting DataFrame
age_category_df_2.____