Applying a UDF to vector data
A dataframe called df is available, with a column output of type vector. Its first five rows are shown in the console.
A UDF, get_first_udf, is also available; it selects the first element of a vector column.
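The exercise does not show how get_first_udf is defined. As a rough sketch only, a UDF of this kind could wrap a plain Python function such as the one below. The names get_first and make_get_first_udf are hypothetical, and the DoubleType return type is an assumption; the course's actual definition may differ.

```python
def get_first(vector):
    """Return the first element of a vector-like value as a Python float.

    Hypothetical helper: works on anything indexable, such as a list or a
    pyspark.ml.linalg vector. The float() call converts NumPy scalars to a
    built-in float.
    """
    return float(vector[0])


def make_get_first_udf():
    """Wrap get_first as a Spark UDF (assumed return type: DoubleType)."""
    # Imported lazily so get_first can be tried out without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    return udf(get_first, DoubleType())
```

The plain function can be checked on an ordinary list, for example get_first([3.5, 1.0]), before it is wrapped for use on a dataframe column.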
This exercise is part of the course Introduction to Spark SQL in Python.
Exercise instructions
- Create a new dataframe called df_new by adding a new column to df. Call the new column label.
- Show the first five rows of df_new.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Add label by applying the get_first_udf to output column
df_new = df.____('____', ____('____'))
# Show the first five rows
df_new.____
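One way the blanks might be filled in is with the dataframe's withColumn method, which returns a new dataframe with the extra column. The sketch below is not the official solution. The helper name add_label is hypothetical, and df and get_first_udf are assumed to be the preloaded objects the exercise describes.

```python
def add_label(df, get_first_udf):
    """Return a new dataframe with a 'label' column derived from 'output'.

    add_label is a hypothetical helper; df and get_first_udf are assumed to
    be the dataframe and UDF that the exercise environment provides.
    """
    # withColumn does not modify df; it returns a new dataframe.
    # The UDF is called with the column name as a string.
    return df.withColumn('label', get_first_udf('output'))
```

With the exercise's objects in scope, df_new = add_label(df, get_first_udf) followed by df_new.show(5) would display the first five rows, including the new label column.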