Creating a UDF for vector data
A dataframe df is available with a column output of type vector. Its first five rows are shown in the console.
This exercise is part of the course Introduction to Spark SQL in Python.
Exercise instructions
- Create a UDF called first_udf that selects the first element of a vector column. Return a default value of 0.0 for any item that is not a vector containing at least one item, and cast the output to a float.
- Use the select operation on df to apply first_udf to the output column.
Interactive exercise
Try this exercise by completing the sample code below.
# Selects the first element of a vector column
first_udf = ____(lambda x:
                 ____(x.indices[0])
                 if (x and hasattr(x, "toArray") and x.____())
                 else 0.0,
                 FloatType())
# Apply first_udf to the output column
df.select(____("output").alias("result")).show(5)