
Creating a UDF for vector data

A DataFrame df is available with a column output of type vector. Its first five rows are shown in the console.

This exercise is part of the course

Introduction to Spark SQL in Python


Exercise instructions

  • Create a UDF called first_udf that selects the first element of a vector column. Return a default value of 0.0 for any item that is not a vector containing at least one item, and cast the output as a float.
  • Use the select operation on df to apply first_udf to the output column.

Sample code

The code for this exercise is shown below.

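from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

# Selects the first element of a vector column.
# x.indices exists only on SparseVector, so this assumes output holds sparse
# vectors; x.indices[0] is the position of the first stored entry.
first_udf = udf(lambda x:
            float(x.indices[0])
            if (x and hasattr(x, "toArray") and x.numNonzeros())
            else 0.0,
            FloatType())

# Apply first_udf to the output column
df.select(first_udf("output").alias("result")).show(5)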
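Because the lambda is ordinary Python, its logic can be pulled out into a named function and checked without starting Spark. The sketch below is a minimal illustration under stated assumptions: FakeSparseVector is a hypothetical stand-in, not pyspark's SparseVector. It exposes only what the lambda touches (toArray, numNonzeros, indices and len) so that each branch of the 0.0 fallback can be exercised.

```python
from typing import Any, List


class FakeSparseVector:
    """Hypothetical stand-in for a sparse vector (NOT pyspark's class)."""

    def __init__(self, size: int, indices: List[int], values: List[float]):
        self.size = size
        self.indices = indices  # positions of the stored entries, ascending
        self.values = values    # values stored at those positions

    def __len__(self) -> int:
        # Makes an empty-size vector falsy, like the `x and ...` check expects
        return self.size

    def toArray(self) -> List[float]:
        # Expand to a dense list, filling unstored positions with 0.0
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense

    def numNonzeros(self) -> int:
        # Count stored entries whose value is non-zero
        return sum(1 for v in self.values if v != 0.0)


def first_element(x: Any) -> float:
    """Same branches as the first_udf lambda: index of the first stored
    entry, or 0.0 for None, non-vectors and vectors with no non-zeros."""
    if x and hasattr(x, "toArray") and x.numNonzeros():
        return float(x.indices[0])
    return 0.0


print(first_element(FakeSparseVector(5, [2, 4], [1.0, 3.0])))  # 2.0
print(first_element(FakeSparseVector(5, [], [])))               # 0.0
print(first_element(None))                                      # 0.0
print(first_element("not a vector"))                            # 0.0
```

With a real DataFrame, the same named function can be wrapped as udf(first_element, FloatType()) and applied exactly like first_udf. Keeping the logic in a named function rather than an inline lambda makes it easier to unit-test and reuse.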