Introduction to Spark SQL in Python


Exercise

Creating a UDF for vector data

A DataFrame df is available with a column output of type vector. Its first five rows are shown in the console.

Instructions

  • Create a UDF called first_udf that selects the first element of a vector column. For any value that is not a vector containing at least one element, return the default value 0.0, and cast the output as a float.
  • Use the select operation on df to apply first_udf to the output column.