
Using user-defined functions in Spark

You've seen some of the power behind Spark's built-in string functions when it comes to manipulating DataFrames. However, once you reach a certain point, it becomes difficult to process the data without creating a rat's nest of function calls. This is one place where you can use user-defined functions (UDFs) to manipulate your DataFrames.

For this exercise, we'll use our voter_df DataFrame, but you're going to replace the first_name column with the first and middle names.

The pyspark.sql.functions library is available under the alias F. The classes from pyspark.sql.types are already imported.

This exercise is part of the course Cleaning Data with PySpark.

Exercise instructions

  • Edit the getFirstAndMiddle() function to return a space-separated string of every entry in the names list except the last one.
  • Define the function as a user-defined function. It should return a string type.
  • Create a new column on voter_df called first_and_middle_name using your UDF.
  • Show the DataFrame.

Hands-on interactive exercise

Complete this sample code to finish the exercise.

def getFirstAndMiddle(names):
  # Return a space separated string of names
  return ' '.join(____)

# Define the method as a UDF
udfFirstAndMiddle = F.____(____, ____)

# Create a new column using your UDF
voter_df = voter_df.withColumn('first_and_middle_name', ____(____))

# Show the DataFrame
____
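
One way the pieces above might fit together is sketched below. It is an illustration under stated assumptions rather than the exercise's reference solution: the helper names get_first_and_middle and add_first_and_middle are hypothetical, and so is the names_col default of 'splits', since the text does not say which voter_df column holds the list of names. The pure-Python function carries the slicing logic (everything except the last element), and the second function wraps it with F.udf and a StringType return type.

```python
def get_first_and_middle(names):
    # Join every element except the last one with single spaces.
    return ' '.join(names[:-1])


def add_first_and_middle(df, names_col='splits'):
    # names_col is an assumed column name holding a list of name parts.
    # Spark is imported lazily so the helper above can be tried without it.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # Wrap the Python function as a UDF that returns a string.
    udf_first_and_middle = F.udf(get_first_and_middle, StringType())

    # Apply the UDF to the list column and store the result in a new column.
    return df.withColumn('first_and_middle_name',
                         udf_first_and_middle(df[names_col]))


# Quick check of the slicing logic on a plain Python list.
print(get_first_and_middle(['John', 'Quincy', 'Adams']))  # John Quincy
```

With a live Spark session, add_first_and_middle(voter_df).show() would display the result, provided the assumed list column exists. A single-element list yields an empty string, because the slice drops its only entry.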