Binarizing Day of Week
In a previous video, we saw that homes were very unlikely to be listed on the weekend. Let's create a new field that indicates whether a house was listed for sale on a weekday or on a weekend. In this example, the field List_Day_of_Week labels Monday as 1.0 and Sunday as 7.0. Let's convert it to a binary field, with weekdays as 0 and weekends as 1. We can use the PySpark feature transformer Binarizer to do this.
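The thresholding rule can be illustrated without Spark. The following is a minimal plain-Python sketch, assuming the usual Binarizer behaviour that values strictly greater than the threshold map to 1.0 and all others map to 0.0; the helper name binarize is hypothetical and is not part of PySpark.

```python
def binarize(value: float, threshold: float = 5.0) -> float:
    """Hypothetical helper: 1.0 if value is strictly above threshold, else 0.0."""
    return 1.0 if value > threshold else 0.0


# Day numbers as described in the text: Monday = 1.0 ... Sunday = 7.0
days = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

# Friday (5.0) is not above the threshold, so it stays a weekday (0.0)
flags = [binarize(day) for day in days]
print(flags)
```

With a threshold of 5.0, only Saturday (6.0) and Sunday (7.0) are flagged as weekend days.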
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Import the feature transformer Binarizer from pyspark and the ml.feature module.
- Create the transformer using Binarizer() with the threshold for setting the value to 1 as anything after Friday, 5.0, then set the input column as List_Day_of_Week and the output column as Listed_On_Weekend.
- Apply the binarizer transformation on df using transform().
- Verify the transformation worked correctly by selecting the List_Day_of_Week and Listed_On_Weekend columns with show().
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import transformer
from pyspark.____.____ import ____
# Create the transformer
binarizer = ____(threshold=____, inputCol=____, outputCol=____)
# Apply the transformation to df
df = binarizer.____(____)
# Verify transformation
df[[____, ____]].____()