Custom Percentage Scaling
In the slides we showed how to scale the data between 0 and 1. Sometimes you may wish to scale things differently for modeling or display purposes.
This exercise is part of the course
Feature Engineering with PySpark
Instructions
- Calculate the max and min of DAYSONMARKET and put them into variables max_days and min_days; don't forget to use collect() on agg().
- Using withColumn(), create a new column called 'percentage_scaled_days' based on DAYSONMARKET.
- percentage_scaled_days should be a column of integers ranging from 0 to 100; use round() to get integers.
- Print the max() and min() for the new column percentage_scaled_days.
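The arithmetic behind these steps can be sketched in plain Python, without Spark. This is only an illustration: the helper name `percentage_scale` and the sample values are made up, and mapping a constant column to 0 is an assumption added here to avoid dividing by zero.

```python
def percentage_scale(values):
    """Min-max scale values to integer percentages between 0 and 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column maps to 0 to avoid dividing by zero.
        return [0 for _ in values]
    # Multiply by 100 before rounding so in-between percentages survive.
    return [round((v - lo) / (hi - lo) * 100) for v in values]


days_on_market = [0, 10, 45, 90]  # made-up sample values
scaled = percentage_scale(days_on_market)
print(scaled)  # [0, 11, 50, 100]
```

Note that Python's built-in round() and Spark's round() treat exact .5 ties differently (round-half-to-even versus round-half-up), so the two can disagree on rare tie values.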
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Define max and min values and collect them
max_days = df.____({____: ____}).____()[0][0]
min_days = df.____({____: ____}).____()[0][0]
# Create a new column based off the scaled data
df = df.____(____,
                  ____((df[____] - min_days) / (max_days - min_days) * ____))
# Calc max and min for new column
print(df.____({____: ____}).____())
print(df.____({____: ____}).____())