
Custom Percentage Scaling

In the slides we showed how to scale data to the range 0 to 1. Sometimes you may want a different range for modeling or display purposes, for example whole percentages from 0 to 100.
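As a formula, one way to write the percentage scaling this exercise asks for is min-max scaling stretched to 0 to 100 and then rounded. This assumes the column is not constant, so that max(x) > min(x):

```latex
x'_i = \mathrm{round}\left( 100 \cdot \frac{x_i - \min(x)}{\max(x) - \min(x)} \right)
```

For a hypothetical listing that has been on the market for 47 days, in a column whose minimum is 0 and maximum is 300, this gives round(100 · 47 / 300) = round(15.67) = 16.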

This exercise is part of the course Feature Engineering with PySpark.

Exercise instructions

  • Calculate the max and min of DAYSONMARKET and store them in the variables max_days and min_days. Don't forget to call collect() on the result of agg().
  • Using withColumn(), create a new column called 'percentage_scaled_days' based on DAYSONMARKET.
  • percentage_scaled_days should be a column of integers ranging from 0 to 100. Multiply the scaled value by 100 before rounding, and use round() to get integers.
  • Print the max() and min() of the new column percentage_scaled_days.
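The third instruction hides an ordering issue. If you round the 0 to 1 fraction first and multiply by 100 afterwards, every value collapses to either 0 or 100. The plain-Python sketch below needs no Spark, and its sample values are made up for illustration. It compares the two orders of operations.

```python
def percentage_scale(values):
    """Min-max scale values to whole percentages, multiplying by 100 before rounding."""
    lo, hi = min(values), max(values)
    # Assumes hi > lo; a constant column would cause division by zero.
    return [round((v - lo) / (hi - lo) * 100) for v in values]


def round_then_multiply(values):
    """The pitfall: rounding the 0-1 fraction first leaves only 0 or 100."""
    lo, hi = min(values), max(values)
    return [round((v - lo) / (hi - lo)) * 100 for v in values]


days = [0, 47, 200, 300]  # hypothetical DAYSONMARKET values
print(percentage_scale(days))     # [0, 16, 67, 100]
print(round_then_multiply(days))  # [0, 0, 100, 100]
```

Python's built-in round() rounds exact halves to the nearest even number. Spark's pyspark.sql.functions.round uses half-up rounding, so the two can disagree on values that land exactly on .5.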

Have a go at this exercise by completing this sample code.

# Define max and min values and collect them
max_days = df.____({____: ____}).____()[0][0]
min_days = df.____({____: ____}).____()[0][0]

# Create a new column based on the scaled data (multiply by 100 before rounding)
df = df.____(____,
                  ____(((df[____] - min_days) / (max_days - min_days)) * ____))

# Calculate the max and min of the new column
print(df.____({____: ____}).____())
print(df.____({____: ____}).____())