Custom Percentage Scaling

In the slides we showed how to scale the data between 0 and 1. Sometimes you may wish to scale things differently for modeling or display purposes.
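
As a rough illustration of the idea, min-max scaling to a 0 to 100 range can be sketched in plain Python. The helper name `percentage_scale` and the sample values are hypothetical, not part of the course material. The sketch multiplies by 100 before rounding, since rounding the 0-to-1 ratio first would leave only 0 and 100.

```python
def percentage_scale(values):
    """Min-max scale a list of numbers to integers between 0 and 100."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to scale, so map everything to 0
        return [0 for _ in values]
    # Shift by the minimum, divide by the range, then convert to a percentage
    return [round((v - lo) / (hi - lo) * 100) for v in values]


# Hypothetical days-on-market values, for illustration only
days = [3, 10, 45, 120]
scaled = percentage_scale(days)
print(scaled)  # [0, 6, 36, 100]
```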

This exercise is part of the course

Feature Engineering with PySpark

Exercise instructions

Calculate the max and min of DAYSONMARKET and store them in the variables max_days and min_days; remember to call collect() on the result of agg().
Using withColumn(), create a new column called 'percentage_scaled_days' based on DAYSONMARKET.
percentage_scaled_days should be a column of integers ranging from 0 to 100; use round() to get integers.
Print the max() and min() of the new column percentage_scaled_days.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Define max and min values and collect them
max_days = df.____({____: ____}).____()[0][0]
min_days = df.____({____: ____}).____()[0][0]

# Create a new column based off the scaled data
df = df.____(____,
                  ____((df[____] - min_days) / (max_days - min_days) * ____))

# Calc max and min for new column
print(df.____({____: ____}).____())
print(df.____({____: ____}).____())