
Custom Percentage Scaling

In the slides we showed how to scale the data between 0 and 1. Sometimes you may wish to scale things differently for modeling or display purposes.
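As a refresher, min-max scaling maps each value x to (x - min) / (max - min), which lands in [0, 1]; multiplying by 100 turns that into a 0-to-100 percentage. A minimal pure-Python sketch of the idea (toy numbers, not the course dataset):

```python
# Min-max scaling: map each value into [0, 1], then into [0, 100]
days_on_market = [10, 55, 100]  # toy values, not the course data

min_days = min(days_on_market)
max_days = max(days_on_market)

# Scale into [0, 1]
scaled_0_1 = [(d - min_days) / (max_days - min_days) for d in days_on_market]

# Scale into [0, 100] and round to integers
scaled_pct = [round(s * 100) for s in scaled_0_1]

print(scaled_0_1)  # [0.0, 0.5, 1.0]
print(scaled_pct)  # [0, 50, 100]
```

The exercise below applies this same formula, but with PySpark DataFrame operations instead of plain lists.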

This exercise is part of the course

Feature Engineering with PySpark

Exercise instructions

  • Calculate the max and min of DAYSONMARKET and store them in the variables max_days and min_days; don't forget to call collect() on agg().
  • Using withColumn(), create a new column called 'percentage_scaled_days' based on DAYSONMARKET.
  • percentage_scaled_days should be a column of integers ranging from 0 to 100; use round() to get integers.
  • Print the max() and min() of the new column percentage_scaled_days.

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Define max and min values and collect them
max_days = df.____({____: ____}).____()[0][0]
min_days = df.____({____: ____}).____()[0][0]

# Create a new column based off the scaled data
df = df.____(____, 
                  ____(((df[____] - min_days) / (max_days - min_days)) * ____))

# Calc max and min for new column
print(df.____({____: ____}).____())
print(df.____({____: ____}).____())
Edit and run the code