ComenzarEmpieza gratis

Using Corr()

The old adage 'Correlation does not imply Causation' is a cautionary tale. However, correlation does give us a good nudge to know where to start looking promising features to use in our models. Use this exercise to get a feel for searching through your data for the first time, trying to find patterns.

A list called columns containing column names has been created for you. In this exercise you will compute the correlation between those columns and 'SALESCLOSEPRICE', and find the maximum.

Este ejercicio forma parte del curso

Feature Engineering with PySpark

Ver curso

Instrucciones del ejercicio

  • Use a for loop iterate through the columns.
  • In each loop cycle, compute the correlation between the current column and 'SALESCLOSEPRICE' using the corr() method.
  • Create logic to update the maximum observed correlation and with which column.
  • Print out the name of the column that has the maximum correlation with 'SALESCLOSEPRICE'.

Ejercicio interactivo práctico

Prueba este ejercicio y completa el código de muestra.

# Name and value of col with max corr
corr_max = 0
corr_max_col = columns[0]

# Loop to check all columns contained in list
for ____ in ____:
    # Check the correlation of a pair of columns
    corr_val = df.____(____, ____)
    # Logic to compare corr_max with current corr_val
    if ____ ____ ____:
        # Update the column name and corr value
        corr_max = corr_val
        corr_max_col = col

print(corr_max_col)
Editar y ejecutar código