Deeper Features
In previous exercises you saw how combining two features can create a useful additional feature for a predictive model. In this exercise, you will generate a 'deeper' feature by combining the effects of three variables into one. Then you will check whether deeper, more complicated features always make for better predictors.
This exercise is part of the course
Feature Engineering with PySpark
Instructions
- Create a new feature by adding `SQFTBELOWGROUND` and `SQFTABOVEGROUND` together in a new column, `Total_SQFT`.
- Using `Total_SQFT`, create yet another feature, `BATHS_PER_1000SQFT`, from `BATHSTOTAL`. Be sure to scale `Total_SQFT` to 1000's.
- Use `describe()` to inspect the new min, max and mean of our newest feature, `BATHS_PER_1000SQFT`. Notice anything strange?
- Create two `jointplot()`s with `Total_SQFT` and `BATHS_PER_1000SQFT` as the x values and `SALESCLOSEPRICE` as the y value to see which has the better R² fit. Does this more complicated feature have a stronger relationship with `SALESCLOSEPRICE`?
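The code template below annotates each `jointplot()` with `stat_func=r2`, which assumes an `r2` function is already defined in the environment (as it is in the course). A minimal sketch of such a function, using SciPy's Pearson correlation (the exact definition used by the course is an assumption):

```python
from scipy.stats import pearsonr


def r2(x, y):
    """Squared Pearson correlation coefficient, usable as a jointplot stat annotation."""
    return pearsonr(x, y)[0] ** 2
```

Note that `stat_func` was removed from seaborn's `jointplot()` in newer releases (0.9+), so on a recent seaborn you would compute R² separately rather than passing it to the plot.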
Hands-on interactive exercise
Try this exercise by completing this sample code.
```python
# Create new feature by adding two features together
df = df.____(____, df[____] + df[____])

# Create additional new feature using previously created feature
df = df.____(____, df[____] / (df[____] / ____))
df[[____]].____().show()

# Sample and create pandas dataframe
pandas_df = df.sample(False, 0.5, 0).toPandas()

# Linear model plots
sns.jointplot(x=____, y=____, data=pandas_df, kind="reg", stat_func=r2)
plt.show()

sns.jointplot(x=____, y=____, data=pandas_df, kind="reg", stat_func=r2)
plt.show()
```