Deeper Features
In previous exercises we showed how combining two features together can create good additional features for a predictive model. In this exercise, you will generate 'deeper' features by combining the effects of three variables into one. Then you will check to see if deeper and more complicated features always make for better predictors.
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Create a new feature by adding SQFTBELOWGROUND and SQFTABOVEGROUND together, creating a new column Total_SQFT.
- Using Total_SQFT, create yet another feature called BATHS_PER_1000SQFT with BATHSTOTAL. Be sure to scale Total_SQFT to 1000's.
- Use describe() to inspect the new min, max and mean of our newest feature BATHS_PER_1000SQFT. Notice anything strange?
- Create two jointplot()s with Total_SQFT and BATHS_PER_1000SQFT as the x values and SALESCLOSEPRICE as the y value to see which has the better R² fit. Does this more complicated feature have a stronger relationship with SALESCLOSEPRICE?
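Before filling in the blanks, it can help to sanity-check the arithmetic behind the new ratio feature. A minimal sketch in plain Python, assuming the definition implied by the instructions (BATHSTOTAL divided by Total_SQFT scaled to thousands):

```python
# Assumed definition from the instructions:
# BATHS_PER_1000SQFT = BATHSTOTAL / (Total_SQFT / 1000)

def baths_per_1000sqft(baths_total, total_sqft):
    """Bathrooms per 1,000 square feet; total_sqft is scaled to thousands."""
    return baths_total / (total_sqft / 1000)

# A 2,000 sqft home with 2 baths has exactly 1 bath per 1,000 sqft.
print(baths_per_1000sqft(2, 2000))   # 1.0

# Very small square footages inflate the ratio sharply, which is one
# reason the max reported by describe() can look strange.
print(baths_per_1000sqft(1, 100))    # 10.0
```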
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Create new feature by adding two features together
df = df.____(____, df[____] + df[____])
# Create additional new feature using previously created feature
df = df.____(____, df[____] / (df[____] / ____))
df[[____]].____().show()
# Sample and create pandas dataframe
pandas_df = df.sample(False, 0.5, 0).toPandas()
# Linear model plots
sns.jointplot(x=____, y=____, data=pandas_df, kind="reg", stat_func=r2)
plt.show()
sns.jointplot(x=____, y=____, data=pandas_df, kind="reg", stat_func=r2)
plt.show()
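For reference, the same two feature-construction steps can be sketched end to end. This is a pandas analogue (the exercise itself runs in PySpark, where the equivalents are `df.withColumn(...)` and `df[['BATHS_PER_1000SQFT']].describe().show()`); the tiny example data here is invented purely for illustration, and column names are taken from the instructions above:

```python
import pandas as pd

# Hypothetical example rows; the course uses a real housing dataset.
df = pd.DataFrame({
    'SQFTBELOWGROUND': [0, 500, 800],
    'SQFTABOVEGROUND': [1200, 1500, 2400],
    'BATHSTOTAL': [1, 2, 3],
})

# Step 1: combine two features into one new column.
df['Total_SQFT'] = df['SQFTBELOWGROUND'] + df['SQFTABOVEGROUND']

# Step 2: build a ratio feature, scaling Total_SQFT to 1000's.
df['BATHS_PER_1000SQFT'] = df['BATHSTOTAL'] / (df['Total_SQFT'] / 1000)

# Step 3: inspect min, max and mean of the new feature.
print(df['BATHS_PER_1000SQFT'].describe())
```

The same pattern generalizes: each `withColumn` call in PySpark adds one derived column, and chaining them lets a feature built in one step feed the next.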