What are we predicting?
Which of these fields (or columns) is the value we are trying to predict for?
- TAXES
- SALESCLOSEPRICE
- DAYSONMARKET
- LISTPRICE
Cet exercice fait partie du cours
Feature Engineering with PySpark
Instructions
- From the listed columns above, identify which one we will use as our dependent variable $Y$.
- Using the loaded data set df, filter it down to our dependent variable withselect(). Store this dataframe in the variableY_df.
- Display summary statistics for the dependent variable using describe()onY_dfand callingshow()to display it.
Exercice interactif pratique
Essayez cet exercice en complétant cet exemple de code.
# Select our dependent variable
Y_df = df.____([____])
# Display summary statistics
Y_df.____().____()