What are we predicting?
Which of these fields (columns) holds the value we are trying to predict?
TAXES
SALESCLOSEPRICE
DAYSONMARKET
LISTPRICE
This exercise is part of the course Feature Engineering with PySpark.
Exercise instructions
- From the columns listed above, identify the one we will use as our dependent variable $Y$.
- Using the loaded dataset df, call select() to keep only the dependent-variable column. Store the resulting DataFrame in the variable Y_df.
- Display summary statistics for the dependent variable by calling describe() on Y_df, then call show() to print the result.
Have a go at this exercise by completing this sample code.
# Select our dependent variable
Y_df = df.____([____])
# Display summary statistics
Y_df.____().____()
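For reference, the statistics that describe() reports can be reproduced by hand. The sketch below is a plain-Python stand-in, not PySpark. It assumes SALESCLOSEPRICE (the price the home actually sold for) is the dependent variable, and the sample rows and the helper describe_column are hypothetical, made up for illustration. Spark's describe() computes count, mean, stddev (sample standard deviation), min and max for each selected column, ignoring nulls, and returns them as a small DataFrame keyed by a summary column.

```python
import statistics

# Hypothetical rows standing in for the real-estate DataFrame `df`;
# the values are made up for illustration only.
rows = [
    {"LISTPRICE": 250000, "SALESCLOSEPRICE": 245000, "DAYSONMARKET": 12, "TAXES": 3100},
    {"LISTPRICE": 180000, "SALESCLOSEPRICE": 183000, "DAYSONMARKET": 5, "TAXES": 2200},
    {"LISTPRICE": 320000, "SALESCLOSEPRICE": 310000, "DAYSONMARKET": 40, "TAXES": 4050},
    {"LISTPRICE": 210000, "SALESCLOSEPRICE": 202000, "DAYSONMARKET": 21, "TAXES": 2600},
]


def describe_column(records, column):
    """Mimic the statistics Spark's describe() reports for one numeric column:
    count, mean, sample standard deviation, min and max (hypothetical helper)."""
    # Rough equivalent of df.select([column]): keep only the dependent variable,
    # skipping missing values the way Spark's aggregates ignore nulls.
    values = [r[column] for r in records if r.get(column) is not None]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe_column(rows, "SALESCLOSEPRICE")
for stat, value in summary.items():
    print(f"{stat:>6}: {value}")
```

With these four made-up prices the mean is 235000, the minimum 183000 and the maximum 310000. In PySpark the same numbers would come from selecting the column on df and chaining describe() and show(), as the scaffold above does.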