What are we predicting?
Which of these fields (or columns) is the value we are trying to predict for?
TAXES
SALESCLOSEPRICE
DAYSONMARKET
LISTPRICE
Este exercício faz parte do curso
Feature Engineering with PySpark
Instruções do exercício
- From the listed columns above, identify which one we will use as our dependent variable
$Y$
. - Using the loaded data set
df
, filter it down to our dependent variable withselect()
. Store this dataframe in the variableY_df
. - Display summary statistics for the dependent variable using
describe()
onY_df
and callingshow()
to display it.
Exercício interativo prático
Experimente este exercício completando este código de exemplo.
# Select our dependent variable
Y_df = df.____([____])
# Display summary statistics
Y_df.____().____()