What are we predicting?
Which of these fields (or columns) holds the value we are trying to predict?
- TAXES
- SALESCLOSEPRICE
- DAYSONMARKET
- LISTPRICE
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- From the columns listed above, identify which one we will use as our dependent variable $Y$.
- Using the loaded data set `df`, filter it down to our dependent variable with `select()`. Store this dataframe in the variable `Y_df`.
- Display summary statistics for the dependent variable by calling `describe()` on `Y_df`, then `show()` to display the result.
Hands-on interactive exercise
Try to solve this exercise by completing the sample code.
# Select our dependent variable
Y_df = df.____([____])
# Display summary statistics
Y_df.____().____()