What are we predicting?
Which of these fields (or columns) is the value we are trying to predict for?
TAXESSALESCLOSEPRICEDAYSONMARKETLISTPRICE
Cet exercice fait partie du cours
Feature Engineering with PySpark
Instructions
- From the listed columns above, identify which one we will use as our dependent variable
$Y$. - Using the loaded data set
df, filter it down to our dependent variable withselect(). Store this dataframe in the variableY_df. - Display summary statistics for the dependent variable using
describe()onY_dfand callingshow()to display it.
Exercice interactif pratique
Essayez cet exercice en complétant cet exemple de code.
# Select our dependent variable
Y_df = df.____([____])
# Display summary statistics
Y_df.____().____()