What are we predicting?
Which of these fields (or columns) holds the value we are trying to predict?
TAXES
SALESCLOSEPRICE
DAYSONMARKET
LISTPRICE
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- From the columns listed above, identify which one we will use as our dependent variable $Y$.
- Using the loaded data set `df`, filter it down to our dependent variable with `select()`. Store this dataframe in the variable `Y_df`.
- Display summary statistics for the dependent variable by calling `describe()` on `Y_df`, then `show()` to display the result.
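For intuition about what the summary statistics contain, here is a minimal plain-Python sketch (not PySpark) of the five values that `describe()` reports for a numeric column: count, mean, standard deviation, min, and max. The `values` list and the `summarize` helper are hypothetical and exist only for illustration; they are not part of the course data set or of any library. The sketch assumes the standard deviation is the sample standard deviation.

```python
import statistics

# Hypothetical values standing in for one numeric column.
# These are made up for illustration and are not from the course data set.
values = [200000.0, 250000.0, 300000.0, 350000.0, 400000.0]


def summarize(xs):
    """Return count, mean, sample standard deviation, min, and max of xs."""
    return {
        "count": len(xs),
        "mean": statistics.mean(xs),
        "stddev": statistics.stdev(xs),  # sample standard deviation (n - 1)
        "min": min(xs),
        "max": max(xs),
    }


summary = summarize(values)
for name, value in summary.items():
    print(name, value)
```

A quick look at these numbers for the dependent variable helps spot missing rows (a low count) or implausible extremes (min and max) before any modeling.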
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Select our dependent variable
Y_df = df.____([____])
# Display summary statistics
Y_df.____().____()