What are we predicting?
Which of these fields (or columns) holds the value we are trying to predict?
- TAXES
- SALESCLOSEPRICE
- DAYSONMARKET
- LISTPRICE
This exercise is part of the course Feature Engineering with PySpark.
Exercise instructions
- From the columns listed above, identify which one we will use as our dependent variable $Y$.
- Using the loaded dataset `df`, filter it down to our dependent variable with `select()`. Store this DataFrame in the variable `Y_df`.
- Display summary statistics for the dependent variable by calling `describe()` on `Y_df` and then calling `show()` to display the result.
Interactive exercise
Try this exercise by completing the sample code.
# Select our dependent variable
Y_df = df.____([____])
# Display summary statistics
Y_df.____().____()
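For reference, `describe()` on a numeric column reports `count`, `mean`, `stddev`, `min` and `max`. The plain-Python sketch below mimics those statistics on a short list of made-up sale prices, so you can see what each row of the summary means. It does not use Spark itself; the `summarize` helper and the price values are hypothetical, and it assumes (as Spark's `describe()` does) that `stddev` is the sample standard deviation.

```python
import statistics


def summarize(values):
    """Hypothetical helper: compute the statistics that Spark's describe()
    reports for a numeric column (count, mean, sample stddev, min, max)."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (divides by n - 1, not n)
        "stddev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up sale prices, for illustration only
prices = [150000, 200000, 250000]
for name, value in summarize(prices).items():
    print(f"{name:>6}: {value}")
```

A large gap between `mean` and the `min`/`max` values, or a large `stddev`, is a quick hint that the dependent variable is skewed or contains outliers worth examining before modeling.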