What are we predicting?
Which of these fields (or columns) is the value we are trying to predict for?
TAXESSALESCLOSEPRICEDAYSONMARKETLISTPRICE
Bu egzersiz, kursun bir parçasıdır
Feature Engineering with PySpark
Egzersiz talimatları
- From the listed columns above, identify which one we will use as our dependent variable
$Y$. - Using the loaded data set
df, filter it down to our dependent variable withselect(). Store this dataframe in the variableY_df. - Display summary statistics for the dependent variable using
describe()onY_dfand callingshow()to display it.
Uygulamalı etkileşimli egzersiz
Bu egzersizi bu örnek kodu tamamlayarak deneyin.
# Select our dependent variable
Y_df = df.____([____])
# Display summary statistics
Y_df.____().____()