Interpreting Results
It is almost always important to know which features are influencing your prediction the most. Perhaps it's counterintuitive and that's an insight? Perhaps a handful of features account for most of your model's accuracy and you don't need to spend time acquiring or massaging other features.
In this example we will be looking at a model that has been trained without any LISTPRICE
information. With that gone, what influences the price the most?
- NOTE: The array of feature importances, importances, has already been created for you from model.featureImportances.toArray().
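For context, the sketch below shows one way such an array could be produced. The DataFrame df, the label column SALESCLOSEPRICE, and the choice of RandomForestRegressor are assumptions for illustration, not part of this exercise.
# A minimal sketch (not the exercise environment) of producing an
# importances array from a fitted tree-based PySpark model.
# df, feature_cols, and the label column 'SALESCLOSEPRICE' are assumed.
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import RandomForestRegressor

# Assemble the candidate features into a single vector column
assembler = VectorAssembler(inputCols=feature_cols, outputCol='features')
assembled_df = assembler.transform(df)

# Fit a random forest; featureImportances lines up with feature_cols order
rf = RandomForestRegressor(featuresCol='features', labelCol='SALESCLOSEPRICE')
model = rf.fit(assembled_df)

# Convert the SparseVector of importances to a plain array
importances = model.featureImportances.toArray()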
Exercise instructions
- Create a pandas DataFrame using the values of importances and name the column importance by setting the columns parameter.
- Using the imported list of feature names, feature_cols, create a new pandas.Series by wrapping it in the pd.Series() function. Set it to the column fi_df['feature'].
- Sort the dataframe using sort_values(), setting the by parameter to the importance column, and sort it in descending order by setting ascending to False. Inspect the results.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Convert feature importances to a pandas column
fi_df = pd.DataFrame(____, columns=[____])
# Convert list of feature names to pandas column
fi_df['feature'] = pd.____(____)
# Sort the data based on feature importance
fi_df.____(by=[____], ascending=____, inplace=True)
# Inspect Results
fi_df.head(10)
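One possible filled-in version of the scaffold is shown below, assuming importances and feature_cols exist as described above and that pandas is available as pd.
import pandas as pd

# Convert feature importances to a pandas column named 'importance'
fi_df = pd.DataFrame(importances, columns=['importance'])

# Convert the list of feature names to a pandas column
fi_df['feature'] = pd.Series(feature_cols)

# Sort the data based on feature importance, largest first
fi_df.sort_values(by=['importance'], ascending=False, inplace=True)

# Inspect the ten most influential features
fi_df.head(10)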