Interpreting Results
It is almost always important to know which features influence your predictions the most. Perhaps the ranking is counterintuitive, and that is itself an insight. Perhaps a handful of features account for most of your model's accuracy, and you don't need to spend time acquiring or massaging other features.
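To check whether "a handful of features" really carry most of the weight, one option is to rank the importances and find how many top features are needed to reach a given share of the total. The sketch below assumes the importances are already in a pandas Series indexed by feature name; the feature names, values, and the helper features_for_share are hypothetical and only for illustration.

```python
import pandas as pd

# Hypothetical importances for illustration only; in the exercise the real
# values come from model.featureImportances.toArray().
fi = pd.Series(
    [0.45, 0.30, 0.15, 0.05, 0.03, 0.02],
    index=["feat_a", "feat_b", "feat_c", "feat_d", "feat_e", "feat_f"],
)


def features_for_share(importances: pd.Series, share: float = 0.9) -> list:
    """Hypothetical helper: smallest set of top features whose importances
    sum to at least `share` of the total importance."""
    ranked = importances.sort_values(ascending=False)
    cumulative = ranked.cumsum() / ranked.sum()
    # cumulative is non-decreasing, so counting the positions still below the
    # target gives the index of the first position that reaches it.
    n_needed = int((cumulative < share).sum()) + 1
    return list(ranked.index[:n_needed])


print(features_for_share(fi, 0.85))
```

With these made-up numbers, three features reach 85% of the total importance; on a real model the answer depends entirely on the fitted importances.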
In this example, we will look at a model that was trained without any LISTPRICE information. With that gone, which features influence the price the most?
- NOTE: The array of feature importances, importances, has already been created for you from model.featureImportances.toArray().
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Create a pandas DataFrame using the values of importances, and name the column importance by setting the parameter columns.
- Using the imported list of feature names, feature_cols, create a new pandas.Series by wrapping it in the pd.Series() function. Assign it to the column fi_df['feature'].
- Sort the DataFrame using sort_values(), setting the by parameter to our importance column, and sort in descending order by setting ascending to False. Inspect the results.
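Wrapping the names in pd.Series() matters because assigning a Series to a DataFrame column aligns on index labels, while assigning a plain list is purely positional. In the exercise the names are attached before sorting, so both give the same result; the sketch below, with hypothetical values, sorts first to show where the two differ.

```python
import pandas as pd

# Hypothetical stand-ins for importances and feature_cols.
importances = [0.2, 0.5, 0.3]
feature_cols = ["feat_a", "feat_b", "feat_c"]

# Sort first; rows keep their original index labels (now ordered 1, 2, 0).
sorted_df = pd.DataFrame(importances, columns=["importance"]).sort_values(
    by=["importance"], ascending=False
)

# A Series is matched to rows by index label, so each importance keeps its name.
aligned = sorted_df.copy()
aligned["feature"] = pd.Series(feature_cols)

# A plain list is written top to bottom in the current row order,
# which pairs names with the wrong importances after sorting.
positional = sorted_df.copy()
positional["feature"] = feature_cols

print(aligned)
print(positional)
```

In aligned, the 0.5 row is labelled feat_b as intended; in positional it is labelled feat_a.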
Interactive exercise
Complete the sample code to successfully finish this exercise.
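import pandas as pd

# importances (from model.featureImportances.toArray()) and feature_cols
# are provided by the exercise environment
# Convert feature importances to a pandas column
fi_df = pd.DataFrame(importances, columns=['importance'])
# Convert list of feature names to pandas column
fi_df['feature'] = pd.Series(feature_cols)
# Sort the data based on feature importance, highest first
fi_df.sort_values(by=['importance'], ascending=False, inplace=True)
# Inspect Results
fi_df.head(10)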
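As an alternative to sorting in place and then calling head(), pandas' DataFrame.nlargest() returns the top n rows by a column without modifying the original DataFrame. The sketch below uses hypothetical values and builds the DataFrame from a dict in one step; for distinct importance values it gives the same ranking as sort_values(..., ascending=False).head(n).

```python
import pandas as pd

# Hypothetical stand-ins for importances and feature_cols.
importances = [0.05, 0.40, 0.10, 0.25, 0.20]
feature_cols = ["feat_a", "feat_b", "feat_c", "feat_d", "feat_e"]

# Build both columns at once; list order pairs each name with its importance.
fi_df = pd.DataFrame({"feature": feature_cols, "importance": importances})

# Top 3 rows by importance; fi_df itself is left unchanged.
top = fi_df.nlargest(3, "importance").reset_index(drop=True)
print(top)
```

Avoiding inplace=True keeps the unsorted DataFrame available and makes the step easy to chain with others.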