Interpreting Results
It is almost always important to know which features influence your prediction the most. Perhaps the result is counterintuitive, and that is an insight in itself. Perhaps a handful of features account for most of your model's accuracy, and you don't need to spend time acquiring or massaging other features.
In this example we will be looking at a model that has been trained without any LISTPRICE information. With that gone, what influences the price the most?
- NOTE: The array of feature importances, importances, has already been created for you from model.featureImportances.toArray()
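To make the "handful of features" idea concrete, here is a minimal sketch that counts how many of the top-ranked features are needed to cover a given share of the total importance. The values in importances and the 80% threshold are made up for illustration; in the exercise the array comes from the trained model.

```python
import numpy as np

# Hypothetical importances (assumption); in the exercise this array
# comes from model.featureImportances.toArray()
importances = np.array([0.05, 0.40, 0.10, 0.30, 0.15])

# Indices of the features, ordered from most to least important
order = np.argsort(importances)[::-1]

# Running share of the total importance covered by the top-k features
cumulative_share = np.cumsum(importances[order]) / importances.sum()

# Smallest number of top features that covers at least 80% (arbitrary threshold)
n_needed = int(np.searchsorted(cumulative_share, 0.80) + 1)
print(n_needed)
```

If n_needed is small relative to the total number of features, that suggests the remaining features contribute little to this particular model, though it does not by itself prove they can be dropped safely.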
This exercise is part of the course
Feature Engineering with PySpark
Instructions
- Create a pandas dataframe using the values of importances and name the column importance by setting the parameter columns.
- Using the imported list of feature names, feature_cols, create a new pandas.Series by wrapping it in the pd.Series() function. Set it to the column fi_df['feature'].
- Sort the dataframe using sort_values(), setting the by parameter to our importance column, and sort it descending by setting ascending to False. Inspect the results.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Convert feature importances to a pandas column
fi_df = pd.DataFrame(____, columns=[____])
# Convert list of feature names to pandas column
fi_df['feature'] = pd.____(____)
# Sort the data based on feature importance
fi_df.____(by=[____], ascending=____, inplace=True)
# Inspect Results
fi_df.head(10)
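For reference, the steps in the instructions can be sketched end to end as follows. This is one possible completion, not necessarily the course's official solution. The importances values and the names in feature_cols are hypothetical stand-ins; in the exercise both are provided for you.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins (assumptions) for the objects the exercise provides
importances = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
feature_cols = ['feature_a', 'feature_b', 'feature_c', 'feature_d', 'feature_e']

# Convert feature importances to a single-column pandas dataframe
fi_df = pd.DataFrame(importances, columns=['importance'])

# Attach the feature names; the Series aligns with fi_df on the default 0..n-1 index
fi_df['feature'] = pd.Series(feature_cols)

# Sort so the most important features come first
fi_df.sort_values(by=['importance'], ascending=False, inplace=True)

# Inspect the top rows
print(fi_df.head(10))
```

The sketch relies on importances and feature_cols being in the same order, since the names are matched to the values purely by position.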