Interpreting Results
It is almost always important to know which features influence your predictions the most. Perhaps the result is counterintuitive, and that in itself is an insight. Perhaps a handful of features account for most of your model's accuracy, so you don't need to spend time acquiring or massaging the other features.
In this example we will be looking at a model that has been trained without any LISTPRICE information. With that gone, what influences the price the most?
- NOTE: The array of feature importances, `importances`, has already been created for you from `model.featureImportances.toArray()`.
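To make the "handful of features" idea concrete, here is a minimal sketch that counts how many top-ranked features are needed to cover a chosen share of the total importance. The values in `importances` are made up for illustration (they are not output from the course's model), and the 80% threshold is an arbitrary assumption.

```python
import numpy as np

# Hypothetical stand-in for model.featureImportances.toArray();
# these values are invented for illustration and sum to 1.
importances = np.array([0.05, 0.40, 0.10, 0.30, 0.15])

# Indices of the features, ranked from most to least important
order = np.argsort(importances)[::-1]

# Running total of importance as we add features in ranked order
cumulative = np.cumsum(importances[order])

# Smallest number of top features covering at least 80% of total importance
# (0.80 is an arbitrary threshold chosen for this sketch)
n_needed = int(np.searchsorted(cumulative, 0.80) + 1)

print(order)
print(cumulative)
print(n_needed)
```

If a small `n_needed` covers most of the importance, the remaining features may not be worth the effort of acquiring or cleaning, though that is a judgment call for your own data.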
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Create a `pandas` dataframe using the values of `importances` and name the column `importance` by setting the parameter `columns`.
- Using the imported list of feature names, `feature_cols`, create a new `pandas.Series` by wrapping it in the `pd.Series()` function. Set it to the column `fi_df['feature']`.
- Sort the dataframe using `sort_values()`, setting the `by` parameter to our `importance` column, and sort it in descending order by setting `ascending` to `False`. Inspect the results.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Convert feature importances to a pandas column
fi_df = pd.DataFrame(____, columns=[____])
# Convert list of feature names to pandas column
fi_df['feature'] = pd.____(____)
# Sort the data based on feature importance
fi_df.____(by=[____], ascending=____, inplace=True)
# Inspect Results
fi_df.head(10)
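The steps in the instructions can be sketched end to end as follows. This is an illustrative run under stated assumptions, not the course's reference solution: the `importances` values and the names in `feature_cols` are hypothetical placeholders, since the real ones are supplied by the exercise environment.

```python
import pandas as pd

# Hypothetical stand-ins; in the exercise both objects are provided for you
importances = [0.05, 0.40, 0.10, 0.30, 0.15]
feature_cols = ['feature_a', 'feature_b', 'feature_c', 'feature_d', 'feature_e']

# Convert feature importances to a single-column pandas dataframe
fi_df = pd.DataFrame(importances, columns=['importance'])

# Attach the feature names, aligned by position with the importances
fi_df['feature'] = pd.Series(feature_cols)

# Sort so the most influential features come first
fi_df.sort_values(by=['importance'], ascending=False, inplace=True)

# Inspect the top rows
print(fi_df.head(10))
```

Note that the names line up with the importances only because both sequences are in the same order, so `feature_cols` should be the same list, in the same order, that was used to assemble the model's feature vector.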