Introduction to Python & Machine Learning (with Analytics Vidhya Hackathons)

Exercise

Selecting important variables for model building

One of the benefits of a random forest is its ability to handle large data sets with high dimensionality. It can handle thousands of input variables and identify the most significant ones, so it is often considered a dimensionality reduction method. Further, the model outputs the importance of each variable, which can be a very handy feature.


# Rank the predictors by the importance scores of the fitted random forest (highest first)
featimp = pd.Series(model.feature_importances_, index=predictors).sort_values(ascending=False)
print(featimp)

I have selected all the features available in the training data set and modeled them with a random forest:

predictors=['ApplicantIncome', 'CoapplicantIncome', 'Credit_History','Dependents', 'Education', 'Gender', 'LoanAmount',
            'Loan_Amount_Term', 'Married', 'Property_Area', 'Self_Employed', 'TotalIncome','Log_TotalIncome']
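
The feature importance snippet assumes that a fitted `model` and the `predictors` list already exist. A minimal, self-contained sketch of the whole flow might look like the following. It uses synthetic data rather than the loan data set, and the column names `signal`, `noise_1`, `noise_2` as well as the hyperparameters are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (NOT the loan data set):
# the target depends only on the hypothetical "signal" column.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "signal": rng.normal(size=500),
    "noise_1": rng.normal(size=500),
    "noise_2": rng.normal(size=500),
})
y = (X["signal"] > 0).astype(int)

# Fit a random forest on all available predictors
predictors = list(X.columns)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[predictors], y)

# Same ranking step as in the exercise: importance scores, highest first
featimp = pd.Series(model.feature_importances_, index=predictors).sort_values(ascending=False)
print(featimp)
```

Because the synthetic target is built from `signal` alone, that column should rank first here. On the loan data, the same last two lines apply once `model` has been fitted on the `predictors` listed above.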


Run the feature importance command and identify which variable has the highest impact on the model.
