Outlier detection

Outlier detection is an important part of preprocessing your data correctly. A common machine learning interview question is how to locate and handle outliers. An easy way to detect them is to plot the data.

After you find and impute missing data, identifying outliers and deciding what to do about them is the next necessary preprocessing step.

Several packages let you visualize outliers. In this exercise, you will use seaborn to plot univariate and multivariate boxplots of selected columns of loan_data.

All relevant packages have been imported for you.
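A boxplot flags as outliers the points that fall more than 1.5 times the interquartile range (IQR) beyond the quartiles, which is the default whisker setting (`whis=1.5`) in seaborn. The sketch below applies the same rule numerically. The helper name `iqr_outliers` and the income values are made up for illustration and do not come from loan_data.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, whis: float = 1.5) -> pd.Series:
    """Return the values that lie outside the boxplot fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whis * iqr  # lower fence
    upper = q3 + whis * iqr  # upper fence
    return values[(values < lower) | (values > upper)]


# Hypothetical incomes, for illustration only (not the real loan_data)
incomes = pd.Series(
    [42_000, 51_000, 48_000, 55_000, 60_000, 47_000, 950_000],
    name="Annual Income",
)

flagged = iqr_outliers(incomes)
print(flagged)
```

Here the quartiles are 47,500 and 57,500, so the fences sit at 32,500 and 72,500, and only the 950,000 entry is flagged. A boxplot of the same series would draw that value as a single point beyond the upper whisker.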

Where are you in the pipeline?

[Figure: Machine learning pipeline]

1
- Create a univariate boxplot using the feature Annual Income from loan_data.
- Create a multivariate boxplot conditioned on Loan Status using the feature Annual Income from loan_data.

2
- Create a univariate boxplot using the feature Monthly Debt from loan_data.
- Create a multivariate boxplot conditioned on Loan Status using the feature Monthly Debt from loan_data.

3
- Create a univariate boxplot using the feature Years of Credit History from loan_data.
- Create a multivariate boxplot conditioned on Loan Status using the feature Years of Credit History from loan_data.
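The three steps above could be sketched roughly as follows. This is not the exercise's reference solution. Because the real loan_data is not shown here, the sketch builds a synthetic stand-in: only the column names come from the instructions, while the distributions and the Loan Status labels ("Fully Paid", "Charged Off") are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for loan_data (assumed values; only the column names are from the exercise)
rng = np.random.default_rng(0)
n = 200
loan_data = pd.DataFrame({
    "Annual Income": rng.lognormal(mean=11, sigma=0.5, size=n),
    "Monthly Debt": rng.gamma(shape=2.0, scale=500.0, size=n),
    "Years of Credit History": rng.normal(loc=18, scale=6, size=n).clip(min=1),
    "Loan Status": rng.choice(["Fully Paid", "Charged Off"], size=n),
})

features = ["Annual Income", "Monthly Debt", "Years of Credit History"]

# One row per feature: univariate boxplot on the left, boxplot by Loan Status on the right
fig, axes = plt.subplots(len(features), 2, figsize=(10, 12))
for row, feature in zip(axes, features):
    sns.boxplot(x=loan_data[feature], ax=row[0])
    row[0].set_title(f"{feature} (univariate)")
    sns.boxplot(x="Loan Status", y=feature, data=loan_data, ax=row[1])
    row[1].set_title(f"{feature} by Loan Status")

fig.tight_layout()
# In an interactive session, drop the Agg line above and call plt.show() to display the figure.
```

Points drawn beyond the whiskers are the candidate outliers. Comparing the two panels in each row shows whether they are concentrated in one Loan Status group.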
