Classification: feature engineering
1. Classification: feature engineering
In the last two lessons you learned how to select features and regularize your dataset for regression models. In this video, we'll go over how to engineer features to build a classifier. Shall we get started?

2. Feature engineering...why?
Some of the main reasons to perform feature engineering are that engineered features extract additional information from the data, create additional relevant features, and offer one of the most effective ways to improve predictive models.

3. Benefits of feature engineering
Feature engineering leads to increased predictive power of the learning algorithm that you're using and, as a result, makes your machine learning models perform even better!

4. Types of feature engineering
There are several types of feature engineering: indicator variables, interaction features, and feature representation. We'll briefly cover examples of each.

5. Indicator variables
An example of a threshold indicator is when you use a feature such as age to distinguish whether a value is above or below a given threshold, like high school versus college. Multiple features can be combined into a flag, for instance indicating 2-bed, 2-bath properties if you have the domain knowledge that that combination is considered premium. Special events such as Black Friday or Christmas can also be flagged, and groups of classes can be used to create a paid flag for website traffic sources such as Google AdWords or Facebook ads. A minimal sketch of the first two ideas appears below.
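Here is a minimal sketch of a threshold indicator and a multi-feature flag in pandas; the data and column names are made up purely for illustration.

```python
import pandas as pd

# Hypothetical example data; column names are assumptions for illustration
properties = pd.DataFrame({
    "age": [17, 22, 34, 15],
    "beds": [2, 3, 2, 1],
    "baths": [2, 1, 2, 1],
})

# Threshold indicator: 1 when age is at or above a cutoff, 0 otherwise
properties["college_age"] = (properties["age"] >= 18).astype(int)

# Multi-feature flag: 2-bed, 2-bath combinations considered premium
properties["premium"] = (
    (properties["beds"] == 2) & (properties["baths"] == 2)
).astype(int)
```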
6. Interaction features

Interaction features are created by taking two or more features and combining them with their sum, difference, product, quotient, or any other mathematical formula, since combined features may predict better than they do separately.
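As a quick sketch, with made-up housing columns purely for illustration, an interaction feature is just an arithmetic combination of existing columns:

```python
import pandas as pd

# Hypothetical housing data; column names are made up for illustration
df = pd.DataFrame({"beds": [2, 3, 1], "baths": [2, 1, 1], "sqft": [900, 1400, 600]})

# A sum and a quotient interaction: the combination may carry more signal
df["rooms_total"] = df["beds"] + df["baths"]
df["sqft_per_room"] = df["sqft"] / df["rooms_total"]
```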
7. Feature representation

And finally, feature representation is where you take, for example, a datetime stamp and extract the day of week or hour of day; group categorical levels with small numbers of observations together as a single level called 'Other'; or transform categorical variables into so-called dummy variables, commonly called one-hot encoding, where a feature with k classes is transformed into k minus 1 binary columns.
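A minimal sketch of all three representations, on made-up data with hypothetical column names:

```python
import pandas as pd

# Hypothetical data with a timestamp and a categorical feature
df = pd.DataFrame({
    "created": pd.to_datetime(
        ["2019-11-29 08:15", "2019-12-25 19:40", "2019-12-26 10:05"]
    ),
    "source": ["Google AdWords", "Google AdWords", "Bing Ads"],
})

# Extract parts of the datetime stamp as new features
df["day_of_week"] = df["created"].dt.dayofweek
df["hour_of_day"] = df["created"].dt.hour

# Group levels with few observations into a single 'Other' level
counts = df["source"].value_counts()
rare = counts[counts < 2].index
df["source"] = df["source"].replace(dict.fromkeys(rare, "Other"))

# One-hot encode: drop_first=True keeps k - 1 binary columns for k classes
df = pd.get_dummies(df, columns=["source"], drop_first=True)
```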
8. Different categorical levels

There is a small caveat I'd like to make here that you need to keep in mind: be aware of the possibility of classes that exist in the training data but not in the test data, and vice versa. There are ways around this that are beyond the scope of this course. If you'd like to take a deeper dive into how to handle this scenario, here is a link to an excellent article on just that.
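As one quick illustration of the problem (a sketch of a common workaround, not necessarily the linked article's approach), you can align the test set's dummy columns to the training set's after encoding:

```python
import pandas as pd

# Hypothetical frames where levels differ between train and test
X_train = pd.get_dummies(pd.DataFrame({"home": ["RENT", "OWN", "MORTGAGE"]}))
X_test = pd.get_dummies(pd.DataFrame({"home": ["RENT", "OTHER"]}))

# Align test columns to train: levels missing from test become all-zero
# columns, and levels unseen during training are dropped
X_test = X_test.reindex(columns=X_train.columns, fill_value=0)
```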
9. Debt to income ratio

In the exercises you'll engineer an interaction feature called debt-to-income ratio using the original features monthly debt and annual income from the loan data, the latter of which you'll divide by 12 so that it becomes monthly income.
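As a preview, here is a sketch of that quotient interaction; the actual exercise dataset's column names may differ.

```python
import pandas as pd

# Hypothetical loan data; the real dataset's column names may differ
loan_data = pd.DataFrame(
    {"monthly_debt": [350.0, 900.0], "annual_income": [48000, 39000]}
)

# Annual income divided by 12 gives monthly income
loan_data["dti_ratio"] = loan_data["monthly_debt"] / (loan_data["annual_income"] / 12)
```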
10. Feature engineering functions

You'll use a few new functions in the exercises, as well as ones you've seen before that are reviewed here. Logistic regression gives an estimator, as always, and you know what train/test split does by now. The countplot function from seaborn (imported as sns) returns a bar plot that counts the number of observations in each level of the column given as the x argument. The .drop function on a DataFrame drops the features passed to it as a list, and axis equals 1 indicates dropping along the column axis.

You'll use the .replace function, passing a dictionary to convert the levels of the target variable loan status to integers: 0 when fully paid and 1 when charged off. To avoid the target being separated into multiple columns, it is important to handle a categorical target variable first and separately, before you call get_dummies, which (with drop_first set to True) returns k minus 1 new binary features for a categorical feature with k classes. Lastly, accuracy_score from sklearn.metrics returns the accuracy when y_test is passed along with the predictions made on X_test.
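Putting these functions together, here is a minimal end-to-end sketch on made-up data; the real exercise dataset and its column names will differ.

```python
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Hypothetical loan data; column names are assumptions for illustration
loan_data = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Charged Off"] * 25,
    "monthly_debt": [350.0, 900.0, 420.0, 610.0] * 25,
    "annual_income": [48000, 39000, 72000, 30000] * 25,
    "home_ownership": ["RENT", "OWN", "MORTGAGE", "RENT"] * 25,
})

# Count the observations in each level of the target
sns.countplot(x="loan_status", data=loan_data)

# Encode the target first and separately, so get_dummies doesn't split it
loan_data["loan_status"] = loan_data["loan_status"].replace(
    {"Fully Paid": 0, "Charged Off": 1}
)

# One-hot encode the remaining categoricals: k classes -> k - 1 binary columns
loan_data = pd.get_dummies(loan_data, drop_first=True)

# Separate features and target; axis=1 drops along the column axis
X = loan_data.drop(["loan_status"], axis=1)
y = loan_data["loan_status"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression()
clf.fit(X_train, y_train)

# Accuracy: y_test compared with the predictions made on X_test
print(accuracy_score(y_test, clf.predict(X_test)))
```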
11. An excellent tutorial:

Another tutorial, covering additional ways to handle categorical data beyond what we've had time to cover, is linked here as well.
12. Let's practice!

Time to try your hand at feature engineering!