1. Engineering new features
Now that we've taken care of some of the more straightforward preprocessing tasks, it's time to engineer new features.
2. UFO feature engineering
There are several fields in the UFO dataset that are great candidates for feature engineering. From the date field, we may want to know the month of the sighting. The number of minutes needs to be extracted from the length of time field. And finally, the description field contains a text description of the sighting, and it would be interesting to vectorize that text and see what we can learn from it. Some important code to remember: for date extraction, use datetime attributes like .dt.month and .dt.hour to get the pieces of the date you need. Regular expressions will help you extract numbers from text, and you can call the group method on the match to return the result. And finally, scikit-learn's TfidfVectorizer will vectorize text fields.
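As a rough sketch of those three techniques together, here is one way they might look in code. The tiny DataFrame, its column names (date, length_of_time, desc), and the return_minutes helper are all made up for illustration and are not taken from the actual UFO dataset. The regular expression assumes the duration is written as a number followed by a word starting with "min".

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the UFO data; column names and rows are assumptions.
ufo = pd.DataFrame({
    "date": pd.to_datetime([
        "2011-11-03 19:21:00",
        "2004-06-16 23:30:00",
        "2012-04-09 02:05:00",
    ]),
    "length_of_time": ["about 5 minutes", "10 minutes", "a few seconds"],
    "desc": [
        "Red light hovering over the lake",
        "Bright light moving fast",
        "Silent triangle shape in the sky",
    ],
})

# Date extraction: the .dt accessor exposes the parts of a datetime column.
ufo["month"] = ufo["date"].dt.month
ufo["hour"] = ufo["date"].dt.hour


def return_minutes(time_string):
    """Return the number of minutes found in the text, or None if there is none."""
    # Assumed pattern: digits, optional whitespace, then a word starting with "min".
    match = re.search(r"(\d+)\s*min", time_string)
    if match is not None:
        # group(1) returns the text captured by the first set of parentheses.
        return int(match.group(1))
    return None


ufo["minutes"] = ufo["length_of_time"].apply(return_minutes)

# Text vectorization: fit TF-IDF on the descriptions and transform them
# into a sparse matrix with one row per sighting.
vec = TfidfVectorizer()
desc_tfidf = vec.fit_transform(ufo["desc"])

print(ufo[["month", "hour", "minutes"]])
print(desc_tfidf.shape)
```

Rows where no number of minutes is found, like "a few seconds" here, end up with a missing value, so you would probably want to drop or fill those before modeling.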
3. Let's practice!
Let's get to work!