
Engineering numerical features

1. Engineering numerical features

Even when a dataset is filled with numerical features, those features may still need a little bit of engineering before they're ready for modeling. In this section, we'll talk about aggregate statistics as well as dates, and how engineering numerical features can improve our model's performance.

2. Aggregate statistics

If we have a collection of features that all measure the same thing, like temperatures on different days, we may want to take an average or median to use as a feature for modeling instead. A common method of feature engineering is to take an aggregate of a set of numbers to use in place of those features. This can be helpful in reducing the dimensionality of our feature space, or perhaps we simply don't need multiple values that are very close to each other. Here we have a dataset of temperatures over the course of three days in four different cities. Rather than using all three days, let's take an average of the three. First, we can subset the columns we want to aggregate over using dot-loc. Then, we call mean with axis=1 to calculate the mean for each row, and save the results in the mean column.
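The row-wise average described above can be sketched as follows. The DataFrame name `temps`, the city names, the column names `day1` to `day3`, and the temperature values are all made up for illustration; only the use of `.loc`, `mean`, and `axis=1` comes from the lesson.

```python
import pandas as pd

# Hypothetical data: three days of temperatures for four cities.
temps = pd.DataFrame({
    "city": ["NYC", "SF", "LA", "Boston"],
    "day1": [68.3, 75.1, 80.3, 63.0],
    "day2": [67.9, 75.5, 84.0, 61.0],
    "day3": [67.8, 74.9, 81.3, 61.2],
})

# Subset the columns to aggregate over (all rows, only the day columns).
day_columns = ["day1", "day2", "day3"]
daily_temps = temps.loc[:, day_columns]

# axis=1 computes the mean across the columns, giving one value per row.
temps["mean"] = daily_temps.mean(axis=1)

print(temps)
```

Swapping `mean` for `median` would give the median alternative mentioned above, with the same `axis=1` argument.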

3. Dates

Dates and timestamps are another area where we might want to reduce granularity in our dataset. If we're doing time series analysis, we will likely need to keep this granularity to capture underlying trends on different timescales, but if we're running a prediction task, we may only need higher-level information like the month, the year, or both. Here's a collection of purchase dates. The full date is too granular for the prediction task we want to do, so let's extract the month from each date.

4. Dates

The first thing to do is to convert this date column into a pandas datetime column using the pd-dot-to_datetime function. This makes extracting the components much easier. Once it's converted, we can use the dt-dot-month attribute to extract the month. There are lots of other attributes for extracting different components, like day and year, and I encourage you to try these out yourself. We can see that there is now a column of month values ready for modeling.
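As a minimal sketch of the conversion and extraction steps: pandas exposes the conversion as `pd.to_datetime`, and the month is available through the `.dt.month` accessor. The DataFrame name `purchases`, the column names, and the dates themselves are assumptions made for this example.

```python
import pandas as pd

# Hypothetical purchase dates stored as plain strings.
purchases = pd.DataFrame({
    "date": ["2011-07-30", "2011-02-01", "2011-01-29", "2012-03-31", "2011-02-05"],
})

# Convert the string column into a pandas datetime column.
purchases["date_converted"] = pd.to_datetime(purchases["date"])

# Extract the month number (1-12) from each datetime value.
purchases["month"] = purchases["date_converted"].dt.month

print(purchases)
```

The same accessor offers `.dt.day` and `.dt.year` for the other components mentioned above.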

5. Let's practice!

Time to put engineering numerical features into practice!