Feature engineering
1. Feature engineering
Feature engineering is the process of using domain knowledge to create new variables which help you discover more insights. It is one of the crucial steps when building predictive models.2. Feature engineering
The variables available in the dataset are called as basic variables. Any variable created through the transformation of basic variables are called as derived variables. In other terms, the process of deriving new variables is called feature engineering.3. Creating new features
In this lesson, you will derive three new features, namely, age difference, job-hop index, and employee tenure.4. Age difference
Different generations of workforce have different views towards managing and delivering work and handling pressure. New generation managers need to learn to motivate and manage the pool of older workers. Employees reporting to younger managers generally think their managers are overconfident, while younger managers often complaint that older employees can't deal with the rapid change of pace in the work environment. So to determine whether age difference really matters, you will create this new variable in the following exercises.5. Job-hopping
A job hopper is a person who switches jobs frequently for financial or career advancement opportunities. Recruiters and hiring managers generally perceive job hoppers in a negative light as there is a good chance they will quit in the near future. You can derive the job-hop index by dividing an employee's total work experience by number of companies worked.6. Employee tenure
Tenure is the duration of employment or time spent in the organization. You can derive the tenure for Inactive employees by using the date of joining and last working date while for Active employees, tenure can be derived using the date of joining & cut off date. cutoff_date is the study period end date. In this dataset, the cut-off date is December 31, 2014.7. Deriving employee tenure
If the relevant date columns in your dataset are represented as strings, you can convert them to dates using the functions from the lubridate package. Here, we convert the date_of_joining, cutoff_date, and last_working_date columns to dates using the dmy() function since the dates in our dataset are in the form of DD-MM-YYYY.8. Calculating timespan
Now, to compute the tenure in number of years, you can use the interval() and time_length() functions, again from lubridate. The time_length() function requires two arguments, the first is a duration, period or interval and second argument is unit, a character string that specifies which time units to use, such as days, weeks or years. The interval() function creates an interval object with the specified start and end dates. If the start date occurs before the end date, the interval will be positive, else it will be negative. The second argument "years", tells the time_length() function that we want the difference in dates in terms of years. In the given example, the time span between the two dates is 14.6 years. You will calculate employee tenure using this approach in the following exercises.9. Let's practice!
It's time for you to do some feature engineering!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.