
HR data architecture

1. Talent segments

Segmentation is the process of dividing a population into subgroups that share similar characteristics.

2. Identifying the talent segments

Without segmentation, your target group is a diverse population, and it is very difficult to obtain relevant and meaningful insights from it. Talent segmentation helps you design customized HR interventions.

3. Identifying the talent segments

Generally, an organization categorizes its employees into top management, middle management, and entry-level. Top and middle management work profiles are strategic and tactical; they require different skill sets and hence face different labor market conditions from entry-level employees. Including them in the analysis may skew your insights, so it is recommended to exclude these profiles. In this course, we will focus on entry-level employees, i.e., Analyst and Specialist level roles, who form a majority of the workforce.

4. Filtering the dataset

To keep only employees at specific levels, you will use the filter() function from the dplyr package, which subsets a dataset based on conditions you specify. Here we use filter() to return all rows where the level is Analyst or Specialist.
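As a minimal sketch of that filtering step, assume a small made-up dataset called org with columns emp_id and level. The dataset name, column names, and values are illustrative assumptions, not the course data.

```r
library(dplyr)

# Hypothetical sample data; `org`, `emp_id`, and `level` are assumed names
org <- data.frame(
  emp_id = 1:5,
  level  = c("Analyst", "Specialist", "Manager", "Director", "Analyst"),
  stringsAsFactors = FALSE
)

# Keep only rows where level is Analyst or Specialist
org_entry <- org %>%
  filter(level %in% c("Analyst", "Specialist"))

print(org_entry)
```

Using %in% with a vector of levels keeps the condition short; writing level == "Analyst" | level == "Specialist" would select the same rows.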

5. HR data sources across employee life cycle (ELC)

Employee data resides in various HR data sources. For example, talent acquisition data can be pulled from Taleo or ADP, while engagement or exit survey data can be sourced from tools such as SurveyMonkey. It's time to bring in more information captured across the employee life cycle and build one dataset containing as much information as is available about each employee. By bringing relevant data together, you can derive more insights.

6. HR data architecture

Here is a sample representation of HR data architecture. In an ideal scenario, various data sources are pulled into a data warehouse, which is then connected to the visualization and analytics layer. However, not all organizations have this workflow set up, so you might need to source and collate the data manually to prepare a master data table.

7. Merge datasets using left_join()

You can use joins to combine various datasets. To show you how joins work, let's look at two sample datasets, df1 and df2. df1 contains employee id and level, while df2 contains employee id and location. Let's assume we are only interested in employees whose information is available in df1. You can use the left_join() function from dplyr to combine df1 with df2. The order in which you pass the datasets to left_join() matters. Since we only care about employees in df1, we pass df1 as the first argument and df2 as the second. "emp_id" is the column on which we want to join the two datasets, and it is passed as a string to the "by" argument. As you can see here, the result contains information only for employees 1 and 2 (who are present in df1).
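The join described above can be sketched as follows. The values in df1 and df2 are made up for illustration (the transcript does not list them); only the column names emp_id, level, and location come from the description.

```r
library(dplyr)

# Hypothetical contents: df1 holds employees 1 and 2, df2 holds employees 2 and 3
df1 <- data.frame(
  emp_id = c(1, 2),
  level  = c("Analyst", "Specialist"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  emp_id   = c(2, 3),
  location = c("New York", "Chicago"),
  stringsAsFactors = FALSE
)

# Keep every row of df1 and attach matching columns from df2 by emp_id
joined <- left_join(df1, df2, by = "emp_id")

print(joined)
```

With this made-up data, employee 3 is dropped because it appears only in df2, and employee 1 gets NA for location because df2 has no matching row.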

8. Let's practice!

Now it's time for you to filter the talent segments and combine the data from different HR data sources.