Get startedGet started for free

Mix and squeeze data

1. Mix and squeeze data

You have read and cleaned the data. It is now time to introduce some more advanced tools to complete your data analytics toolbox.

2. Data merging

It is common to have data scattered in several tables on databases, Excel files, remote file systems, etc. You know how to access all those data sources, but how to merge them together into the same table?

3. Value lookup

For example, the data of the HR department include table with the personal information of the employees and

4. Value lookup

a table with the salary of each employee. To pair each employee with the respective income, you can make use of the value lookup operation.

5. Value lookup

For both tables, you have an ID column, which is the unique ID of every employer.

6. Value lookup

The lookup operation does the following: for each employer in the first table,

7. Value lookup

it finds a matching ID in the second table

8. Value lookup

and appends the respective salary to the first.

9. Value lookup

It works just like a dictionary: you have a word, look for it in the dictionary, and retrieve its definition.

10. Value lookup

In this case, the word is the employer ID, and the definition is its income. The second table is indeed called dictionary table, and the columns that are matched are called

11. Value lookup

lookup and key column, respectively.

12. Value lookup

Note that sometimes, the lookup does not find a match in the dictionary, and in that case, a missing value is added to the data table.

13. Data aggregation

Once all the data that you need is in the same table, you want to produce a summary and get some insights. This process is called data aggregation.

14. Data aggregation

Imagine it like squeezing your data to get a more concentrated version out of it.

15. Data aggregation

For example, you can aggregate the table vertically and sum up the salary of all the employees.

16. Data aggregation

The salary is the aggregation column

17. Data aggregation

and "sum" is the aggregation operation.

18. Data aggregation

You can also be more granular, and sum up the salary of all the employers of the same department. In this case, the department is called category column.

19. Data aggregation

Change the category column, aggregation operation, and aggregation column to calculate other statistics, such as the number of employees for each department

20. Data aggregation

or the average salary per department.

21. Documentation

One last important tool that cannot miss in the toolbox of every data analyst is a good documentation. It is better to write step by step during development what is going on in your workflow. Like leaving a sign to find the way when you come back the project later in the future. A properly documented workflow is also easier to maintain, edit, and share with other people.

22. Documentation

Visual programming already offers advantages in terms of readability and self documentation, and you can also add comments under each node to describe its operations. However, in a workflow with too many nodes, it might be useful to group some of them to hide the complexity and keep the workflow readable.

23. Metanodes

In KNIME Analytics Platform, you can group nodes into Metanodes, that work like folders for nodes to keep them tidy and organized.

24. Let's practice!

Are you ready to mix and squeeze the data for HR? Let's go and practice!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.