Mix and squeeze data
1. Mix and squeeze data
You have read and cleaned the data. It is now time to introduce some more advanced tools to complete your data analytics toolbox.2. Data merging
It is common to have data scattered in several tables on databases, Excel files, remote file systems, etc. You know how to access all those data sources, but how to merge them together into the same table?3. Value lookup
For example, the data of the HR department include table with the personal information of the employees and4. Value lookup
a table with the salary of each employee. To pair each employee with the respective income, you can make use of the value lookup operation.5. Value lookup
For both tables, you have an ID column, which is the unique ID of every employer.6. Value lookup
The lookup operation does the following: for each employer in the first table,7. Value lookup
it finds a matching ID in the second table8. Value lookup
and appends the respective salary to the first.9. Value lookup
It works just like a dictionary: you have a word, look for it in the dictionary, and retrieve its definition.10. Value lookup
In this case, the word is the employer ID, and the definition is its income. The second table is indeed called dictionary table, and the columns that are matched are called11. Value lookup
lookup and key column, respectively.12. Value lookup
Note that sometimes, the lookup does not find a match in the dictionary, and in that case, a missing value is added to the data table.13. Data aggregation
Once all the data that you need is in the same table, you want to produce a summary and get some insights. This process is called data aggregation.14. Data aggregation
Imagine it like squeezing your data to get a more concentrated version out of it.15. Data aggregation
For example, you can aggregate the table vertically and sum up the salary of all the employees.16. Data aggregation
The salary is the aggregation column17. Data aggregation
and "sum" is the aggregation operation.18. Data aggregation
You can also be more granular, and sum up the salary of all the employers of the same department. In this case, the department is called category column.19. Data aggregation
Change the category column, aggregation operation, and aggregation column to calculate other statistics, such as the number of employees for each department20. Data aggregation
or the average salary per department.21. Documentation
One last important tool that cannot miss in the toolbox of every data analyst is a good documentation. It is better to write step by step during development what is going on in your workflow. Like leaving a sign to find the way when you come back the project later in the future. A properly documented workflow is also easier to maintain, edit, and share with other people.22. Documentation
Visual programming already offers advantages in terms of readability and self documentation, and you can also add comments under each node to describe its operations. However, in a workflow with too many nodes, it might be useful to group some of them to hide the complexity and keep the workflow readable.23. Metanodes
In KNIME Analytics Platform, you can group nodes into Metanodes, that work like folders for nodes to keep them tidy and organized.24. Let's practice!
Are you ready to mix and squeeze the data for HR? Let's go and practice!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.