Crunching data in KNIME Analytics Platform

1. Crunching data in KNIME Analytics Platform

Let’s now finally get some statistics and insight from the data in KNIME Analytics Platform. Before we dig into the calculations, note that the information for our analysis is scattered across different tables. We have information about the employees in this table and their respective income in the database table. Let’s bring it all together.

The most straightforward way of achieving this is with the Value Lookup node. This node has two inputs: one for the data table and one for the dictionary table from which you want to take the values. The node takes one column of the data table and, for every row, looks for a match in the dictionary table. Once a match is found, it appends the matching values to the data table. Let’s configure it to retrieve the income of every employee and append it to the employee table. In the two tables, the column that matches is the ID column, so select ID as both the lookup column and the key column. The value that you want to include is MonthlyIncome, so retain only this column from the dictionary table. Execute the node and you will see that every employee now has the respective monthly income. If a value is not present in the dictionary table, you will get a missing value.

Now that you have all the data in one table, it’s time to calculate some basic statistics. For example: what is the mean income in the company? And what is the mean income by department? All of these questions can be answered with the Row Aggregator node. This node lets you aggregate rows by selecting a category column, an aggregation method and one or more aggregation columns. As a first simple example, calculate the mean income across all employees. You want the global statistic, so select None as the category column. The aggregation that you want to perform is “average”, and you will perform it on the MonthlyIncome column. The result is the average income of all the employees. What about the average income by department?
For the average income by department, select the Department column as the category column. The node will then output the average income for each unique value of the Department column. If you select the Occurrence count aggregation instead, you do not select any aggregation columns, since it simply gives you the number of times that each value of the category column, in this case the department, appears in the table.

Now that you have created a more complex workflow, let’s add some comments and annotations to make sure that it is properly documented. Under every node, you can add a comment to describe the operation that the node is performing, the data that it is reading, and so on. Double-click “Add comment” and start typing. In addition, you can create annotations to add more documentation to the workflow. Change the formatting, add lists and links, and color the annotation via the editor that appears. Finally, you can wrap nodes inside a metanode to keep the whole workflow tidier. That’s it, go on and try it yourself!
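
If it helps to see the same logic outside the KNIME interface, here is a rough plain-Python sketch of what the lookup and aggregation steps compute. It is only an illustration, not how the nodes are implemented. The column names ID, Department and MonthlyIncome come from the walkthrough; the sample rows and the function names `value_lookup`, `aggregate_mean` and `occurrence_count` are made up for this example. Skipping missing values when averaging is a choice made in this sketch, not a statement about the Row Aggregator node.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: the values are invented for illustration.
employees = [
    {"ID": 1, "Department": "Sales"},
    {"ID": 2, "Department": "Sales"},
    {"ID": 3, "Department": "Research"},
    {"ID": 4, "Department": "Research"},
]
incomes = [
    {"ID": 1, "MonthlyIncome": 3000},
    {"ID": 2, "MonthlyIncome": 5000},
    {"ID": 3, "MonthlyIncome": 4600},
    # ID 4 is deliberately absent from the dictionary table.
]


def value_lookup(data, dictionary, key, value_column):
    """Append value_column to every data row; None stands in for a missing value."""
    lookup = {row[key]: row[value_column] for row in dictionary}
    return [{**row, value_column: lookup.get(row[key])} for row in data]


def aggregate_mean(rows, value_column, category=None):
    """Mean of value_column, overall (category=None) or per category value."""
    groups = defaultdict(list)
    for row in rows:
        if row[value_column] is None:
            continue  # assumption of this sketch: ignore missing values
        group_name = row[category] if category else "all"
        groups[group_name].append(row[value_column])
    return {name: mean(values) for name, values in groups.items()}


def occurrence_count(rows, category):
    """Number of rows for each value of the category column."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[category]] += 1
    return dict(counts)


joined = value_lookup(employees, incomes, "ID", "MonthlyIncome")
overall = aggregate_mean(joined, "MonthlyIncome")
by_department = aggregate_mean(joined, "MonthlyIncome", "Department")
counts = occurrence_count(joined, "Department")

print(joined[3])       # employee 4 has no match, so MonthlyIncome is None
print(overall)         # {'all': 4200}
print(by_department)   # {'Sales': 4000, 'Research': 4600}
print(counts)          # {'Sales': 2, 'Research': 2}
```

Passing no category plays the role of selecting None as the category column, while passing "Department" gives one result per department. The occurrence count takes no value column at all, which mirrors why no aggregation column is selected for it.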

2. Let's practice!
