Hands-on data merging in KNIME

1. Hands-on data merging in KNIME

In this screencast, we'll apply three data merging techniques: concatenation, lookups, and joins. Let's start with concatenation. Our sales regions have sent us three different datasets - one with data from the north region, one from the south, and one from the central region. Our CFO surely does not want to open individual files. To combine these datasets, we use the Concatenate Node. Add the Concatenate Node to the workflow, add an additional input, and connect all three datasets to it. When you execute the node, KNIME will stack the datasets, keeping all rows from the tables in the same structure. Since the three datasets share the same columns, this is straightforward. However, if the datasets had differing columns, KNIME would automatically insert missing values for any columns present in one table but not in the other. Concatenation is ideal for our use case, where we want to extend our dataset by adding new rows, such as appending data from multiple regions. Next, let's investigate value lookups. Our CFO wants speaking reports; however, the sales data only has product IDs. We want to enrich it by adding product names. This is where the Value Lookup Node comes in. Our sales data contains the "Product" column holding the ID. We also have a separate table with the product names tied to the "Product ID."" To link these two datasets, add the Value Lookup Node to the workflow. Connect the sales data to one input and the product information to the other. In the configuration window, specify "product_id" as the key column to search for in the data table's lookup column "product". Ensure only 'product_name' is listed under "Includes". Once executed, KNIME will bring the product names into the sales data where there is a matching product ID. This is particularly useful since our transactional table requires additional reference data from the dictionary table, such as the name. Finally, we'll perform a more complex merge using the Joiner Node. We want to promote certain products in certain regions, so our CFO has added additional commissions to sales. We now want to merge our transaction data with the commission bonus data. These datasets have different structures but share common keys, like region and product name. Add the Joiner Node to your workflow and connect the two datasets. In the configuration, select first "sales_region" and match it with "region". The next criterion is "product" to be matched with "product_name". For this demonstration, we'll keep the Right Unmatched Rows to preserve all transaction data, even if we don't grant extra commissions. From the top, we only want to include the "commission_bonus" column for the output. Execute the node, and KNIME will produce a merged dataset with all the sales data enriched by bonus commissions wherever we have a match. For all others, we have missing values. And that is how you merge data in KNIME efficiently. We've covered how to concatenate datasets, enrich data with lookups, and use the Joiner Node to merge with multiple criteria. These are essential techniques for data manipulation. They ensure you can work with different datasets and maximize productivity.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.