Set operations
1. Set operations
In this lesson, you will learn how to use set operations to compare and combine the rows of two data tables that have the same columns.
2. Set operation functions
This group of three functions lets you identify rows that appear in both data tables, identify rows that are unique to one of them, and concatenate two data tables while keeping only the unique rows.
3. Set operations: `fintersect()`
The `fintersect()` function takes two data tables that have the same columns as its inputs and returns a new data table containing the rows that can be found in both data tables. In the example, you can see that running `fintersect()` on these two data tables returns just the highlighted rows.
4. `fintersect()` and duplicate rows
By default, only one copy of each row is returned, even if there are multiple copies in each data table.
5. `fintersect()` and duplicate rows
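A minimal sketch of how `fintersect()` treats duplicates, covering both the default and the `all` argument discussed in this part of the lesson. The tables below are hypothetical stand-ins, since the lesson's own `dt1` and `dt2` appear only on the slides.

```r
library(data.table)

# Hypothetical tables (not the slide data): dt1 has three copies of the
# lion row, dt2 has two.
dt1 <- data.table(
  animal = c("lion", "lion", "lion", "antelope", "antelope", "zebra"),
  colour = c("yellow", "yellow", "yellow", "brown", "brown", "striped")
)
dt2 <- data.table(
  animal = c("lion", "lion", "zebra", "giraffe"),
  colour = c("yellow", "yellow", "striped", "spotted")
)

# Default (all = FALSE): one copy of each row found in both tables.
common <- fintersect(dt1, dt2)

# all = TRUE: keep as many copies as can be paired between the two tables.
common_all <- fintersect(dt1, dt2, all = TRUE)

print(common)
print(common_all)
```

With these tables, the default call keeps a single lion row and a single zebra row, while `all = TRUE` keeps two lion rows because only two of the three copies in `dt1` can be paired with a copy in `dt2`.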
You can set `all = TRUE` to keep all pairs of matching duplicates. In the example, two copies of the yellow lion row can be found in both dt1 and dt2, so there are two copies in the result. The extra copy in dt1 is ignored because it doesn't have another copy it can match to in dt2.
6. Set operations: `fsetdiff()`
The `fsetdiff()` function takes two data tables that have the same columns as its inputs and returns a new data table containing the rows that are found only in the data table supplied as its first argument. In the example here, you can see that `fsetdiff()` returns the rows from dt1 highlighted in blue. These are the rows that are unique to dt1.
7. `fsetdiff()` and duplicates
When there are duplicate rows, only one copy of each row is returned in the result by default.
8. `fsetdiff()` and duplicates
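A minimal sketch of `fsetdiff()` with duplicated rows, comparing the default with `all = TRUE`. The tables are hypothetical and only mimic the duplicate counts described for the slide example (three lion rows in `dt1`, two in `dt2`).

```r
library(data.table)

# Hypothetical tables (not the slide data).
dt1 <- data.table(
  animal = c("lion", "lion", "lion", "antelope", "antelope", "zebra"),
  colour = c("yellow", "yellow", "yellow", "brown", "brown", "striped")
)
dt2 <- data.table(
  animal = c("lion", "lion", "zebra", "giraffe"),
  colour = c("yellow", "yellow", "striped", "spotted")
)

# Default (all = FALSE): one copy of each row that appears in dt1 but not dt2.
only_dt1 <- fsetdiff(dt1, dt2)

# all = TRUE: copies in dt1 that have no matching copy in dt2 are kept too.
only_dt1_all <- fsetdiff(dt1, dt2, all = TRUE)

print(only_dt1)
print(only_dt1_all)
```

Note that argument order matters: `fsetdiff(dt2, dt1)` would instead return the rows found only in `dt2`.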
When you set `all = TRUE`, any extra copies in the first data table that have no match in the second are also returned. Here, not only do we keep both copies of the "antelope" row highlighted in purple, but one of the "lion" rows highlighted in yellow is also included in the result. This is because there are three copies of this row in dt1 and only two in dt2, so the extra copy in dt1 is returned because it has no matching copy in dt2.
9. Set operations: `funion()`
The `funion()` function takes two data tables that have the same columns as its inputs and returns a new data table containing the set of all unique rows found in either data table.
10. `funion()` and duplicates
By default, duplicate rows are ignored. The result will only contain one copy of each unique row.
11. `funion()` and duplicates
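A minimal sketch of `funion()` with and without `all = TRUE`, using hypothetical tables rather than the slide data. It also compares the row count of the `all = TRUE` result with that of `rbind()`.

```r
library(data.table)

# Hypothetical tables (not the slide data).
dt1 <- data.table(
  animal = c("lion", "lion", "lion", "antelope", "antelope", "zebra"),
  colour = c("yellow", "yellow", "yellow", "brown", "brown", "striped")
)
dt2 <- data.table(
  animal = c("lion", "lion", "zebra", "giraffe"),
  colour = c("yellow", "yellow", "striped", "spotted")
)

# Default (all = FALSE): one copy of every distinct row from either table.
all_unique <- funion(dt1, dt2)

# all = TRUE: every row from both tables, duplicates included.
stacked <- funion(dt1, dt2, all = TRUE)

# Plain concatenation for comparison.
bound <- rbind(dt1, dt2)

print(all_unique)
print(nrow(stacked) == nrow(bound))
```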
Setting `all = TRUE` will keep all copies of each row. The result is equivalent to using the `rbind()` function to concatenate the two data tables.
12. Removing duplicates when combining many `data.tables`
The `funion()` function is a useful way to concatenate two data tables while removing duplicate rows. When working with more than two data tables, you can concatenate them all using the `rbind()` or `rbindlist()` functions, and then use the `duplicated()` and `unique()` functions to identify or remove the duplicate rows.
13. Let's practice!
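As a warm-up, here is a minimal sketch of the many-table approach: concatenate with `rbindlist()`, then flag duplicates with `duplicated()` or drop them with `unique()`. The three tables and their column names are hypothetical.

```r
library(data.table)

# Three hypothetical tables with the same columns and some overlapping rows.
dt_a <- data.table(id = c(1, 2), value = c("a", "b"))
dt_b <- data.table(id = c(2, 3), value = c("b", "c"))
dt_c <- data.table(id = c(3, 4), value = c("c", "d"))

# Concatenate all tables; duplicate rows are kept at this point.
combined <- rbindlist(list(dt_a, dt_b, dt_c))

# Logical vector marking rows that repeat an earlier row.
is_dup <- duplicated(combined)

# Keep only the first occurrence of each distinct row.
deduped <- unique(combined)

print(combined[is_dup])
print(deduped)
```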
Now, it's your turn to practice set operations.