Semi joins
Semi joins are the opposite of anti joins: an anti-anti join, if you like.
A semi join returns the rows of the first table where it can find a match in the second table. The principle is shown in this diagram.
The syntax is the same as for other join types; simply swap the other join function for semi_join():
semi_join(a_tibble, another_tibble, by = c("id_col1", "id_col2"))
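As a minimal sketch of the call above, the following uses two small local tibbles rather than Spark tables. The names tracks and terms and their contents are made up for illustration; only semi_join() and tibble() come from dplyr.

```r
library(dplyr)

# Hypothetical example data (not from the course dataset)
tracks <- tibble(
  artist_id = c("A1", "A2", "A3"),
  title     = c("Song one", "Song two", "Song three")
)
terms <- tibble(
  artist_id = c("A1", "A1", "A3"),
  term      = c("rock", "indie", "jazz")
)

# Keep the rows of tracks whose artist_id appears in terms
matched <- semi_join(tracks, terms, by = "artist_id")
matched
```

Here the row for artist A2 is dropped because it has no match in terms, and the result keeps only the columns of tracks.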
You may have spotted that the results of a semi join plus the results of an anti join give the original table. So, regardless of the table contents or how you join them, semi_join(A, B) plus anti_join(A, B) will return A (though maybe with the rows in a different order).
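The relationship between semi joins and anti joins can be checked on a small example. This is only a sketch on local tibbles: the tables A and B below are invented, and the two pieces are sorted after stacking because the row order is not guaranteed.

```r
library(dplyr)

# Hypothetical example data (not from the course dataset)
A <- tibble(id = c(1, 2, 3, 4), value = c("a", "b", "c", "d"))
B <- tibble(id = c(2, 4, 5))

kept    <- semi_join(A, B, by = "id")  # rows of A with a match in B
dropped <- anti_join(A, B, by = "id")  # rows of A without a match in B

# Stack the two pieces, sort by id, and compare with A sorted the same way
recombined <- arrange(bind_rows(kept, dropped), id)
same_as_A <- isTRUE(all.equal(
  as.data.frame(recombined),
  as.data.frame(arrange(A, id))
))
same_as_A
```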
This is a part of the course “Introduction to Spark with sparklyr in R”.
Exercise instructions
A Spark connection has been created for you as spark_conn. Tibbles attached to the track metadata and artist terms stored in Spark have been pre-defined as track_metadata_tbl and artist_terms_tbl respectively.
- Use a semi join to join the artist terms to the track metadata by the artist_id column. Assign the result to joined.
- Use sdf_dim() to determine how many rows and columns there are in the joined table.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# track_metadata_tbl and artist_terms_tbl have been pre-defined
track_metadata_tbl
artist_terms_tbl
# Semi join artist terms to track metadata by artist_id
joined <- ___
# How many rows and columns are in the joined table?
___
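When reading the dimensions of a semi join result, it may help to recall that a semi join only filters the first table: it adds no columns from the second table and does not repeat a row that has several matches. The sketch below illustrates this on local tibbles, with base R's dim() standing in for sdf_dim() on Spark tables. The tracks and terms data are invented for illustration, and inner_join() is included only as a point of comparison.

```r
library(dplyr)

# Hypothetical example data (not from the course dataset)
tracks <- tibble(
  artist_id = c("A1", "A2"),
  title     = c("Song one", "Song two")
)
terms <- tibble(
  artist_id = c("A1", "A1", "A1"),
  term      = c("rock", "indie", "pop")
)

semi_result  <- semi_join(tracks, terms, by = "artist_id")
inner_result <- inner_join(tracks, terms, by = "artist_id")

dim(semi_result)   # one row per matching track, columns of tracks only
dim(inner_result)  # one row per track/term pair, columns of both tables
```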
Learn how to run big data analysis using Spark and the sparklyr package in R, and explore Spark MLlib in just 4 hours.
In which you learn more about using the dplyr interface to Spark, including advanced field selection, calculating groupwise statistics, and joining data frames.