Get Started

Semi joins

Semi joins are the opposite of anti joins: an anti-anti join, if you like.

A semi join returns the rows of the first table where it can find a match in the second table. The principle is shown in this diagram.

A semi join, explained using table of colors.

The syntax is the same as for other join types; simply swap the other join function for semi_join()

semi_join(a_tibble, another_tibble, by = c("id_col1", "id_col2"))

You may have spotted that the results of a semi join plus the results of an anti join give the orignial table. So, regardless of the table contents or how you join them, semi_join(A, B) plus anti_join(A, B) will return A (though maybe with the rows in a different order).

This is a part of the course

“Introduction to Spark with sparklyr in R”

View Course

Exercise instructions

A Spark connection has been created for you as spark_conn. Tibbles attached to the track metadata and artist terms stored in Spark have been pre-defined as track_metadata_tbl and artist_terms_tbl respectively.

  • Use a semi join to join the artist terms to the track metadata by the artist_id column. Assign the result to joined.
  • Use sdf_dim() to determine how many rows and columns there are in the joined table.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# track_metadata_tbl and artist_terms_tbl have been pre-defined
track_metadata_tbl
artist_terms_tbl

# Semi join artist terms to track metadata by artist_id
joined <- ___

# How many rows and columns are in the joined table?
___
Edit and Run Code