Exercise

Semi joins

Semi joins are the opposite of anti joins: an anti-anti join, if you like.

A semi join returns the rows of the first table where it can find a match in the second table. The principle is shown in this diagram.

A semi join, explained using table of colors.

The syntax is the same as for other join types; simply swap the other join function for semi_join()

semi_join(a_tibble, another_tibble, by = c("id_col1", "id_col2"))

You may have spotted that the results of a semi join plus the results of an anti join give the orignial table. So, regardless of the table contents or how you join them, semi_join(A, B) plus anti_join(A, B) will return A (though maybe with the rows in a different order).

Instructions

100 XP

A Spark connection has been created for you as spark_conn. Tibbles attached to the track metadata and artist terms stored in Spark have been pre-defined as track_metadata_tbl and artist_terms_tbl respectively.

  • Use a semi join to join the artist terms to the track metadata by the artist_id column. Assign the result to joined.
  • Use dim() to determine how many rows and columns there are in the joined table.