
Semi joins

Semi joins are the opposite of anti joins: an anti-anti join, if you like.

A semi join returns the rows of the first table where it can find a match in the second table. The principle is shown in this diagram.

[Diagram: a semi join, explained using a table of colors.]

The syntax is the same as for the other join types: simply replace the other join function with semi_join().

semi_join(a_tibble, another_tibble, by = c("id_col1", "id_col2"))
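To make the behavior concrete, here is a minimal sketch on two small local tibbles rather than Spark tables. The names tracks and terms and their contents are made up for illustration; only semi_join() and tibble() come from dplyr.

```r
library(dplyr)

# Hypothetical example data: a few tracks, and terms for some artists
tracks <- tibble(
  track_id  = c("t1", "t2", "t3", "t4"),
  artist_id = c("a1", "a2", "a2", "a3")
)
terms <- tibble(
  artist_id = c("a2", "a2", "a4"),
  term      = c("rock", "indie", "jazz")
)

# Keep the rows of tracks whose artist_id has a match in terms
matched <- semi_join(tracks, terms, by = "artist_id")
matched
```

Only the two tracks by artist a2 are kept. Unlike a left join, no columns from terms are added, and a track is not repeated even though a2 matches two rows in terms.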

You may have spotted that the results of a semi join plus the results of an anti join give the original table. So, regardless of the table contents or how you join them, the rows of semi_join(A, B) combined with the rows of anti_join(A, B) will give back A (though maybe with the rows in a different order).
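This relationship can be checked on a small example. The sketch below uses made-up local tibbles a and b (not the course data) and sorts the recombined rows so that row order does not affect the comparison.

```r
library(dplyr)

# Hypothetical example tables
a <- tibble(id = 1:5, value = c("v", "w", "x", "y", "z"))
b <- tibble(id = c(2L, 4L, 6L))

kept    <- semi_join(a, b, by = "id")  # rows of a with a match in b
dropped <- anti_join(a, b, by = "id")  # rows of a without a match in b

# Every row of a lands in exactly one of the two results
nrow(kept) + nrow(dropped) == nrow(a)

# Stack the two pieces and restore the original order before comparing
recombined <- bind_rows(kept, dropped) %>% arrange(id)
all.equal(as.data.frame(recombined), as.data.frame(a))
```

Both checks should return TRUE for this data.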

This is a part of the course

“Introduction to Spark with sparklyr in R”


Exercise instructions

A Spark connection has been created for you as spark_conn. Tibbles attached to the track metadata and artist terms stored in Spark have been pre-defined as track_metadata_tbl and artist_terms_tbl respectively.

  • Use a semi join to join the artist terms to the track metadata by the artist_id column. Assign the result to joined.
  • Use sdf_dim() to determine how many rows and columns there are in the joined table.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# track_metadata_tbl and artist_terms_tbl have been pre-defined
track_metadata_tbl
artist_terms_tbl

# Semi join artist terms to track metadata by artist_id
joined <- ___

# How many rows and columns are in the joined table?
___
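One possible way to fill in the blanks is sketched below. It runs on small local tibbles that stand in for the Spark tibbles, so the data is hypothetical and dim() is used in place of sdf_dim(). It also assumes that "join the artist terms to the track metadata" means the track metadata is the first (left) table.

```r
library(dplyr)

# Hypothetical local stand-ins for the Spark tibbles in the exercise
track_metadata_tbl <- tibble(
  track_id  = c("t1", "t2", "t3"),
  artist_id = c("a1", "a2", "a3"),
  title     = c("Song one", "Song two", "Song three")
)
artist_terms_tbl <- tibble(
  artist_id = c("a1", "a1", "a3"),
  term      = c("pop", "dance", "folk")
)

# Keep the track metadata rows whose artist has at least one term
joined <- semi_join(track_metadata_tbl, artist_terms_tbl, by = "artist_id")

# dim() works here because the data is local; with Spark tibbles,
# sparklyr's sdf_dim(joined) would report the rows and columns instead
dim(joined)
```

With the pre-defined Spark tibbles, the same semi_join() call should apply unchanged, followed by sdf_dim(joined).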

This exercise is part of the course

Introduction to Spark with sparklyr in R


Learn how to run big data analysis using Spark and the sparklyr package in R, and explore Spark MLlib in just 4 hours.

Learn more about using the dplyr interface to Spark, including advanced field selection, calculating groupwise statistics, and joining data frames.

  • Exercise 1: Leveling up
  • Exercise 2: Mother's little helper (1)
  • Exercise 3: Mother's little helper (2)
  • Exercise 4: Selecting unique rows
  • Exercise 5: Common people
  • Exercise 6: Collecting data back from Spark
  • Exercise 7: Storing intermediate results
  • Exercise 8: Groups: great for music, great for data
  • Exercise 9: Groups of mutants
  • Exercise 10: Advanced Selection II: The SQL
  • Exercise 11: Left joins
  • Exercise 12: Anti joins
  • Exercise 13: Semi joins
