Selecting unique rows

If you have a categorical variable stored in a factor, it is often useful to know what the individual categories are; you can find them with the levels() function. For a tibble, the more general concept is finding rows with unique data. Following SQL terminology, this is done with the distinct() function. You can call it directly on your dataset, passing column names to get the unique combinations of values in those columns. For example, to find the unique combinations of values in the x, y, and z columns, you would write the following.

a_tibble %>%
  distinct(x, y, z)
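
To make the behavior concrete, here is a minimal sketch that runs on a small local tibble instead of a Spark table. The data and the extra column w are made up for illustration, and a_tibble is just a stand-in name; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical local data standing in for a_tibble; all values are invented.
a_tibble <- tibble(
  x = c(1, 1, 2, 2),
  y = c("a", "a", "b", "b"),
  z = c(TRUE, TRUE, TRUE, FALSE),
  w = c(10, 20, 30, 40)
)

# For a factor, levels() lists the individual categories.
y_levels <- levels(factor(a_tibble$y))

# For a tibble, distinct() keeps one row per unique combination
# of the named columns; columns that are not named (here w) are dropped.
unique_xyz <- a_tibble %>%
  distinct(x, y, z)

unique_xyz
```

In this sketch the first two rows share the same x, y, and z values, so they collapse into a single row of the result. On a Spark-backed tibble the same call should be translated to SQL rather than executed in R, which is why the course describes distinct() in SQL terms.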

This exercise is part of the course

Introduction to Spark with sparklyr in R

Exercise instructions

A Spark connection has been created for you as spark_conn. A tibble attached to the track metadata stored in Spark has been pre-defined as track_metadata_tbl.

  • Find the distinct values of the artist_name column from track_metadata_tbl.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# track_metadata_tbl has been pre-defined
track_metadata_tbl

track_metadata_tbl %>%
  # Only return rows with distinct artist_name
  ___