Exploring Spark data types
You've already seen (back in Chapter 1) src_tbls() for listing the DataFrames on Spark that sparklyr can see. You've also seen glimpse() for exploring the columns of a tibble on the R side.
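As a quick refresher, both are one-liners. A sketch, assuming the spark_conn connection and track_metadata_tbl tibble used throughout this course:

```
library(sparklyr)
library(dplyr)

# List the names of the DataFrames that Spark knows about
src_tbls(spark_conn)

# Peek at the column names, types, and first few values from the R side
glimpse(track_metadata_tbl)
```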
sparklyr has a function named sdf_schema() for exploring the columns of a DataFrame on the Spark side. It's easy to call; the return value is what takes a little work to deal with.
sdf_schema(a_tibble)
The return value is a list with one element per column, and each element is itself a list of two elements containing that column's name and data type. The exercise shows a data transformation that makes the data types easier to view.
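For example, here is roughly what the raw return value looks like. This is a sketch: the local connection and the mtcars data are illustrative, not part of the exercise.

```
library(sparklyr)

sc <- spark_connect(master = "local")
mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

str(sdf_schema(mtcars_tbl))
# List of 11
#  $ mpg:List of 2
#   ..$ name: chr "mpg"
#   ..$ type: chr "DoubleType"
#  ... (one element like this per column)
```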
Here is a comparison of how R data types map to Spark data types. Other data types are not currently supported by sparklyr.
| R type | Spark type |
|---|---|
| logical | BooleanType |
| numeric | DoubleType |
| integer | IntegerType |
| character | StringType |
| list | ArrayType |
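To see this mapping in action, you can copy a small data frame with one column of each R type to Spark and inspect its schema. A sketch; the data frame and its column names are made up for illustration:

```
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

types_df <- data.frame(
  flag  = c(TRUE, FALSE),  # logical   -> BooleanType
  value = c(1.5, 2.5),     # numeric   -> DoubleType
  count = c(1L, 2L),       # integer   -> IntegerType
  label = c("a", "b"),     # character -> StringType
  stringsAsFactors = FALSE
)

types_tbl <- copy_to(sc, types_df, overwrite = TRUE)
sdf_schema(types_tbl)
```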
This exercise is part of the course
Introduction to Spark with sparklyr in R
Exercise instructions
A Spark connection has been created for you as spark_conn. A tibble attached to the track metadata stored in Spark has been pre-defined as track_metadata_tbl.
- Call sdf_schema() to get the schema of the track metadata.
- Run the transformation code on schema to see it in a more readable tibble format.
Interactive exercise
Complete the sample code to finish this exercise.
# track_metadata_tbl has been pre-defined
track_metadata_tbl
# Get the schema
(schema <- sdf_schema(track_metadata_tbl))
# Transform the schema
schema %>%
lapply(function(x) do.call(data_frame, x)) %>%
bind_rows()
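The pipeline converts each per-column list into a one-row data frame and stacks the rows, yielding a tidy two-column tibble of name and type. Note that data_frame() is deprecated in current dplyr releases; a sketch of one equivalent formulation using tibble() and purrr:

```
# Equivalent transformation with non-deprecated functions
schema %>%
  purrr::map_dfr(~ tibble::tibble(name = .x$name, type = .x$type))
```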