Popcorn double feature
The dplyr methods that you saw in the previous two chapters use Spark's SQL interface. That is, they convert your R code into SQL code before passing it to Spark. This is an excellent solution for basic data manipulation, but it runs into problems when you want to do more complicated processing. For example, you can calculate the mean of a column, but not the median. Here is the example from the 'Summarizing columns' exercise that you completed in Chapter 1.
track_metadata_tbl %>%
  summarize(mean_duration = mean(duration)) # OK
track_metadata_tbl %>%
  summarize(median_duration = median(duration)) # not supported through the SQL interface
sparklyr also has two "native" interfaces that will be discussed in the next two chapters. Native means that they call Java or Scala code to access Spark libraries directly, without any conversion to SQL. sparklyr supports the Spark DataFrame Application Programming Interface (API), with functions that have an sdf_ prefix. It also supports access to Spark's machine learning library, MLlib, with "feature transformation" functions that begin ft_, and "machine learning" functions that begin ml_.
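As a rough illustration of the native DataFrame interface, the sketch below uses sdf_quantile() to estimate a median without going through SQL. It assumes a local Spark installation is available, and the tracks data frame and its values are made up for the example rather than taken from the course data. The result is approximate, controlled by the function's relative.error argument.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install())
sc <- spark_connect(master = "local")

# Hypothetical stand-in for track_metadata_tbl
tracks <- data.frame(duration = c(120.5, 200.0, 185.3, 240.8, 310.1))
tracks_tbl <- copy_to(sc, tracks, "tracks", overwrite = TRUE)

# sdf_quantile() calls Spark's DataFrame API directly;
# a probability of 0.5 asks for the (approximate) median
median_duration <- sdf_quantile(tracks_tbl, "duration", probabilities = 0.5)
print(median_duration)

spark_disconnect(sc)
```

The value comes back as an ordinary named numeric vector in R, so it can be used like any other local result.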
One important philosophical difference between working with R and working with Spark is that Spark is much stricter about variable types than R. Most of the native functions want DoubleType inputs and return DoubleType outputs. DoubleType is Spark's equivalent of R's numeric vector type. sparklyr will handle converting numeric to DoubleType, but it is up to the user (that's you!) to convert logical or integer data into numeric data and back again.
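The conversions themselves can be sketched in plain R. The vectors below are invented for illustration and stand in for logical and integer columns. On a Spark table, the same as.numeric() call would typically sit inside a mutate() step, which is an assumption here rather than something shown above.

```r
# Hypothetical values standing in for a logical and an integer column
is_live <- c(TRUE, FALSE, TRUE)
year <- c(1999L, 2005L, 2010L)

# Convert to numeric (double) before handing the data to a native function
is_live_num <- as.numeric(is_live)
year_num <- as.numeric(year)

# Convert back to the original types afterwards
is_live_back <- as.logical(is_live_num)
year_back <- as.integer(year_num)

# Check the intermediate type and that the round trip preserved the data
print(class(year_num))
print(identical(is_live_back, is_live))
print(identical(year_back, year))
```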
Which of these statements is true?
- sparklyr's dplyr methods convert code into Scala code before running it on Spark.
- Converting R code into SQL code limits the number of supported computations.
- Most Spark MLlib modeling functions require DoubleType inputs and return DoubleType outputs.
- Most Spark MLlib modeling functions require IntegerType inputs and return BooleanType outputs.
This exercise is part of the course
Introduction to Spark with sparklyr in R
