Popcorn double feature
The dplyr methods that you saw in the previous two chapters use Spark's SQL interface. That is, they convert your R code into SQL code before passing it to Spark. This is an excellent solution for basic data manipulation, but it runs into problems when you want to do more complicated processing. For example, you can calculate the mean of a column, but not the median. Here is the example from the 'Summarizing columns' exercise that you completed in Chapter 1.
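track_metadata_tbl %>%
  summarize(mean_duration = mean(duration)) # OK: the mean can be computed through the SQL interface

track_metadata_tbl %>%
  summarize(median_duration = median(duration)) # Not OK: the median cannot, as described above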
sparklyr also has two "native" interfaces that will be discussed in the next two chapters. Native means that they call Java or Scala code to access Spark libraries directly, without any conversion to SQL. The first is the Spark DataFrame Application Programming Interface (API), exposed through functions that have an sdf_ prefix. The second is Spark's machine learning library, MLlib, exposed through "feature transformation" functions that begin with ft_ and "machine learning" functions that begin with ml_.
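As a rough illustration of the sdf_ interface, the sketch below computes an approximate median with sdf_quantile() instead of going through SQL. It assumes a local Spark installation is available, and it builds a small made-up table in place of the course's track_metadata_tbl; the column values and the table name "track_metadata" are hypothetical.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical stand-in for track_metadata_tbl with a numeric duration column.
track_metadata_tbl <- copy_to(
  sc,
  data.frame(duration = c(120.5, 180, 200.25, 240, 310.75)),
  name = "track_metadata",
  overwrite = TRUE
)

# sdf_quantile() uses the Spark DataFrame API directly, with no SQL translation.
# probabilities = 0.5 requests the (approximate) median.
median_duration <- sdf_quantile(
  track_metadata_tbl,
  column = "duration",
  probabilities = 0.5
)
median_duration

spark_disconnect(sc)
```

The result is an approximation controlled by the relative.error argument of sdf_quantile(), so on large tables it may differ slightly from the exact median.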
One important philosophical difference between working with R and working with Spark is that Spark is much stricter about variable types than R. Most of the native functions want DoubleType inputs and return DoubleType outputs. DoubleType is Spark's equivalent of R's numeric vector type. sparklyr will handle converting numeric to DoubleType, but it is up to the user (that's you!) to convert logical or integer data into numeric data and back again.
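The round trip described above can be sketched in base R, with no Spark connection required. The vectors is_explicit and play_count are made-up examples, not columns from the course dataset. On a Spark tibble, the same as.numeric() call would typically go inside mutate(); treat that as an assumption to check against your sparklyr version.

```r
# Base R only; hypothetical example data.
is_explicit <- c(TRUE, FALSE, TRUE)  # logical
play_count  <- c(3L, 7L, 12L)        # integer

# Convert to numeric (double) before handing data to a native function.
is_explicit_num <- as.numeric(is_explicit)  # TRUE becomes 1, FALSE becomes 0
play_count_num  <- as.numeric(play_count)

# Convert the results back to the original types afterwards.
is_explicit_back <- as.logical(is_explicit_num)
play_count_back  <- as.integer(play_count_num)

# Check the intermediate type and that the round trip is lossless here.
is.double(is_explicit_num)
identical(is_explicit_back, is_explicit)
identical(play_count_back, play_count)
```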
Which of these statements is true?
- sparklyr's dplyr methods convert code into Scala code before running it on Spark.
- Converting R code into SQL code limits the number of supported computations.
- Most Spark MLlib modeling functions require DoubleType inputs and return DoubleType outputs.
- Most Spark MLlib modeling functions require IntegerType inputs and return BooleanType outputs.
This exercise is part of the course
Introduction to Spark with sparklyr in R