The connect-work-disconnect pattern
Working with sparklyr is very much like working with dplyr when you have data inside a database. In fact, sparklyr converts your R code into SQL code before passing it to Spark.
The typical workflow has three steps:
- Connect to Spark using spark_connect().
- Do some work.
- Close the connection to Spark using spark_disconnect().
In this exercise, you'll do the simplest possible piece of work: returning the version of Spark that is running, using spark_version().
spark_connect() takes a URL that gives the location of Spark. For a local cluster (as you are running here), the URL should be "local". For a remote cluster (on another machine, typically a high-performance server), the connection string will be a URL and port to connect on.
spark_version() and spark_disconnect() both take the Spark connection as their only argument.
One word of warning: connecting to a cluster takes several seconds, so it is impractical to connect and disconnect regularly. While you need to reconnect for each DataCamp exercise, when you incorporate sparklyr into your own workflow, it is usually best to keep the connection open for the whole time that you want to work with Spark.
This exercise is part of the course
Introduction to Spark with sparklyr in R
Exercise instructions
- Load the sparklyr package with library().
- Connect to Spark by calling spark_connect(), with argument master = "local". Assign the result to spark_conn.
- Get the Spark version using spark_version(), with argument sc = spark_conn.
- Disconnect from Spark using spark_disconnect(), with argument sc = spark_conn.
Interactive hands-on exercise
Try to solve this exercise by completing the sample code.
# Load sparklyr
___
# Connect to your Spark cluster
spark_conn <- ___
# Print the version of Spark
___
# Disconnect from Spark
___
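One way to fill in the blanks, following the instructions above, is sketched below. This assumes sparklyr and a local Spark installation are available; the version printed will depend on your environment.

```r
# Load sparklyr
library(sparklyr)

# Connect to your Spark cluster
# master = "local" runs Spark on this machine
spark_conn <- spark_connect(master = "local")

# Print the version of Spark
spark_version(sc = spark_conn)

# Disconnect from Spark
spark_disconnect(sc = spark_conn)
```

Note that spark_version() returns the version as a value, so at the console it is printed automatically; inside a script you would wrap it in print().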