
The connect-work-disconnect pattern

Working with sparklyr is very much like working with dplyr when you have data inside a database. In fact, sparklyr converts your dplyr code into SQL before passing it to Spark.
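
For instance, you can ask dplyr to show the SQL that sparklyr generates. A minimal sketch, assuming you also load dplyr (copy_to() and show_query() are standard sparklyr/dplyr functions, used here purely for illustration):

library(sparklyr)
library(dplyr)

# Connect to a local Spark cluster
spark_conn <- spark_connect(master = "local")

# Copy R's built-in mtcars data frame to Spark
mtcars_tbl <- copy_to(spark_conn, mtcars)

# Print the SQL that dplyr generates for this pipeline
mtcars_tbl %>%
  filter(cyl == 6) %>%
  show_query()

spark_disconnect(spark_conn)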

The typical workflow has three steps:

  1. Connect to Spark using spark_connect().
  2. Do some work.
  3. Close the connection to Spark using spark_disconnect().

In this exercise, you'll do the simplest possible piece of work: returning the version of Spark that is running, using spark_version().

spark_connect() takes a connection string that gives the location of Spark, via its master argument. For a local cluster (as you are running here), the string is simply "local". For a remote cluster (on another machine, typically a high-performance server), it is a URL and port to connect on.
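
For example (the remote address below is purely hypothetical):

# Local cluster: Spark runs on this machine
spark_conn <- spark_connect(master = "local")

# Remote cluster: the URL and port of the cluster's master node
spark_conn <- spark_connect(master = "spark://cluster.example.com:7077")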

spark_version() and spark_disconnect() both take the Spark connection as their only argument.
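
Put together, the whole pattern is only a few lines; here is a sketch of the workflow this exercise walks through:

# 1. Connect to Spark
library(sparklyr)
spark_conn <- spark_connect(master = "local")

# 2. Do some work: print the Spark version
spark_version(sc = spark_conn)

# 3. Close the connection
spark_disconnect(sc = spark_conn)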

One word of warning: connecting to a cluster takes several seconds, so it is impractical to connect and disconnect repeatedly. While you need to reconnect for each DataCamp exercise, when you incorporate sparklyr into your own workflow it is usually best to keep the connection open for the whole time you are working with Spark.

This exercise is part of the course Introduction to Spark with sparklyr in R.


Exercise instructions

  • Load the sparklyr package with library().
  • Connect to Spark by calling spark_connect(), with argument master = "local". Assign the result to spark_conn.
  • Get the Spark version using spark_version(), with argument sc = spark_conn.
  • Disconnect from Spark using spark_disconnect(), with argument sc = spark_conn.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Load sparklyr
___

# Connect to your Spark cluster
spark_conn <- ___

# Print the version of Spark
___

# Disconnect from Spark
___