datacamp-logo

Here be dragons

Before you get too excited, a word of warning. Spark is still a very new technology, and some niceties like clear error messages aren't there yet. So when things go wrong, it can be hard to understand why.

sparklyr is newer, and doesn't have a full set of features. There are some things that you just can't do with Spark from R right now. The Scala and Python interfaces to Spark are more mature.

That means that you are sailing into uncharted territory with this course. The trip may be a little rough, so be prepared to be out of your comfort zone occasionally.

One further note of caution is that in this course you'll be running code on your own personal Spark mini-cluster in the DataCamp cloud. This is ideal for learning the concepts of how to use Spark, but you won't get the same performance boost as you would using a remote cluster on a high-performance server. That means that the examples here won't run faster than if you were only using R, but you can use the skills you learn here to run analyses on your own big datasets.

If you wish to install Spark on your local system, simply install the sparklyr package and call spark_install().

Are you sure you want to continue?

Answer the question
50 XP
Possible Answers
  • press
  • press