R lets you write data analysis code quickly. With a little care, that code can also be easy to read, and therefore easy to maintain. For many tasks, R also runs fast enough.
Unfortunately, R requires all the data you analyze to fit in memory (RAM) on a single machine. This limits how much data you can analyze with R. There are several solutions to this problem; one of them is Spark.
Spark is an open source cluster computing platform: it spreads your data and your computations across multiple machines, letting you analyze far more data than would fit on any single machine. The two technologies complement each other well. By using R and Spark together, you can write code fast and run code fast!
sparklyr is an R package that lets you write R code to work with data in a Spark cluster. It provides a dplyr interface, which means you can write (more or less) the same dplyr-style R code whether your data lives on your own machine or on a Spark cluster.
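To make this concrete, here is a minimal sketch of that workflow, assuming sparklyr and a local Spark installation are available (`spark_install()` can download one). The data frame `mtcars` is just R's built-in example dataset, used here for illustration.

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark cluster
sc <- spark_connect(master = "local")

# Copy an R data frame into Spark; the result is a remote table reference
mtcars_tbl <- copy_to(sc, mtcars)

# The same dplyr verbs you would use on a local data frame,
# translated to Spark SQL and executed on the cluster
mtcars_tbl %>%
  filter(cyl >= 6) %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg)) %>%
  collect()   # bring the (small) result back into R

spark_disconnect(sc)
```

Note that computation stays in Spark until `collect()` is called, so only the aggregated result, not the full dataset, is pulled back into R's memory.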
Scream if you want to go faster!