
Leveling up

1. Leveling up

You've now learned the basics of working with Spark using the dplyr interface. In this chapter, you'll take it a step further by learning advanced dplyr concepts.

2. Help me!

You'll use the select() helper functions, such as starts_with(), to select many columns at once by matching their names rather than listing each one. These helpers are especially useful for wide datasets containing hundreds of columns.

3. Computing and collecting

When working with remote data such as Spark DataFrames, an important skill is controlling where results are calculated and stored. You'll learn to use the compute() function to store intermediate results in Spark, and the collect() function to pull results back into R.

4. SQL and database joins

You'll also dive into database techniques: writing SQL code to query your data, and using joins to combine multiple tables.

5. Let's practice!