
Running PySpark files

In this exercise, you'll run a PySpark file using spark-submit, a command-line tool that submits your application to a Spark cluster.

For this exercise, you'll work with a local Spark instance running on 4 threads. The file you need to submit is at /home/repl/spark-script.py. Feel free to read it first:

cat /home/repl/spark-script.py

You can use spark-submit as follows:

spark-submit \
  --master local[4] \
  /home/repl/spark-script.py
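
If you would rather launch the same command from Python, the sketch below shows one possible way to do it. The helper names `build_spark_submit_command` and `run_if_available` are hypothetical and not part of the exercise. It assumes `spark-submit` is on your PATH and skips the run otherwise. Passing the arguments as a list also means no shell interprets the square brackets in `local[4]`.

```python
import shutil
import subprocess


def build_spark_submit_command(script_path, threads=4):
    """Return the argument list for a local-mode spark-submit run.

    Hypothetical helper: it mirrors the command shown above, with the
    thread count placed inside the local[N] master setting.
    """
    return ["spark-submit", "--master", f"local[{threads}]", script_path]


def run_if_available(command):
    """Run the command if its executable is on PATH; otherwise return None."""
    if shutil.which(command[0]) is None:
        return None
    # Capture output as text so the caller can inspect or print it.
    return subprocess.run(command, capture_output=True, text=True)


if __name__ == "__main__":
    cmd = build_spark_submit_command("/home/repl/spark-script.py")
    print(" ".join(cmd))
    result = run_if_available(cmd)
    if result is not None:
        print(result.stdout)
```

Keeping the command construction separate from execution makes it easy to check the arguments before anything is submitted.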

What does this output? Note that it may take a few seconds to get your results.

This exercise is part of the course

Introduction to Data Engineering
