Running PySpark files

In this exercise, you're going to run a PySpark file using spark-submit, a tool for submitting your application to a Spark cluster.

For the sake of this exercise, you're going to work with a local Spark instance running on 4 threads. The file you need to submit is located at /home/repl/spark-script.py. Feel free to read the file:

cat /home/repl/spark-script.py

You can use spark-submit as follows:

spark-submit \
  --master local[4] \
  /home/repl/spark-script.py
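
If you would rather launch the same command from a Python script than type it in the shell, the sketch below shows one way to do it. It is only an illustration: the helper names build_spark_submit_command and run_spark_submit are hypothetical and not part of Spark or of this exercise, and run_spark_submit assumes spark-submit is available on your PATH.

```python
import shlex
import shutil
import subprocess


def build_spark_submit_command(script_path, threads=4):
    """Return the spark-submit argument list for a local run on `threads` threads.

    Hypothetical helper for illustration; it mirrors the command shown above.
    """
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    # local[N] asks Spark to run locally with N worker threads.
    return ["spark-submit", "--master", f"local[{threads}]", script_path]


def run_spark_submit(script_path, threads=4):
    """Run spark-submit and return its standard output as text.

    Assumes spark-submit is installed and on PATH (not checked by the exercise).
    """
    if shutil.which("spark-submit") is None:
        raise FileNotFoundError("spark-submit was not found on PATH")
    command = build_spark_submit_command(script_path, threads)
    # Passing a list (no shell) means the brackets in local[4] need no quoting.
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


command = build_spark_submit_command("/home/repl/spark-script.py", threads=4)
# Show a shell-safe rendering of the command without running it.
print(shlex.join(command))
```

Calling run_spark_submit("/home/repl/spark-script.py") would then execute the script and return whatever it prints. Note that shlex.join wraps local[4] in quotes, because square brackets are treated as special characters by some shells.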

What does this output? Note that it may take a few seconds to get your results.

This exercise is part of the course

Introduction to Data Engineering