Running PySpark files
In this exercise, you're going to run a PySpark file using spark-submit. This tool can help you submit your application to a Spark cluster.
For the sake of this exercise, you're going to work with a local Spark instance running on 4 threads. The file you need to submit is in /home/repl/spark-script.py. Feel free to read the file:
cat /home/repl/spark-script.py
You can use spark-submit as follows:
spark-submit \
--master local[4] \
/home/repl/spark-script.py
What does this output? Note that it may take a few seconds to get your results.
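To make the parts of the spark-submit command explicit, here is a minimal Python sketch that assembles the same argument list. The helper names build_spark_submit_command and run_spark_submit are hypothetical and not part of Spark; the only pieces taken from the exercise are the --master flag, the local[4] value, and the script path.

```python
import shutil
import subprocess


def build_spark_submit_command(script_path, threads=4):
    """Return the spark-submit argument list for a local run.

    Hypothetical helper for illustration; it is not part of Spark.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # local[N] asks Spark to run locally with N worker threads.
    master = f"local[{threads}]"
    return ["spark-submit", "--master", master, script_path]


def run_spark_submit(command):
    """Run the command if spark-submit is on PATH; return True if it ran.

    Hypothetical helper; it assumes the script path in the command exists.
    """
    if shutil.which(command[0]) is None:
        return False
    subprocess.run(command, check=True)
    return True


command = build_spark_submit_command("/home/repl/spark-script.py", threads=4)
# Print the command as it would be typed in the shell.
print(" ".join(command))
# prints: spark-submit --master local[4] /home/repl/spark-script.py
```

Calling run_spark_submit(command) would then launch the job on a machine where Spark is installed. Changing the threads argument only changes the number inside local[...], which is the part of the command that controls how many threads the local Spark instance uses.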
This exercise is part of the course Introduction to Data Engineering.
