Interactive Use of PySpark
Spark comes with an interactive Python shell in which PySpark is already installed. The PySpark shell is useful for basic testing and debugging, and it is quite powerful. The easiest way to demonstrate its power is with an exercise: you'll load a simple list containing the numbers 1 to 100 into the PySpark shell.
The most important thing to understand here is that you do not create a SparkContext object yourself, because the PySpark shell automatically creates one named sc.
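You can verify this directly in the shell; a quick check might look like the following (the exact output depends on your Spark version and configuration):

# Inside the PySpark shell, sc already exists; no SparkContext() call is needed
>>> sc
<SparkContext master=local[*] appName=PySparkShell>   # representative output; varies by setup
>>> sc.version
'3.5.0'   # example value; yours will match your installed Spark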
This exercise is part of the course Big Data Fundamentals with PySpark.
Exercise instructions
- Create a Python list named numb containing the numbers 1 to 100 (see the note on range after these instructions).
- Load the list into Spark using the SparkContext's parallelize method and assign it to a variable spark_data.
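One detail to keep in mind before filling in the blanks: Python's range excludes its stop value, so covering 1 through 100 requires a stop of 101. A minimal illustration:

# range(start, stop) runs from start up to, but not including, stop
list(range(1, 5))         # [1, 2, 3, 4]
list(range(1, 101))[-1]   # 100, so the stop must be 101 to include 100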
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a Python list of numbers from 1 to 100
numb = range(____, ____)
# Load the list into PySpark
spark_data = sc.____(numb)
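For reference, here is one way the completed code could look, assuming the sc object that the PySpark shell provides:

# Create a Python list of numbers from 1 to 100
# (the stop value is 101 because range excludes its endpoint)
numb = range(1, 101)

# Load the list into PySpark as an RDD using SparkContext's parallelize method
spark_data = sc.parallelize(numb)

# Quick sanity checks on the resulting RDD
print(spark_data.count())   # 100
print(spark_data.take(3))   # [1, 2, 3]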