Bringing it all together I
You've built a solid foundation in PySpark, explored its core components, and worked through practical scenarios involving Spark SQL, DataFrames, and advanced operations. Now it’s time to bring it all together. Over the next two exercises, you'll create a SparkSession and a DataFrame, cache that DataFrame, run some analytics, and explain the outcome.
This exercise is part of the course Introduction to PySpark.
Exercise instructions
- Import SparkSession from pyspark.sql.
- Make a new SparkSession named final_spark (pass the name to appName()) and assign it to my_spark, using SparkSession.builder.getOrCreate().
- Print my_spark to the console to verify it's a SparkSession.
- Create a new DataFrame from a preloaded schema and column definition.
Complete the sample code below to finish this exercise.
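# Import SparkSession from pyspark.sql
from pyspark.sql import SparkSession
# Create my_spark, a SparkSession with the app name "final_spark"
my_spark = SparkSession.builder.appName("final_spark").getOrCreate()
# Print my_spark to verify it is a SparkSession
print(my_spark)
# Load dataset into a DataFrame
# (data and columns are preloaded in the exercise environment)
df = my_spark.createDataFrame(data, schema=columns)
df.show()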