
Running SQL Queries Programmatically

DataFrames can be queried with SQL in PySpark. The sql() method of a SparkSession lets applications run SQL queries programmatically and returns the result as another DataFrame. In this exercise, you'll create a temporary table from the DataFrame you created previously, then construct a query that selects the names of the people from the temporary table and assign the result to a new DataFrame.

Remember, you already have a SparkSession spark and a DataFrame people_df available in your workspace.

This exercise is part of the course Big Data Fundamentals with PySpark.

Exercise instructions

  • Create a temporary table named people from people_df.
  • Construct a query to select the names of the people from the temporary table people.
  • Assign the result of the query to a new DataFrame, people_df_names.
  • Print the first 10 names from the people_df_names DataFrame.

Hands-on interactive exercise

Complete this exercise by finishing the sample code.

# Create a temporary table "people"
people_df.____("people")

# Construct a query to select the names of the people from the temporary table "people"
query = '''SELECT name FROM ____'''

# Assign the result of Spark's query to people_df_names
people_df_names = spark.sql(____)

# Print the top 10 names of the people
people_df_names.____(____)