
Writing to a file

In the video, you saw that files are often loaded into an MPP database like Amazon Redshift in order to make them available for analysis.

The typical workflow is to write the data into columnar data files. These data files are then uploaded to a storage system, and from there they can be copied into the data warehouse. In the case of Amazon Redshift, the storage system would be S3, for example. A minimal sketch of that upload-then-copy step follows below.
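For illustration, here is a hedged sketch of that load step using boto3 and a Redshift COPY statement. The bucket name, S3 key, table name, and IAM role ARN are placeholder values invented for this example, not part of the exercise.

# A sketch of the upload-then-copy workflow described above.
# All names (bucket, key, table, IAM role) are hypothetical placeholders.
import boto3

# Upload the Parquet file to S3.
s3 = boto3.client("s3")
s3.upload_file("films_pdf.parquet", "data-lake-bucket", "films/films_pdf.parquet")

# A COPY statement like this, run against Redshift with a SQL client,
# then ingests the Parquet data from S3 into the warehouse.
copy_statement = """
COPY films
FROM 's3://data-lake-bucket/films/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
FORMAT AS PARQUET;
"""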

The first step is to write a file in the right format. For this exercise, you'll choose the Apache Parquet file format.

A PySpark DataFrame called film_sdf and a pandas DataFrame called film_pdf are available in your workspace.

This exercise is part of the course Introduction to Data Engineering.

Exercise instructions

  • Write the pandas DataFrame film_pdf to a parquet file called "films_pdf.parquet".
  • Write the PySpark DataFrame film_sdf to a parquet file called "films_sdf.parquet".

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

# Write the pandas DataFrame to parquet
film_pdf.to_parquet("films_pdf.parquet")

# Write the PySpark DataFrame to parquet
film_sdf.write.parquet("films_sdf.parquet")
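Note that pandas writes a single Parquet file, while PySpark writes a directory of part files. As a quick check, you can read both back, assuming a SparkSession named spark is available in the workspace (typical for these exercises, but an assumption here):

# Verify the round trip. `spark` is assumed to be an existing SparkSession.
import pandas as pd

# pandas wrote a single file; read it back directly.
print(pd.read_parquet("films_pdf.parquet").head())

# PySpark wrote a directory of part files; read the directory back.
spark.read.parquet("films_sdf.parquet").show(5)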