BaşlayınÜcretsiz Başlayın

File size optimization

Consider if you're given 2 large data files on a cluster with 10 nodes. Each file contains 10M rows of roughly the same size. While working with your data, the responsiveness is acceptable but the initial read from the files takes a considerable period of time. Note that you are the only one who will use the data and it changes for each run.

Which of the following is the best option to improve performance?

Bu egzersiz

Cleaning Data with PySpark

kursunun bir parçasıdır
Kursu Görüntüle

Uygulamalı interaktif egzersiz

İnteraktif egzersizlerimizden biriyle teoriyi pratiğe dökün

Egzersizi başlat