Data downloading with Wget and curl
To kick off a data analysis project, it's good practice to first consolidate all of our data in one place. Often, this means downloading and pulling data from various locations such as HTTP servers and databases.
While curl is handy for downloading a single file, it's somewhat unwieldy for handling multiple file downloads. In this capstone exercise, we will use both curl and Wget to download a series of monthly Spotify files, do some minor processing, and consolidate all downloaded files in our local directory.
This exercise is part of the course
Data Processing in Shell
Exercise instructions
- Download the zipped 201812SpotifyData data saved in the shortened (redirected) URL using curl. In the same step, rename file as Spotify201812.zip.
- Unzip Spotify201812.zip, delete the original zipped file, and rename the unzipped file to Spotify201812.csv to stay consistent.
- Use url_list.txt and Wget to download all 3 files: Spotify201809.csv, Spotify201810.csv, and Spotify201811.csv in one step, with an upper cap download speed of 2500 KB/s.
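The two download steps above can be previewed before anything is fetched. The sketch below is one possible way to do that, not the course's reference solution: the helper names run, download_one, and download_list and the example.com URL are hypothetical, and DRY_RUN is an assumed switch that prints each command instead of executing it. The flags themselves are standard: curl's -L follows redirects and -o names the output file, while wget's --limit-rate caps bandwidth and -i reads URLs from a file.

```shell
#!/usr/bin/env bash
# Preview mode: with DRY_RUN=1 (the default here) commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in preview mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Download one file with curl: -L follows redirects, -o sets the saved name.
download_one() {  # usage: download_one URL OUTPUT_NAME
  run curl -L -o "$2" "$1"
}

# Download every URL listed in a file with wget, capped at the given rate.
download_list() {  # usage: download_list URL_FILE RATE
  run wget --limit-rate="$2" -i "$1"
}

# Placeholder URL (assumption); substitute the real shortened link.
download_one "https://example.com/short-link" Spotify201812.zip
download_list url_list.txt 2500k
```

Setting DRY_RUN=0 before running the script would execute the same commands for real. In wget, a rate such as 2500k is interpreted as kilobytes per second.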
Interactive exercise
Complete the sample code to successfully finish this exercise.
# Use curl, download and rename a single file from URL
___ ___ Spotify201812.zip ___ https://assets.datacamp.com/production/repositories/4180/datasets/eb1d6a36fa3039e4e00064797e1a1600d267b135/201812SpotifyData.zip
# Unzip, delete, then re-name to Spotify201812.csv
unzip Spotify201812.zip && rm Spotify201812.zip
mv 201812SpotifyData.csv ___.csv
# View url_list.txt to verify content
cat url_list.txt
# Use Wget, limit the download rate to 2500 KB/s, download all files in url_list.txt
wget ___=2500k -i url_list.txt
# Take a look at all files downloaded
ls
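Beyond eyeballing the output of ls, the consolidation step can be checked explicitly. The following is a minimal sketch, assuming the four file names used in this exercise; the function name check_spotify_files is hypothetical and not part of the course material.

```shell
# Report which of the four expected monthly files exist in a directory.
# The exit status equals the number of missing files (0 means all present).
check_spotify_files() {  # usage: check_spotify_files [DIR]
  dir="${1:-.}"
  missing=0
  for month in 201809 201810 201811 201812; do
    if [ -f "$dir/Spotify$month.csv" ]; then
      echo "ok      Spotify$month.csv"
    else
      echo "missing Spotify$month.csv"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Check the current directory and flag an incomplete download.
check_spotify_files . || echo "Some monthly files are still missing."
```

Returning the count of missing files as the exit status lets the check be chained with && or || in a larger script.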