Data downloading with Wget and curl
To kick off a data analysis project, it's good practice to first consolidate all of our data into one place. Often, this means pulling data from various locations such as HTTP servers and databases.
While curl is handy for downloading a single file, it's somewhat unwieldy for handling multiple file downloads. In this capstone exercise, we will use both curl and Wget to download a series of monthly Spotify files, do some minor processing, and consolidate all downloaded files in our local directory.
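To see why curl gets unwieldy for many files, here is a minimal sketch of downloading every URL in a list file with curl: one portable approach is an explicit shell loop, whereas Wget reads the list directly with `-i`. The list file `urls_demo.txt`, the demo CSV contents, and the `file://` URLs are hypothetical stand-ins so the sketch runs offline; with real HTTP URLs, `-L` is what lets curl follow redirects.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo setup: two local files reached through file:// URLs,
# standing in for remote monthly CSVs.
workdir=$(mktemp -d)
mkdir -p "$workdir/remote" "$workdir/local"
printf 'track,streams\na,1\n' > "$workdir/remote/Spotify201809.csv"
printf 'track,streams\nb,2\n' > "$workdir/remote/Spotify201810.csv"
printf 'file://%s\n' \
  "$workdir/remote/Spotify201809.csv" \
  "$workdir/remote/Spotify201810.csv" > "$workdir/urls_demo.txt"

# Read the list one URL at a time and hand each one to curl.
# -sS: no progress bar, but still print errors; -f: fail on server errors;
# -L: follow redirects; -o: save under the file name at the end of the URL.
while IFS= read -r url; do
  [ -n "$url" ] || continue   # skip blank lines
  curl -sSfL -o "$workdir/local/$(basename "$url")" "$url"
done < "$workdir/urls_demo.txt"

ls "$workdir/local"
```

The loop works, but it is extra code to write and maintain for every batch download, which is the gap a single `wget -i url_list.txt` call fills.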
This exercise is part of the course Data Processing in Shell.
Exercise instructions
- Download the zipped 201812SpotifyData data saved at the shortened (redirected) URL using curl. In the same step, rename the file as Spotify201812.zip.
- Unzip Spotify201812.zip, delete the original zipped file, and rename the unzipped file to Spotify201812.csv to stay consistent.
- Use url_list.txt and Wget to download all 3 files (Spotify201809.csv, Spotify201810.csv, and Spotify201811.csv) in one step, with the download speed capped at 2500 KB/s.
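The unzip step leaves a file named like 201812SpotifyData.csv, and the rename moves the date after the prefix so it matches Spotify201809.csv and the other monthly files. If several months ever arrived in the unzipped naming, the same rename could be generalized with Bash parameter expansion. This is a minimal sketch assuming every such file follows the `YYYYMMSpotifyData.csv` pattern; the demo directory and the `201901SpotifyData.csv` file are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo directory with files in the unzipped naming scheme.
demo=$(mktemp -d)
touch "$demo/201812SpotifyData.csv" "$demo/201901SpotifyData.csv"

# Rename YYYYMMSpotifyData.csv -> SpotifyYYYYMM.csv for every match.
for f in "$demo"/*SpotifyData.csv; do
  [ -e "$f" ] || continue           # the glob matched nothing
  base=$(basename "$f")
  stamp=${base%SpotifyData.csv}     # strip the suffix, leaving YYYYMM
  mv -- "$f" "$demo/Spotify${stamp}.csv"
done

ls "$demo"
```

Stripping a fixed suffix with `${var%suffix}` keeps the sketch free of external tools such as sed, at the cost of only handling this one naming pattern.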
The sample code for each step follows.
# Use curl to download a single file from the URL and save it as Spotify201812.zip (-L follows redirects)
curl -o Spotify201812.zip -L https://assets.datacamp.com/production/repositories/4180/datasets/eb1d6a36fa3039e4e00064797e1a1600d267b135/201812SpotifyData.zip
# Unzip, delete the zip file, then rename to Spotify201812.csv
unzip Spotify201812.zip && rm Spotify201812.zip
mv 201812SpotifyData.csv Spotify201812.csv
# View url_list.txt to verify content
cat url_list.txt
# Use Wget to download every URL listed in url_list.txt, limiting the rate to 2500 KB/s
wget --limit-rate=2500k -i url_list.txt
# Take a look at all files downloaded
ls
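`ls` shows everything in the directory. To check specifically that all four monthly files from this exercise arrived and are non-empty, a short loop over the expected names is one option. This is a sketch, not part of the original exercise: `check_downloads` is a hypothetical helper, and it assumes the downloads sit in the directory you pass it (the current directory by default).

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one line per expected monthly file and
# returns the number of files that are missing or empty (0 means all present).
check_downloads() {
  local dir=${1:-.} missing=0 month f
  for month in 201809 201810 201811 201812; do
    f="$dir/Spotify${month}.csv"
    if [ -s "$f" ]; then              # -s: exists and is non-empty
      echo "ok: $f"
    else
      echo "missing or empty: $f"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Run in the download directory after wget has finished.
check_downloads . || echo "some downloads need attention"
```

Returning the count as the exit status lets the check gate later steps, for example `check_downloads . && echo "ready to analyze"`.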