Downloading data using curl

1. Downloading data using curl

Welcome to Intermediate Shell! My name is Susan Sun, and I do data work. I'm looking forward to learning with you in this course. In data, many of us bypass the command line in favor of graphical interfaces like RStudio because that is what we are familiar with. However, taking the time to learn data science on the command line is a great long-term investment that will, ultimately, make us better and more productive data people. In this course, we take a practical approach and learn command line tools useful for everyday data processing and analyses. First, let's learn how to download data files using curl.

2. What is curl?

curl, short for Client for URLs, is a Unix command line tool for transferring data to and from a server. It is often used to download data from HTTP sites and FTP servers.

3. Checking curl installation

To check if curl has been properly installed, type the following in the command line: man curl. If curl has not been installed, you will see: curl command not found. To install curl, follow this link.
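The same check can be scripted. The sketch below is only one way to do it: it swaps the interactive man curl for command -v, which is an assumption made here so the check can run without opening a pager, and the variable name curl_status is hypothetical.

```shell
# Check whether curl is on the PATH without opening the interactive manual.
if command -v curl >/dev/null 2>&1; then
    curl_status="installed"
    # The first line of the version output shows which curl build was found.
    curl --version | head -n 1
else
    curl_status="missing"
    echo "curl: command not found"
fi
echo "curl is $curl_status"
```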

4. Browsing the curl Manual

If curl is installed, your console will look like this:

5. Browsing the curl Manual

Keep pressing Enter to scroll through the curl manual. To exit and return to your console, press q.

6. Learning curl Syntax

The basic syntax for curl has the following structure: curl, option flags, URL. The URL is required for the command to run successfully. curl supports a large number of protocols and option flags. For a list of the option flags, use curl dash-dash-help; the protocols supported by your installation are listed by curl dash-dash-version.
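As a rough illustration of that structure, the sketch below asks curl to describe itself rather than contacting any server. The variable name protocols is hypothetical, and piping through head is only there to keep the output short.

```shell
# General shape of a call: curl [option flags] [URL]

# Print the start of the option list that curl documents about itself.
curl --help | head -n 5

# The "Protocols:" line of the version output names the supported protocols.
protocols=$(curl --version | grep '^Protocols:')
echo "$protocols"
```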

7. Downloading a Single File

Let's download a single file stored at this hypothetical URL using curl. To save the file with its original name, datafilename-dot-txt, use the option flag dash-uppercase-O. This reads: curl dash uppercase-O followed by the file URL location. To save the file under a different name, replace dash uppercase O with dash lowercase o and the new filename. Now it reads: curl dash lowercase o followed by the new filename and the file URL location.
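Written out, the two commands might look like the sketch below. To keep it runnable without a network, a temporary local directory reached through a file:// URL stands in for the hypothetical server; that stand-in, the dash-s flag that silences the progress meter, and the names server_dir, work_dir and renameddatafilename.txt are all assumptions made for illustration. Against a real server the URL would start with https:// instead.

```shell
# Stand-in for the hypothetical server: a temporary local directory.
server_dir=$(mktemp -d)
work_dir=$(mktemp -d)
echo "sample data" > "$server_dir/datafilename.txt"
url="file://$server_dir/datafilename.txt"

cd "$work_dir"

# -O keeps the original filename taken from the end of the URL.
curl -s -O "$url"

# -o saves under a new name; the filename must directly follow the flag.
curl -s -o renameddatafilename.txt "$url"

# Both copies are now in the working directory.
ls
```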

8. Downloading Multiple Files using Wildcards

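Oftentimes, a server will host multiple data files with similar filenames, for example datafilename001-dot-txt, datafilename002-dot-txt, and so on. Instead of curl-ing each file individually, we can describe all the files in one command. To ask for every file hosted on this server that starts with datafilename and ends in dot-txt, the command reads: curl dash uppercase-O https colon forwardslash forwardslash websitename-dot-com forwardslash datafilename asterisk dot txt. Be aware that curl does not expand an asterisk in an HTTP URL the way the shell does for local files; its built-in URL globbing understands curly-brace lists and square-bracket ranges, so when the filenames are known, those forms are the dependable choice.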
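For a version that does not depend on the asterisk being expanded, the files can be named in a brace list, which curl's own URL globbing handles. This is a sketch under assumptions: a temporary local directory behind a file:// URL stands in for the hypothetical server, only three files are created, and the names server_dir and work_dir are made up for the example. The URL is quoted so that the shell leaves the braces for curl to interpret.

```shell
# Stand-in for the hypothetical server, holding three similarly named files.
server_dir=$(mktemp -d)
work_dir=$(mktemp -d)
for n in 001 002 003; do
    echo "contents of file $n" > "$server_dir/datafilename$n.txt"
done

cd "$work_dir"

# curl expands the brace list into three URLs; a single -O covers all of them.
curl -s -O "file://$server_dir/datafilename{001,002,003}.txt"

ls
```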

9. Downloading Multiple Files using Globbing Parser

Another option is to increment using a globbing parser. The following will download every file sequentially starting with datafilename001-dot-txt and ending with datafilename100-dot-txt. Note the end of the command that reads: square bracket zero zero one dash one hundred close square bracket-dot-txt. That's the globbing at work.
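A sketch of the range form described above follows. As an assumption for illustration, a temporary local directory reached through a file:// URL plays the role of the hypothetical server so that nothing is fetched over the network; server_dir and work_dir are made-up names, and dash-s only hides the progress meter.

```shell
# Stand-in for the hypothetical server: 100 files named 001 through 100.
server_dir=$(mktemp -d)
work_dir=$(mktemp -d)
for n in $(seq -w 1 100); do
    echo "contents of file $n" > "$server_dir/datafilename$n.txt"
done

cd "$work_dir"

# [001-100] is expanded by curl itself, keeping the zero padding.
# The quotes stop the shell from treating the brackets as its own pattern.
curl -s -O "file://$server_dir/datafilename[001-100].txt"

# Count the downloaded files.
ls | wc -l
```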

10. Downloading Multiple Files using Globbing Parser

We can increment through the files and download every Nth file. For example, to download every 10th file, we can modify the globbing parser to read: open square bracket zero zero one dash one hundred colon ten close square bracket dot txt
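The step form can be sketched the same way. The local file:// stand-in for the hypothetical server and the names server_dir and work_dir are assumptions made so the example runs offline. Starting at 001 with a step of ten, curl requests 001, 011, 021 and so on up to 091.

```shell
# Stand-in for the hypothetical server: 100 files named 001 through 100.
server_dir=$(mktemp -d)
work_dir=$(mktemp -d)
for n in $(seq -w 1 100); do
    echo "contents of file $n" > "$server_dir/datafilename$n.txt"
done

cd "$work_dir"

# The :10 after the range tells curl to step through it ten at a time.
curl -s -O "file://$server_dir/datafilename[001-100:10].txt"

# Only every 10th file is downloaded.
ls
```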

11. Preemptive Troubleshooting

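Sometimes the Internet connection can time out. To make sure that our download progress is not lost, curl has these two flags: dash-uppercase-L follows the redirect if the server responds with a 300-series status code. dash-uppercase-C resumes a previous file transfer that timed out before completion; it expects an offset, and passing a single dash as the offset lets curl work out where to continue. Putting everything together, note that all option flags come before the URL, and the order of the flags does not matter as long as any flag that takes a value is directly followed by that value.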
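One way the combined command could look is sketched below. It is an illustration under assumptions rather than the exact command from the slide: a temporary local directory behind a file:// URL stands in for the hypothetical server, the names server_dir and work_dir are made up, and dash-uppercase-C is given a single dash as its offset so that curl decides where to resume.

```shell
# Stand-in for the hypothetical server with a single data file.
server_dir=$(mktemp -d)
work_dir=$(mktemp -d)
printf 'line one\nline two\n' > "$server_dir/datafilename.txt"

cd "$work_dir"

# -L    follow a redirect if the server answers with a 3xx status code
# -O    keep the original filename
# -C -  continue from the end of a partial local file, if one exists;
#       with no local file yet, the transfer simply starts from the beginning
curl -s -L -O -C - "file://$server_dir/datafilename.txt"

cat datafilename.txt
```

The flags could be written in any order, provided the lone dash stays directly after dash-uppercase-C.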

12. Happy curl-ing!

In this lesson, we learned how to download files using curl. Let's put our new knowledge to practice! Happy curl-ing!