
HTTP requests to import files from the web

1. HTTP requests to import files from the web

Congrats on importing your first web data! In order to import files from the web,

2. URL

we used the urlretrieve function from urllib.request. Let's now unpack this a bit and, in the process, understand a few things about how the internet works. URL stands for Uniform (or Universal) Resource Locator, and a URL is really just a reference to a web resource. The vast majority of URLs are web addresses, but they can refer to a few other things, such as file transfer protocol (FTP) servers and database access. We'll currently focus on those URLs that are web addresses, that is, the locations of websites. Such a URL consists of two parts: a protocol identifier, such as http or https, and a resource name, such as datacamp.com. The combination of protocol identifier and resource name uniquely specifies the web address! To explain URLs, I have introduced yet another acronym
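As a quick illustration (not code from the video itself), here's a minimal sketch of splitting a URL into its protocol identifier and resource name with the standard-library urllib.parse module; the example URL is a placeholder:

```python
from urllib.parse import urlparse

# Hypothetical example URL, used only to show the two-part structure
url = "https://www.datacamp.com/community/tutorials"

parsed = urlparse(url)
print(parsed.scheme)  # protocol identifier: 'https'
print(parsed.netloc)  # resource name: 'www.datacamp.com'
print(parsed.path)    # everything after the resource name: '/community/tutorials'
```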

3. HTTP

http, which itself stands for HyperText Transfer Protocol. Wikipedia provides a great description of HTTP: "The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web." Note that HTTPS is a more secure form of HTTP. Each time you go to a website, you are actually sending an HTTP request to a server. This request is known as a GET request, by far the most common type of HTTP request. We are actually performing a GET request when using the function urlretrieve. The ingenuity of urlretrieve also lies in the fact that it not only makes a GET request but also saves the relevant data locally. In the following, you'll learn how to make more GET requests to store web data in your environment. In particular, you'll figure out how to get the HTML data from a webpage. HTML stands for HyperText Markup Language and is the standard markup language for the web.
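To make this concrete, here's a minimal sketch of the kind of urlretrieve call discussed above; the URL and local filename are placeholders, not values from the video:

```python
from urllib.request import urlretrieve

# Placeholder URL; substitute any direct link to a file
url = "https://example.com/data.csv"

# urlretrieve sends a GET request and saves the response body locally
urlretrieve(url, "data.csv")
```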

4. GET requests using urllib

To extract the HTML from the Wikipedia home page, you import the necessary functions, specify the URL, package the GET request using the function Request, then send the request and catch the response using the function urlopen. This returns an HTTPResponse object, which has an associated read method. You then apply this read method to the response, which returns the HTML (as bytes in Python 3, which you can decode into a string), and you store the result in the variable html. You remember to be polite and close the response!
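Here's a sketch of the steps just described, assuming the Wikipedia home page as the target (the exact code shown in the video may differ slightly):

```python
from urllib.request import urlopen, Request

url = "https://www.wikipedia.org/"

request = Request(url)       # package the GET request
response = urlopen(request)  # send the request and catch the response (an HTTPResponse)

# read() returns the page body as bytes; decode to get the HTML as a string
html = response.read().decode("utf-8")

response.close()             # be polite and close the response
print(html[:200])            # peek at the first few characters
```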

5. GET requests using requests

Now we are going to do the same thing; however, here we'll use the requests package, which provides a wonderful API for making requests. According to the requests package website: "Requests allows you to send organic, grass-fed HTTP/1.1 requests, without the need for manual labor." The following organizations claim to use requests internally: "Her Majesty's Government, Amazon, Google, Twilio, NPR, Obama for America, Twitter, Sony, and Federal U.S. Institutions that prefer to be unnamed."

6. GET requests using requests

Moreover, "Requests is one of the most downloaded Python packages of all time, pulling in over 7,000,000 downloads every month. All the cool kids are doing it!" Lets now see requests at work. Here, you import the package requests, specify the URL, package the request, send the request and catch the response with a single function requests dot get; apply the text method to the response which returns the HTML as a string.

7. Let's practice!

That's enough out of me for the time being. Let's get you hacking away at pulling down some HTML from the web using GET requests! GET coding!