HTTP

1. HTTP

More and more of the information you'll be working with as a data scientist resides on the web.

2. Data on the web

In fact, you've already worked with such data. Remember how you connected to a remote relational database to get the exact information you needed? The DBI package abstracted away the fact that the data was in some remote location and handled everything for you. In this chapter, you'll look at file formats that are especially useful for web technology, such as JSON. First, I'm going to discuss what actually happens behind the scenes when you import data that's on the web. To understand what happens in the examples that follow, I'm going to give you a crash course on the basics of HTTP,

3. HTTP

short for HyperText Transfer Protocol. It's basically a system of rules for how data should be exchanged between computers. In short, HTTP is the language of the web. If you browse to a webpage, for example, your computer, the client, actually sends an

4. HTTP

HTTP request to the

5. HTTP

server. The server then sends back

6. HTTP

data representing the webpage, so it sends a response, and the webpage pops up on your screen. There are several HTTP methods, as they are called. To simply get a webpage from a server,

7. HTTP

you use the GET method, for example. Apart from GET, there are other HTTP methods, but let's not dive into those here. Instead, let's have a look at some examples you might remember from the previous chapters, but this time all of the data will reside on the web.
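To make the request side of that exchange a bit more concrete, here is a minimal sketch of what the text of a GET request looks like on the wire. The helper name `build_get_request` and the host `www.example.com` are assumptions made up for illustration; the code only builds and prints the request text and doesn't send anything.

```r
# Hypothetical helper: assemble the text of a minimal HTTP/1.1 GET request.
# Nothing is sent over the network; this only shows the shape of the message.
build_get_request <- function(host, path = "/") {
  paste0(
    "GET ", path, " HTTP/1.1\r\n",   # request line: method, path, protocol version
    "Host: ", host, "\r\n",          # the Host header names the server
    "Connection: close\r\n",         # ask the server to close the connection afterwards
    "\r\n"                           # an empty line ends the header section
  )
}

# www.example.com is a placeholder host, not one used in this course.
request <- build_get_request("www.example.com", "/index.html")
cat(request)
```

In practice you rarely write this text yourself: your browser, or R, composes it for you and then reads the server's response.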

8. Example: CSV

Let's start with the states.csv file, for example, that's located at this link. The typical workflow would be to manually download the file through your favorite web browser, and then point to its path inside read.csv(). However,

9. Example: CSV

it can be done much more easily! Have a look at this line, where we simply pass the URL as a character string. The result is exactly the same: a data frame with 5 observations and 4 variables. How could this be so easy? Well, behind the scenes, R figures out that you referred to a URL, and requests it using an HTTP GET request. The server responds with the CSV file, which R can then read in just like it did before. Pretty nice, huh? Nowadays, there are many websites that only accept secure connections. You can only visit these websites, or download their files, with the https prefix. Does R also know how to handle that? Well, let's find out with the same CSV file, but this time

10. Example: CSV

with the HTTPS prefix. This works just the same, awesome. HTTPS support has been built into R since version 3.2.2. Experiment with importing data from the web

11. Let's practice!

yourself in the exercises!
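
As a recap of the CSV examples, here is a minimal sketch of passing a location straight to read.csv(). The URL below is a hypothetical placeholder, not the actual link from the slides, and `import_states` is a made-up helper name. So that the sketch also runs without a network connection, it imports a small temporary file with made-up sample rows through the same function.

```r
# Hypothetical placeholder; substitute the real location of states.csv.
csv_url <- "https://www.example.com/states.csv"

# read.csv() accepts a URL string where you would normally give a local path;
# for a URL, R issues the GET request itself.
import_states <- function(location) {
  read.csv(location, stringsAsFactors = FALSE)
}

# Offline stand-in: write a tiny CSV (made-up sample rows, not the course data)
# to a temporary file and import it through the same function.
tmp <- tempfile(fileext = ".csv")
writeLines(c("state,capital", "South Dakota,Pierre", "New York,Albany"), tmp)
local_copy <- import_states(tmp)
str(local_copy)

# With network access, the URL form is called in exactly the same way:
# states <- import_states(csv_url)
```

The point of wrapping the call in one function is to show that nothing changes in your code between a local path, an http URL, and an https URL; only the string you pass in differs.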