1. Introduction to APIs
You now know how to load JSON data into pandas, given a path or URL to a JSON file. In this lesson, we'll turn our attention to working with web application programming interfaces, or APIs, the most common source of JSON data.
2. Application Programming Interfaces
An application programming interface is a defined way for an application to communicate with other programs, and vice versa.
They let programmers get data from an application without having to know about that application's database architecture. One caveat -- APIs are shared resources, and often limit how much data you can get in a specified timeframe.
Using an API to get data is like using a catalog to order products. The catalog shows what's available and provides order instructions.
3. Application Programming Interfaces
You send a properly formed order to the right address
4. Application Programming Interfaces
and get back what you asked for.
Similarly, an API provides an endpoint to send requests to, and documentation describes what a request should look like, such as parameters to include.
5. Requests
While there are Python libraries geared towards popular APIs, we'll use the Requests library in this course.
Requests lets users send and get data from any URL,
so it's not tied to any particular API.
The function to retrieve data from a URL, logically, is requests get.
6. requests.get()
Requests get takes a string of the URL to get data from,
and has optional keyword arguments that are useful for working with APIs.
The params keyword lets you pass a dictionary of parameter names and values to customize API requests.
The headers keyword also takes a dictionary of names and values. If the API you're using requires a user authentication key, it would be passed in the header.
The result is a response object, containing data and metadata.
We need to use the response's JSON method to get just the data.
7. response.json() and pandas
Importantly, the JSON method returns a dictionary,
which read JSON can't parse -- it expects a string.
To load the data to a dataframe, we need to use pd DataFrame instead.
8. Yelp Business Search API
We'll use Requests to work with data from Yelp. Yelp lets users submit ratings and reviews for businesses and makes that data available via its APIs.
9. Yelp Business Search API
In the API documentation,
10. Yelp Business Search API
we see the endpoint URL,
11. Yelp Business Search API
optional and
12. Yelp Business Search API
required parameters to use in a request,
13. Yelp Business Search API
and even a sample response. Note that if we want business details, the data we need is nested under the businesses key.
14. Making Requests
Let's get data from the API about bookstores in San Francisco, California. We import pandas and the Requests library,
then create a variable for the API endpoint.
The API documentation indicated required parameters and authorization, so we also set up a dictionary of request parameter fields and values,
plus a headers dictionary of authorization info. API keys are strings used to identify the program calling the API and confirm it can make the call. They are sensitive information, so it's hidden behind the API key variable here.
We pass the API url, params, and headers to requests get, and store the result as response.
15. Parsing Responses
Since responses contain data and metadata, we use the JSON method to isolate the data. Let's check it before loading it to a dataframe.
We have a dictionary, with the information we want in a list under the businesses key.
We pass data businesses to pd DataFrame, not read JSON, to create the dataframe.
When we check the head, we see we did get business details.
16. Let's practice!
Oof, that was a lot of new information. Don't worry though, practice will cement all that knowledge!