1. Maps and Twitter data
In this chapter, we're going to work with geographical Twitter data. We're going to learn how to extract geographical coordinates from the Twitter JSON and how to put that data onto maps in Python. We're also going to learn about the Basemap package and how to create maps in Basemap for use with Twitter data and beyond.
2. Why maps?
Why should we want to map Twitter data? There are a number of reasons. First, we want to understand the geographical concentration of our Twitter dataset. If we have a dataset of tweets about an event, are the people tweeting participants in the event, or are they observers commenting on the event or the news coverage? We can also use mapping as a means to differentiate between types of tweets and Twitter users. For instance, do data scientists in Bangalore tweet more about R or Python? Or do NBA fans in California tweet more about the Warriors or the Lakers?
3. How Twitter gets location data
Twitter obtains location data in a number of ways. Not all tweets contain location data. Users have to opt in to share their location. From there, their location is obtained from their device, whether it's a smartphone, tablet, or laptop. The accuracy of these locations can vary widely between devices, so in practice it usually makes more sense to aggregate locations up to the county or state level.
4. Beware selection biases!
A word of warning: only a very small fraction of Twitter data contains any kind of location data. Given that Twitter users need to opt in to having their location data collected, and that not all devices have this capability, only about 1-3% of all tweets have any location data associated with them. This limits the generalizability of the inferences you can make from geographical data by itself. So keep this in mind as we work through this chapter.
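One way to keep this warning concrete is to check what share of your own dataset carries location data at all. The following is a minimal sketch, assuming each tweet has already been parsed into a Python dictionary and that location shows up in the `coordinates` or `place` fields of the Twitter JSON. The helper names `has_location` and `share_with_location` and the sample tweets are hypothetical, made up for illustration.

```python
def has_location(tweet):
    """Return True if the tweet carries point coordinates or a tagged place.

    Assumes the tweet is a dict parsed from the Twitter JSON, where
    'coordinates' and 'place' are null (None) when no location was shared.
    """
    return bool(tweet.get("coordinates")) or bool(tweet.get("place"))


def share_with_location(tweets):
    """Return the fraction of tweets that have any location data."""
    tweets = list(tweets)
    if not tweets:
        return 0.0
    return sum(has_location(t) for t in tweets) / len(tweets)


# Toy data for illustration only; a real sample would be far larger.
sample_tweets = [
    {"text": "Great game tonight", "coordinates": None, "place": None},
    {"text": "Learning pandas", "coordinates": None, "place": None},
    {"text": "At the arena", "coordinates": None,
     "place": {"full_name": "Oakland, CA"}},
    {"text": "Python or R?", "coordinates": None, "place": None},
]

print(f"{share_with_location(sample_tweets):.1%} of tweets have location data")
```

If the share in your dataset is small, conclusions drawn from the mapped tweets describe only the users who opted in, not everyone in the dataset.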
5. Types of geographical data available in Twitter
Location-based data can take on several different formats within the Twitter JSON object. Each of these has a different level of precision. At the most basic and imprecise level, someone can mention their location in the text of their tweet. Second, they can also mention their location in their user profile. Third, their location can be given by what's called a bounding box, which is a box of geographical coordinates drawn around their general area. Lastly, their exact location can be pinpointed by geographical coordinates given as a point.
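The four formats above can be sketched in code. This is an illustrative example, assuming a tweet parsed from the standard Twitter JSON, where the profile location sits under `user`, the bounding box under `place`, and the point under `coordinates`, with values ordered as longitude then latitude. The tweet itself is made up, and `location_fields` and `bbox_center` are hypothetical helper names, not part of any library.

```python
# Hypothetical tweet, trimmed to the location-related fields.
tweet = {
    "text": "Coffee in Oakland before the game",
    "user": {"location": "Bay Area, CA"},
    "place": {
        "full_name": "Oakland, CA",
        "bounding_box": {
            "type": "Polygon",
            # One ring of four [longitude, latitude] corners.
            "coordinates": [[
                [-122.34, 37.70], [-122.11, 37.70],
                [-122.11, 37.89], [-122.34, 37.89],
            ]],
        },
    },
    "coordinates": {"type": "Point", "coordinates": [-122.27, 37.80]},
}


def location_fields(tweet):
    """Collect the four kinds of location data, from least to most precise.

    Any field that is missing or null in the JSON comes back as None.
    """
    user = tweet.get("user") or {}
    place = tweet.get("place") or {}
    bbox = place.get("bounding_box") or {}
    point = tweet.get("coordinates") or {}
    return {
        "text": tweet.get("text"),
        "profile_location": user.get("location"),
        "bounding_box": bbox.get("coordinates"),
        "point": point.get("coordinates"),
    }


def bbox_center(bounding_box):
    """Average the corners of a bounding box into one (lon, lat) pair."""
    corners = bounding_box[0]
    lons = [corner[0] for corner in corners]
    lats = [corner[1] for corner in corners]
    return sum(lons) / len(lons), sum(lats) / len(lats)


fields = location_fields(tweet)
print(fields["profile_location"])
print(fields["point"])
print(bbox_center(fields["bounding_box"]))
```

Averaging the corners is only a rough way to turn a bounding box into a single point for plotting. It hides how large the box is, so a city-sized box and a country-sized box end up looking equally precise on a map.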
6. Let's practice!
In the next lesson we're going to work with each of these types of data, but first we're going to make sure we know why we'd want to work with geographical data.