Data preparation

1. Data preparation

Hello and welcome! My name is Lis and I'll be one of your instructors in this Tableau course. In this first chapter, we'll leverage visual analytics to reveal insights and to show relationships not easily seen in traditional reports. We'll apply Tableau's built-in tools and calculations to extend insights provided by the source data. Let's get started!

2. Data preparation

Data preparation is a crucial step in the data analytics workflow. With any new dataset, we first need to examine it to see if any fields need refinement. We should also consider creating calculated fields from existing fields to tell our data story more effectively. It's also important to take a close look at the fields and see which can be summarized and grouped at a higher level. Finally, we want to identify categorical fields that can be used to slice and dice the data. Slicing and dicing means breaking information down into smaller parts and viewing it from different perspectives. We'll see examples of each in the following exercises, but first let's talk about the dataset we'll use throughout the course.

3. Chicago's Divvy bike sharing system

Divvy is Chicago's bike sharing system. Together with the city of Chicago, Divvy publishes historical trip data and makes it available for public use. Trip data goes back as far as 2013. That's a lot of data, so we'll be focusing on trips from the first half of 2019. Our data is split into two tables.

4. Divvy dataset: stations table

First, we have the stations table, which describes all the stations throughout Chicago. It includes each station's unique id, the station's name (usually the street intersection), the station's location as coordinates, and the number of docks available at the station.

5. Divvy dataset: trips table

Second, we have the trips table. Each row in the table represents a trip from the first half of 2019. Each trip has a unique trip id and the id of the bike used. The travel time for each trip is captured in seconds. We also have the exact times, in Central Standard Time, that the bike was checked out and checked back in, along with the name and id of the starting and ending stations. Divvy riders are either subscribers or non-subscribers, who are referred to simply as customers in the data. More information is known about subscribers because of the ongoing relationship, namely birth year and gender.
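To make the shape of the trips table concrete, here is a minimal Python sketch. It is only an illustration outside of Tableau: the field names and the three sample rows are made up for this example and are not taken from the actual Divvy files. It shows a calculated field (minutes derived from seconds) and a slice by rider type.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trip:
    # Field names are hypothetical; the real Divvy columns may be named differently.
    trip_id: int
    bike_id: int
    duration_seconds: float
    start_station_id: int
    end_station_id: int
    user_type: str                    # "Subscriber" or "Customer"
    birth_year: Optional[int] = None  # only known for subscribers
    gender: Optional[str] = None      # only known for subscribers


# Made-up sample rows, for illustration only.
trips = [
    Trip(1, 101, 600.0, 10, 20, "Subscriber", 1985, "Female"),
    Trip(2, 102, 1800.0, 20, 30, "Customer"),
    Trip(3, 101, 300.0, 30, 10, "Subscriber", 1990, "Male"),
]


def duration_minutes(trip: Trip) -> float:
    """A 'calculated field': travel time in minutes, derived from seconds."""
    return trip.duration_seconds / 60


def average_minutes_by_user_type(rows: list) -> dict:
    """Slice the data by rider type and average the calculated field."""
    groups = defaultdict(list)
    for trip in rows:
        groups[trip.user_type].append(duration_minutes(trip))
    return {user_type: sum(vals) / len(vals) for user_type, vals in groups.items()}


averages = average_minutes_by_user_type(trips)
print(averages)
```

In Tableau, the same idea would be expressed through a calculated field and a categorical field on the view rather than through code.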

6. Dimension and measure recap

From a data structure perspective, we have both categorical and numeric values. Having a mixture is important to create a variety of visuals. Remember that Tableau organizes data into two main groups: Dimensions and Measures. Dimensions represent categorical or qualitative data, while Measures represent numerical data that can be aggregated. We can move fields between these two groups, but we need to do it strategically. As we load any new data and begin analysis, an excellent first step is to make sure fields are placed in the right section. All numeric values, by default, arrive in the Measures section. If a numeric field is one that shouldn't be aggregated, then we will move it to the Dimensions section. IDs are a great example of this, because it's meaningless to add up or average IDs.
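The reasoning about IDs can be sketched in a few lines of Python. This is an analogy only, not Tableau's API: the role mapping, field names, and sample rows are all assumptions made up for illustration. Fields marked as measures are averaged, while numeric IDs marked as dimensions are only counted or grouped.

```python
from statistics import mean

# Hypothetical role assignment: numeric IDs are deliberately treated as dimensions.
FIELD_ROLES = {
    "trip_id": "dimension",
    "bike_id": "dimension",
    "user_type": "dimension",
    "duration_seconds": "measure",
}

# Made-up sample rows, for illustration only.
rows = [
    {"trip_id": 1, "bike_id": 101, "user_type": "Subscriber", "duration_seconds": 600},
    {"trip_id": 2, "bike_id": 102, "user_type": "Customer", "duration_seconds": 1800},
    {"trip_id": 3, "bike_id": 101, "user_type": "Subscriber", "duration_seconds": 300},
]


def aggregate_measures(data: list, roles: dict) -> dict:
    """Average only the fields marked as measures; dimensions are skipped."""
    return {
        field: mean(row[field] for row in data)
        for field, role in roles.items()
        if role == "measure"
    }


def count_distinct(data: list, field: str) -> int:
    """A sensible summary for a dimension: how many distinct values it has."""
    return len({row[field] for row in data})


measure_averages = aggregate_measures(rows, FIELD_ROLES)
distinct_bikes = count_distinct(rows, "bike_id")
print(measure_averages, distinct_bikes)
```

Averaging `bike_id` here would produce a number with no meaning, which is why it is counted instead of aggregated.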

7. Let's practice!

Alright, let's get started with some exercises!