Visualizing distributions
1. Visualizing distributions
Welcome back! This time, we are going to build a histogram showing the age distribution of our riders. The first thing we need to do is create a calculated field for the user's age. Right now we only have their birth year, but showing ages in the histogram makes it clearer. To do that, we'll subtract Birthyear from 2019, because that's the year of this dataset. We could have used the TODAY() function, but then the age would change depending on when someone looks at the visualization, which is not what we want. So go ahead and anchor on the year of the data itself, which is 2019. Click OK and let's drag it to the dimensions part of the data pane.
Next, we're going to use this as our x-axis. Drag it to the Columns shelf and you'll see that it right away creates the foundation of that x-axis. Let's take the trip count measure and bring it to the Rows shelf. And let's also bring Usertype to Color on the Marks card to give us a little more information.
Now, we will be eliminating a couple of things here. First of all, I'm going to show the entire view by changing the fit. You can see that we have quite a bit of good data in here, and then some values that are clearly unlikely to be true. On top of that, we have missing values for some of our customers, because they don't provide that kind of information without subscribing. So the reasonable thing to do here is to remove our null values, because there's no data there. You can do that by selecting the bar, right-clicking, and then excluding it from this viz. Next, I'm going to lasso-select all bars for ages above 90. Riders older than 90 are very unlikely and are probably mistakes in the data. So I'm going to grab those with the lasso. As before, we right-click and exclude them from this visualization. Now our data comes into focus.
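For anyone who wants to see the logic outside of Tableau, here is a minimal Python sketch of the same steps: compute age from a fixed data year, drop missing birth years and ages above 90, and count trips per age and user type. The sample rows and the names `trips`, `rider_age`, and `age_distribution` are made up for illustration and are not part of the course dataset.

```python
from collections import Counter

DATA_YEAR = 2019          # fixed anchor year of the dataset, not today's date
MAX_PLAUSIBLE_AGE = 90    # ages above this are treated as data-entry mistakes

# Hypothetical sample rows: (birth_year, user_type).
# None stands in for a missing (null) birth year.
trips = [
    (1990, "Subscriber"),
    (1990, "Subscriber"),
    (1985, "Customer"),
    (None, "Customer"),
    (1900, "Subscriber"),
    (2000, "Customer"),
]


def rider_age(birth_year, data_year=DATA_YEAR):
    """Return the age in the data year, or None if the birth year is missing."""
    if birth_year is None:
        return None
    return data_year - birth_year


def age_distribution(rows, max_age=MAX_PLAUSIBLE_AGE):
    """Count trips per (age, user_type), excluding nulls and implausible ages."""
    counts = Counter()
    for birth_year, user_type in rows:
        age = rider_age(birth_year)
        if age is None or age > max_age:
            continue  # same effect as excluding those bars from the viz
        counts[(age, user_type)] += 1
    return counts


distribution = age_distribution(trips)

# Crude text histogram: one '#' per trip, one line per age and user type.
for (age, user_type), n in sorted(distribution.items()):
    print(f"{age:>3} {user_type:<10} {'#' * n}")
```

Anchoring on `DATA_YEAR` rather than the current date keeps the ages stable no matter when the script runs, which is the same reason the calculated field avoids TODAY().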
There is another chart type we can use to show a distribution: a more traditional line chart. To make it easy on myself, I'm going to duplicate the visualization like this. Let's rename it to something more appropriate. In the Marks card, change the mark type to Line. Let's turn off the mark labels so it looks a little smoother. Drag the count of trips to Size and also increase the size to make the line a little more dynamic. That gives us a really nice secondary look at the age distribution of our users. So, those are two options you can use to visualize the distribution of a field. The choice of chart is up to you and what you think fits best with your data and audience. Let's try it out!
2. Let's practice!