Exploring AirBnB listings with scatter plots
Hi again, I hope you’re enjoying the course so far. In the last video, we explored instantly bookable listings and acceptance rates across the four AirBnB cities. Now, let’s revisit price, a continuous variable, and its relationship to other continuous variables in the dataset. Specifically, I would like to understand how the square footage of a listing relates to its price. We can do this with a scatter plot.

To create one, I’ll first add a new “Scatter chart” visualization to the page, then add “price” to the “Y Axis” and “listing_size_sqft” to the “X Axis”. It is important to update the summarization applied to these two variables, because Power BI may automatically choose “Sum”, which aggregates listings instead of plotting one point per listing. So, I’ll click into the options for each one and choose “Don’t summarize”.

Great. The scatter plot shows a moderately strong to strong positive relationship between a listing’s size in square feet and its price. Said another way, as a listing’s size in square feet increases, so does its price per night.

How does this change by city? We can add further context to the scatter plot by dragging the city variable to the “Legend”. There are a lot of colored points near the bottom left of the chart, and six points in the upper right. Those six are listings from Rome (shown by the orange color); they must be some fancy houses if they cost $10K a night and measure 30K square feet!

Adding a bar chart next to the scatter plot gives further visual context. I’ll create a new “Stacked bar chart”, adding city to the “Axis” and listing_size_sqft to “Values” as a median. The bar chart doubles as a city filter: as I click on each bar, the scatter plot shows just the data points for that city. This allows interactive exploration of the relationship between these two continuous variables. I don’t want the axis ranges to change as I switch cities, so I’ll fix them in the “Format” tab under Visualizations.
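To make the aggregation choices concrete, here is a minimal Python sketch on a handful of made-up listings. The rows, the “City B” label and all function names are hypothetical; only the column names come from the video. It contrasts “Don’t summarize” (one point per listing) with a per-city sum, computes the median listing_size_sqft per city that the bar chart displays, and mimics clicking a bar by filtering the scatter points to one city. The summed version is only a rough model of what the default aggregation produces with city in the Legend, not Power BI’s implementation.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy rows made up for illustration; NOT the course dataset.
# Only the column names (listing_id, city, listing_size_sqft, price) come from the video.
listings = [
    {"listing_id": 1, "city": "Rome", "listing_size_sqft": 900, "price": 150},
    {"listing_id": 2, "city": "Rome", "listing_size_sqft": 1500, "price": 260},
    {"listing_id": 3, "city": "Rome", "listing_size_sqft": 2400, "price": 410},
    {"listing_id": 4, "city": "City B", "listing_size_sqft": 600, "price": 120},
    {"listing_id": 5, "city": "City B", "listing_size_sqft": 1000, "price": 190},
    {"listing_id": 6, "city": "City B", "listing_size_sqft": 1300, "price": 240},
]


def scatter_points(rows, city=None):
    """'Don't summarize': one (size, price) point per listing.

    Passing a city keeps only that city's listings, mimicking a click on its bar.
    """
    return [
        (r["listing_size_sqft"], r["price"])
        for r in rows
        if city is None or r["city"] == city
    ]


def summed_points_by_city(rows):
    """Rough model of 'Sum' with city in the Legend:
    all listings of a city collapse into a single aggregated point."""
    totals = defaultdict(lambda: [0, 0])
    for r in rows:
        totals[r["city"]][0] += r["listing_size_sqft"]
        totals[r["city"]][1] += r["price"]
    return {c: tuple(t) for c, t in totals.items()}


def median_size_by_city(rows):
    """Bar-chart values: median listing_size_sqft for each city."""
    sizes = defaultdict(list)
    for r in rows:
        sizes[r["city"]].append(r["listing_size_sqft"])
    return {c: median(v) for c, v in sizes.items()}


if __name__ == "__main__":
    print(len(scatter_points(listings)))         # 6, one point per listing
    print(summed_points_by_city(listings))       # {'Rome': (4800, 820), 'City B': (2900, 550)}
    print(median_size_by_city(listings))         # {'Rome': 1500, 'City B': 1000}
    print(scatter_points(listings, city="Rome"))  # [(900, 150), (1500, 260), (2400, 410)]
```

Six listings become only two points once they are summed per city, which is why the scatter plot needs “Don’t summarize” to show the individual listings.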
In the Format tab, I’ll first set the X-Axis “Start” value to 0 and its “End” value to 30,000. For the Y-Axis, I’ll again set the Start value to 0 and the End value to 11,000.

Power BI offers advanced analytics features that can be added to charts. For scatter plots, these include a trend line. As discussed in the previous video, a trend line is the straight line that best fits the points, running as centrally as possible through all of them. To add one, click into the “Analytics” tab and, under the “Trend line” dropdown, click “+ Add”. The new trend line is a visual summary of the relationship I described earlier: strong and positive.

We can describe the relationship quantitatively using the correlation coefficient. Power BI again makes this simple through a quick measure. To create one, right-click on the dataset and choose “New quick measure”. In the pop-up window, under “Calculation”, select “Correlation coefficient”, then add “listing_id” to the Category, “listing_size_sqft” to Measure X, and “price” to Measure Y. When I click “OK”, Power BI creates a fairly complex DAX measure to calculate this metric.

I’ll add a card visualization to the page to showcase this new calculation. First, I’ll rearrange the existing charts. Then, I’ll select “Card” and drag the new measure to “Fields”. A correlation coefficient of 0.9 is really strong. Again, you can filter for each city using the bar chart to see how this metric changes with just that city’s data points. Rome seems to have the strongest correlation, so for Rome we can predict with the most confidence that the larger the listing in square feet, the higher the price.

Now it’s your turn to explore the relationship between continuous variables with scatter plots and correlation coefficients.

Let’s practice!
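The trend line and the correlation coefficient can both be sketched in plain Python. Here the correlation coefficient is Pearson’s r (the covariance of x and y divided by the product of their standard deviations), and the trend line is an ordinary least-squares fit, which we assume is what a linear trend line corresponds to; neither function is Power BI’s own code, and the helper names are hypothetical. Each (x, y) pair stands for one listing, which, as we understand the quick measure, is roughly what putting listing_id in the Category field achieves. The sample values are made up.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (Sxy, Sxx, Syy, mean_x, mean_y), sums of products of deviations from the means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when x or y is constant")
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Ordinary least-squares trend line y = slope * x + intercept."""
    sxy, sxx, _, mean_x, mean_y = _centered_sums(xs, ys)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical (listing_size_sqft, price) pairs, one per listing.
    sizes = [900, 1500, 2400]
    prices = [150, 260, 410]
    print(pearson_r(sizes, prices))
    print(least_squares_line(sizes, prices))
    # A perfectly linear toy case: r is exactly 1.0 and the line is y = 2x.
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))           # 1.0
    print(least_squares_line([1, 2, 3, 4], [2, 4, 6, 8]))  # (2.0, 0.0)
```

Note that r measures how tightly the points hug a straight line, not how steep that line is; the slope comes from the least-squares fit. On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` compute the same two quantities and can serve as a cross-check.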