Using decomposition trees in Power BI
Superhost status is a badge of honor for Airbnb hosts. It signals to people looking for a place to stay that the host is responsive and offers high-quality stays. In this demo, we’ll learn how decomposition trees can help determine what makes a host more likely to be certified as a superhost, as identified by the variable “host_is_superhost”.
Before creating a decomposition tree, I need to transform the existing variable into the percentage of hosts who are superhosts, i.e., hosts with a host_is_superhost value of “t” for “true”. Right-click on the Airbnb dataset and select “New measure”. I’ll call it “host_is_superhost_rate”. The DAX formula will use CALCULATE(). The expression will be a distinct count of host_id, and the filter will be on the Airbnb dataset, keeping observations where host_is_superhost is equal to “t”, or true. Finally, to get a percentage, I’ll divide the result of the CALCULATE() formula by the distinct count of host_id for the entire Airbnb dataset, and format the measure as a percentage with one decimal place.
Alright, on a new page, I’ll create a decomposition tree. “Analyze” is the variable to be decomposed and explained – I’ll drag host_is_superhost_rate here. “Explain by” holds the variables to be used for the analysis. I’m interested in trying to explain superhost status by city, instant bookable status, and neighborhood. Awesome. This first bar, the root of the decomposition tree, indicates that 14.1% of all hosts in the dataset are superhosts.
Now, there are several ways to analyze, or drill into, this status. First, click on the light grey “+” sign next to the “Analyze” variable on the page to open the options menu. “High value” and “Low value” are options for “AI Splits”, which we will come back to. Underneath those are the “Explain by” variables. Selecting one of these will drill into that variable – I’ll select “instant_bookable”.
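The measure described above is a ratio of two distinct counts. As a minimal sketch outside Power BI, assuming the table can be represented as a list of Python dicts with host_id and host_is_superhost columns (the rows below are made up, not the course data), the same calculation looks like this. It also shows why the distinct count matters: a host with several listings should be counted only once.

```python
# Hypothetical toy rows standing in for the Airbnb table; only the two
# columns the measure needs are included. Host 1 owns two listings.
listings = [
    {"host_id": 1, "host_is_superhost": "t"},
    {"host_id": 1, "host_is_superhost": "t"},
    {"host_id": 2, "host_is_superhost": "f"},
    {"host_id": 3, "host_is_superhost": "f"},
    {"host_id": 4, "host_is_superhost": "t"},
]


def superhost_rate(rows):
    """Distinct superhost hosts divided by distinct hosts in `rows`.

    Returns None when `rows` is empty, to avoid dividing by zero.
    """
    all_hosts = {r["host_id"] for r in rows}
    if not all_hosts:
        return None
    superhosts = {r["host_id"] for r in rows if r["host_is_superhost"] == "t"}
    return len(superhosts) / len(all_hosts)


rate = superhost_rate(listings)
print(f"{rate:.1%}")  # 2 of 4 distinct hosts -> 50.0%

# Counting rows instead of distinct hosts would double-count host 1.
row_rate = sum(r["host_is_superhost"] == "t" for r in listings) / len(listings)
print(f"{row_rate:.1%}")  # 3 of 5 rows -> 60.0%
```

Formatting with `:.1%` mirrors the “percentage with one decimal place” setting applied to the measure.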
Several new bars pop up with the values of the instant_bookable variable: “t”, or true, meaning the host offers an instantly bookable place, and “f”, or false, meaning they do not. The percentages here are the proportion of hosts in each of these two subsets of the data who are superhosts. Of the hosts who offer instantly bookable places, 14.7% are superhosts; of those who do not, only 13.5% are. The bars are a visual indicator of the proportion. It is important to note that Power BI fills each bar in proportion to the maximum value in that level. Since 14.7% is the maximum value, the bar for instant_bookable equal to “t” is completely filled, while the bar for “f” is filled slightly less.
We can continue drilling into the superhost rate by selecting either the “t” or “f” group. I’ll select “t”, then choose another “Explain by” variable, starting with “city”. Here, the percentages represent the share of hosts with instant bookable listings in each city who are also superhosts. In Rome, 19.9% are, while in Sydney, 8.9% are. We can drill into Sydney by clicking the “+” next to its bar and selecting neighborhood. At this last level, we can see that 66.7% of Hurstville, Sydney hosts with instant bookable listings are superhosts, whereas some neighborhoods have fewer than 5% superhosts.
Clicking on any bar from an earlier level of the decomposition tree will change the branches. For example, clicking on “f” for instant_bookable will switch the later levels to include only hosts without instant bookable listings. It shows that half of the hosts with non-instant bookable listings in the Mosman neighborhood of Sydney are superhosts. We could get to this same view by expanding all the variables first, then clicking on each branch to explore. To show this, I’ll lock the “instant_bookable” variable, which will keep it in place, and then remove the other two.
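To make the mechanics of each level concrete, here is a minimal Python sketch. It assumes hypothetical toy rows rather than the course data, and the helper names (split_by, bar_fills, drill) are invented for illustration, not Power BI APIs. A level computes the superhost rate separately for each value of one “Explain by” field; selecting a bar filters the rows before the next split; and each bar’s fill is its rate divided by the largest rate in the same level, as described above.

```python
from collections import defaultdict

# Hypothetical toy rows (one listing per host); these are NOT the course data.
FIELDS = ("host_id", "host_is_superhost", "instant_bookable", "city")
listings = [dict(zip(FIELDS, values)) for values in [
    (1, "t", "t", "City A"),
    (2, "f", "t", "City A"),
    (3, "t", "t", "City B"),
    (4, "f", "t", "City B"),
    (5, "f", "t", "City B"),
    (6, "t", "f", "City A"),
    (7, "f", "f", "City B"),
    (8, "f", "f", "City B"),
    (9, "f", "f", "City B"),
]]


def superhost_rate(rows):
    """Distinct superhost hosts divided by distinct hosts (None if no rows)."""
    all_hosts = {r["host_id"] for r in rows}
    if not all_hosts:
        return None
    superhosts = {r["host_id"] for r in rows if r["host_is_superhost"] == "t"}
    return len(superhosts) / len(all_hosts)


def split_by(rows, field):
    """One tree level: the superhost rate for each value of `field`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[field]].append(r)
    return {value: superhost_rate(group) for value, group in groups.items()}


def bar_fills(rates):
    """Fill fraction per bar, relative to the largest rate in the level."""
    top = max(rates.values())
    return {k: (v / top if top else 0.0) for k, v in rates.items()}


def drill(rows, **selected):
    """Keep only rows matching every bar already selected, e.g. instant_bookable="t"."""
    return [r for r in rows if all(r[f] == v for f, v in selected.items())]


level1 = split_by(listings, "instant_bookable")
print(level1)             # {'t': 0.4, 'f': 0.25}
print(bar_fills(level1))  # {'t': 1.0, 'f': 0.625}

# Selecting "t" and then splitting by city: one branch of the tree.
print(split_by(drill(listings, instant_bookable="t"), "city"))
# {'City A': 0.5, 'City B': 0.3333333333333333}

# Clicking "f" instead swaps the branch below it.
print(split_by(drill(listings, instant_bookable="f"), "city"))
# {'City A': 1.0, 'City B': 0.0}
```

Because the fill is relative to the level’s maximum, a bar that is “completely filled” only means it is the largest in its level, not that its rate is 100%.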
As with visualizations that use a date hierarchy, the decomposition tree creates a hierarchy of the variables added to “Explain by”. Clicking the double-arrow icon reveals each variable in turn until all of them are on the page. Then you can simply click on a bar to expand to the next level, until you reach the end.
Let’s come back to AI Splits. They are algorithmic ways to drill into the target variable, depending on whether you are interested in how the other variables explain its high values or its low values; in this case, the target is superhost status. I’ll remove all the variables again, unlocking instant_bookable so that it can be removed too. Then I’ll click on the “+” and choose “High value”, which finds the field containing the category with the highest percentage of superhosts. The AI Split revealed “neighborhood”. Again, when “High value” is chosen for AI Splits, Power BI looks at the proportion of superhosts in each category of the candidate fields and chooses the one with the highest value, in this case 100% for the Clifton neighborhood. The next level shows that New York is the city where Clifton is located. Its value is also 100%, indicating that all hosts in the Clifton neighborhood of New York are superhosts. Expanding one more time, to the instant_bookable level, shows a single value of “f”, or false. This means none of the hosts in Clifton, New York offer instantly bookable listings, yet all of them are superhosts. As you can see, AI Splits are very useful for digging into a target variable to understand the main influences on its high or low values.
Now it’s your turn to build decomposition trees and analyze Glassdoor reviews.
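The “High value” AI Split walkthrough above can be approximated outside Power BI by a greedy rule: at each level, look at every category of every remaining “Explain by” field, keep the one with the highest superhost rate, and drill into it. This is a simplified sketch under that assumption, not Power BI’s actual AI Split algorithm, which may weigh splits differently; the toy rows, the field order, and the first-wins tie-breaking are all hypothetical. On these rows it reproduces the shape of the demo: neighborhood first, then city, then instant_bookable = “f”.

```python
from collections import defaultdict

# Hypothetical toy rows; names and values are invented for illustration.
FIELDS = ("host_id", "host_is_superhost", "city", "neighborhood", "instant_bookable")
listings = [dict(zip(FIELDS, values)) for values in [
    (1, "t", "City A", "N1", "f"),
    (2, "t", "City A", "N1", "f"),
    (3, "f", "City A", "N2", "t"),
    (4, "t", "City A", "N2", "f"),
    (5, "f", "City B", "N3", "t"),
    (6, "f", "City B", "N3", "f"),
    (7, "t", "City B", "N4", "t"),
]]


def superhost_rate(rows):
    """Distinct superhost hosts divided by distinct hosts (None if no rows)."""
    all_hosts = {r["host_id"] for r in rows}
    if not all_hosts:
        return None
    superhosts = {r["host_id"] for r in rows if r["host_is_superhost"] == "t"}
    return len(superhosts) / len(all_hosts)


def split_by(rows, field):
    """Superhost rate for each value of `field`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[field]].append(r)
    return {value: superhost_rate(group) for value, group in groups.items()}


def high_value_split(rows, fields):
    """Assumed 'High value' rule: the (field, value, rate) with the largest rate.

    Ties keep the first candidate found; Power BI's own tie-breaking is not modeled.
    """
    best = None
    for field in fields:
        for value, rate in split_by(rows, field).items():
            if best is None or rate > best[2]:
                best = (field, value, rate)
    return best


def high_value_path(rows, fields):
    """Apply the rule repeatedly, drilling into the winning bar each time."""
    path, remaining = [], list(fields)
    while remaining and rows:
        field, value, rate = high_value_split(rows, remaining)
        path.append((field, value, rate))
        rows = [r for r in rows if r[field] == value]
        remaining.remove(field)
    return path


for step in high_value_path(listings, ["city", "instant_bookable", "neighborhood"]):
    print(step)
# ('neighborhood', 'N1', 1.0)
# ('city', 'City A', 1.0)
# ('instant_bookable', 'f', 1.0)
```

Note that N4 also reaches 100% with a single host, a reminder that a 100% bar can rest on very few hosts, so it is worth checking how many hosts sit behind a striking value.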