1. Introduction to decomposition trees
Hi! It looks like you're enjoying the course so far. My name is Maarten, and I'll be your instructor for this part of the course.
This lesson continues to explore relationships between variables. Specifically, we will decompose a target variable by other explanatory variables.
2. What is a decomposition tree?
A decomposition tree is a visualization that methodically breaks down a variable by multiple dimensions, or variables. This provides insight into how each variable influences the value of the target variable.
They are useful for ad hoc exploration, root cause analysis (i.e. finding core underlying reasons for an observed trend in a metric), and identifying influential variables which explain the variation in the target variable.
3. Structure of a decomposition tree
To understand the structure of a decomposition tree and how to read one, we will use the example of a company that wants to understand why a portion of its orders is on backorder (i.e. the products are out of stock).
4. Structure of a decomposition tree
The architecture looks like branches of a tree, hence decomposition TREE.
5. Structure of a decomposition tree
At the far left is the root node, from which the rest of the tree sprouts. In the Power BI visualization, this is determined by the "Analyze" input variable.
The value for this node is calculated using the complete dataset.
6. Structure of a decomposition tree
Here, we see the root node is percentage on backorder.
The "Analyze" variable can use different aggregations, such as a sum, mean, or median. Here, it is a percentage of the group total.
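Outside of Power BI, the root node can be pictured as a single aggregation over the unfiltered data. The snippet below is a minimal sketch of that idea in pandas. The table, its column names (`on_backorder`, `quantity`), and its values are all made up for illustration and are not the course dataset.

```python
import pandas as pd

# Hypothetical order data: one row per order (illustrative values only).
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "on_backorder": [True, False, False, False, True, False, False, False],
    "quantity": [10, 4, 7, 3, 12, 5, 8, 6],
})

# Root node: the metric is computed over the complete dataset, with no filtering.
# The mean of a True/False column is the share of True values.
root_pct_backorder = orders["on_backorder"].mean() * 100

# Other aggregations that an "Analyze" field could use instead.
total_quantity = orders["quantity"].sum()
median_quantity = orders["quantity"].median()

print(f"Percentage on backorder: {root_pct_backorder:.2f}%")
print(f"Total quantity: {total_quantity}, median quantity: {median_quantity}")
```

Whichever aggregation is chosen, the same calculation is repeated on smaller and smaller subsets as you move to the right in the tree.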
7. Structure of a decomposition tree
Explanatory, or "Explain by", variables are used to decompose the target variable.
Here, Forecast Bias, Demand Type, and Plant number are being used. The result is three levels, one for each explanatory variable. You can add up to 50 explanatory variables as inputs to the visualization.
8. Structure of a decomposition tree
Underneath each level are child nodes. Each of these is a value of the explanatory variable for that level.
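One way to think about the child nodes of a level is as a group-by: one group per value of the explanatory variable, with the metric recalculated inside each group. Here is a rough sketch in pandas, assuming a made-up table with hypothetical `forecast_bias` and `on_backorder` columns and invented category labels.

```python
import pandas as pd

# Hypothetical orders (labels and values are illustrative only).
orders = pd.DataFrame({
    "forecast_bias": ["Moderate over", "Moderate over", "Consistent under",
                      "Consistent under", "Accurate", "Accurate"],
    "on_backorder": [False, True, True, True, False, False],
})

# One child node per unique value of the explanatory variable.
# Each node's value is the metric computed on that group's rows only.
child_nodes = (
    orders.groupby("forecast_bias")["on_backorder"]
    .mean()
    .mul(100)
    .sort_values(ascending=False)
)

print(child_nodes)
```

Sorting from highest to lowest is only a choice made in this sketch, so that the largest group value appears first.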
If the explanatory variable is continuous, each unique value will be used as a child node. So, depending on your analysis, it may be best to bin such variables (similar to what we did with histograms).
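To see why binning helps, here is a small sketch using `pandas.cut`. The variable (`lead_time_days`), the bin edges, and the labels are assumptions chosen only for illustration. Without binning, the six values below would produce six child nodes. With binning, they produce three.

```python
import pandas as pd

# Hypothetical continuous variable (illustrative values only).
lead_time_days = pd.Series([2, 5, 9, 14, 21, 30], name="lead_time_days")

# Assumed bin edges; by default each bin includes its right edge.
bins = [0, 7, 14, 31]
labels = ["0-7 days", "8-14 days", "15-31 days"]

lead_time_bin = pd.cut(lead_time_days, bins=bins, labels=labels)

print("Child nodes without binning:", lead_time_days.nunique())
print("Child nodes with binning:", lead_time_bin.nunique())
```

The right bin edges depend on your analysis, so treat the ones here as placeholders.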
9. Structure of a decomposition tree
Leaf nodes are the rightmost child nodes of the decomposition tree, which cannot be expanded further.
10. Structure of a decomposition tree
By default, the fill of each bar is determined by the child node's value compared to the max child node value of the level.
Here, we see the top child node, "Consistent under...", is filled to the max. This is because it has the max value of the level, 6.72%.
The bars of the child nodes below are filled in less, as they have values less than 6.72%.
The way bars are filled can be changed within the formatting options, but will not be covered in this lesson.
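The default fill described above amounts to dividing each child node's value by the largest value in its level. The sketch below illustrates that arithmetic. The 6.72% and 4.10% figures come from the example in this lesson, while the third entry is a made-up placeholder, and this is only an approximation of the idea, not Power BI's actual rendering code.

```python
# Values of the child nodes in one level, in percent.
child_values = {
    "Consistent under...": 6.72,       # max of the level in the example
    "Moderate over (5%)": 4.10,        # from the example path
    "Hypothetical other value": 3.36,  # invented for illustration
}

# Each bar is filled in proportion to the largest value in the level.
max_value = max(child_values.values())
fill_fraction = {name: value / max_value for name, value in child_values.items()}

for name, fraction in fill_fraction.items():
    print(f"{name}: bar filled to {fraction:.0%}")
```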
11. Structure of a decomposition tree
Putting it all together results in a path, or branch. Starting from the root node, a path traces from one child node to another in the subsequent levels.
An example is the path highlighted here, which investigates why orders with a "Moderate" Forecast Bias have a low percentage on backorder.
The value of the target variable, percentage on backorder, changes at each child node. This is because the underlying set of data is filtered before calculating the metric.
Let's zoom into this path highlighted in red for an example.
12. Reading a decomposition tree
Starting with the root node, all data is considered (unless explicit filters are created in the Filters pane). The result, 5.07%, indicates that out of all orders in this dataset, 5.07% are on backorder.
13. Reading a decomposition tree
The first level is "Forecast Bias". The highlighted path drills into the value "Moderate over (5%)".
Therefore, the complete data was filtered for just the orders with a "Moderate" Forecast Bias. Then, the percentage on backorder metric was calculated, resulting in 4.10%.
14. Reading a decomposition tree
The next level is "Demand Type"; the value "Intermittent" was highlighted.
Therefore, before the percentage on backorder metric was calculated, the data was filtered for orders with a "Moderate" Forecast Bias and an "Intermittent" Demand Type. The result is 4.01%.
As you may have noticed, at each level, a smaller subset of the complete dataset is being used to calculate the metric.
15. Reading a decomposition tree
At the final level we see the specific plant where orders are coming from. Plant #0477, a leaf node, was highlighted in the path.
Of the orders with a "Moderate" Forecast Bias, an "Intermittent" Demand Type, and from Plant #0477, 5.01% are on backorder.
Exploring in this manner uncovers how the percentage of records on backorder can increase or decrease depending on other variables. If the percentage for Plant #0477 were greater, say 25%, it would be a key place to analyze further, to understand why such a large percentage of orders from this plant is on backorder.
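Reading a path can be thought of as applying one more filter at each level and recalculating the metric on whatever rows remain. The following is a minimal sketch of that process in pandas. The table, its column names, and its values are invented for illustration, so the percentages it prints will not match the ones in the example above.

```python
import pandas as pd

# Hypothetical orders (illustrative values only, not the course dataset).
orders = pd.DataFrame({
    "forecast_bias": ["Moderate", "Moderate", "Moderate", "Moderate",
                      "Consistent under", "Consistent under", "Moderate", "Moderate"],
    "demand_type": ["Intermittent", "Intermittent", "Intermittent", "Smooth",
                    "Intermittent", "Smooth", "Intermittent", "Smooth"],
    "plant": ["0477", "0477", "0512", "0477", "0477", "0512", "0512", "0512"],
    "on_backorder": [True, False, False, False, True, True, False, True],
})


def pct_backorder(df):
    """Percentage of rows in df that are on backorder."""
    return df["on_backorder"].mean() * 100


# The path to follow: one (column, value) pair per level of the tree.
path = [("forecast_bias", "Moderate"),
        ("demand_type", "Intermittent"),
        ("plant", "0477")]

# Start at the root node (all rows), then narrow the data level by level.
subset = orders
results = [("All orders", len(subset), pct_backorder(subset))]
for column, value in path:
    subset = subset[subset[column] == value]
    results.append((f"{column} = {value}", len(subset), pct_backorder(subset)))

for label, n_rows, pct in results:
    print(f"{label}: {n_rows} rows, {pct:.2f}% on backorder")
```

Notice that the row count shrinks at every step, while the metric can move up or down. That is the same pattern we saw along the highlighted path.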
16. Let's practice!
Now it's your turn to explore decomposition trees and use them to analyze what influences the superhost status of Airbnb hosts.