Inference (causal) models

1. Inference (causal) models

Great work! Now we will dive deeper into inference models and go through a specific example to understand how they work.

2. What is causality?

Let's begin by understanding what causality is. We talk about causality when we want to identify how much a certain action, or set of actions, affects an outcome of interest. Causal models answer "why?" questions, for example "Why did sales increase this month?", "Why do customers cancel their subscriptions?", or "What are the most predictive indicators of a fraudulent transaction?" These models optimize for interpretability rather than predictive performance and accuracy. Finally, causal models are used with so-called observational data, where no experiment was run. Let's talk more about that!

3. Experiments vs. observations

In an ideal scenario, experiments are preferred over causal models because they are more accurate and support stronger conclusions. Unfortunately, experiments are not always possible for ethical, cost, or other reasons, so researchers have to work with observational data and extract causal relationships from it. For example, when assessing the effect of certain chemicals on human beings, experimentation would be illegal, so a causal model has to be built using data from people who were exposed to those chemicals accidentally, comparing their health indicators with those of people without prior exposure. Whenever possible, though, experiments are preferred over observational studies, since reaching conclusions is faster and easier.

4. Best practices

Let's wrap up the previous slides with some best practices. First, if you can run an experiment, do it: it will be faster and much more accurate than a causal model. Second, if it's not possible to run experiments everywhere in a certain department, run them less frequently to establish a baseline expected effect, and then use that baseline to make decisions. Only if neither of these is possible should an inference model be built.

5. Inference model example

Let's start with an example. Here we have a dataset of customers with their last month's spend, time since last purchase in days (also called recency), average cart size, and number of store visits in the last 12 months. Finally, there is the target variable: next month's spend.
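To make the shape of this dataset concrete, here is a minimal sketch of how such records could be represented and split into inputs and target. The field names and every value are made up for illustration; they are not the numbers from the slide, and `avg_cart` is just one reading of the cart column.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    # All field names are hypothetical labels for the columns described above.
    last_month_spend: float   # dollars spent last month
    recency_days: int         # days since the last purchase
    avg_cart: float           # average cart measure (assumed meaning)
    store_visits_12m: int     # store visits in the last 12 months
    next_month_spend: float   # target variable


FEATURES = ["last_month_spend", "recency_days", "avg_cart", "store_visits_12m"]
TARGET = "next_month_spend"

# Invented rows, only to show the layout.
customers = [
    CustomerRecord(120.0, 5, 3.2, 14, 95.0),
    CustomerRecord(40.0, 30, 1.5, 4, 20.0),
    CustomerRecord(260.0, 2, 5.0, 25, 210.0),
]


def split_features_target(records):
    """Return (X, y): one row of feature values per customer, plus the targets."""
    X = [[getattr(r, name) for name in FEATURES] for r in records]
    y = [getattr(r, TARGET) for r in records]
    return X, y


X, y = split_features_target(customers)
```

The inputs in `X` are what the model learns from, and `y` is the outcome we want to explain.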

6. Inference - training

We want to understand how much each of these variables affects the outcome.

7. Inference - learning

We then run a model to learn the rules for predicting next month's spend.

8. Inference - regression coefficients

We will typically be presented with so-called coefficients, which indicate how much each of these variables affects the output. The larger the absolute value of a coefficient, the bigger the effect. Positive coefficients increase the predicted outcome, and negative coefficients decrease it.
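As a rough illustration of where such coefficients come from, the sketch below fits an ordinary least squares linear regression with NumPy. It assumes the model is a plain linear regression, which the lesson does not state explicitly. The data is synthetic, the feature names are hypothetical, and apart from 0.58 (borrowed from the example) the generating coefficients are invented.

```python
import numpy as np

# Hypothetical feature names mirroring the columns in the example.
FEATURES = ["last_month_spend", "recency_days", "avg_cart", "store_visits_12m"]

# Made-up "true" effects used only to generate synthetic data.
# 0.58 echoes the example; the other values are invented.
TRUE_COEFS = np.array([0.58, -0.30, 0.25, 0.10])
TRUE_INTERCEPT = 10.0

rng = np.random.default_rng(seed=0)
n_customers = 200

# Synthetic customers: one column per feature, on roughly plausible scales.
X = np.column_stack([
    rng.uniform(0, 500, n_customers),  # last month's spend in dollars
    rng.uniform(0, 90, n_customers),   # days since the last purchase
    rng.uniform(1, 10, n_customers),   # average cart measure
    rng.uniform(0, 50, n_customers),   # store visits in the last 12 months
])

# Noise-free target, so the fit should recover the generating coefficients.
y = TRUE_INTERCEPT + X @ TRUE_COEFS

# Ordinary least squares: prepend a column of ones to estimate the intercept.
design = np.column_stack([np.ones(n_customers), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, coefs = beta[0], beta[1:]

for name, coef in zip(FEATURES, coefs):
    direction = "raises" if coef > 0 else "lowers"
    print(f"{name}: {coef:+.2f} ({direction} predicted next-month spend)")
```

Because the synthetic target contains no noise, the fitted coefficients match the generating ones. With real observational data they would only be estimates, and comparing their sizes is most meaningful when the variables are on comparable scales.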

9. Inference - interpretation

While a full interpretation of coefficients is beyond the scope of this course, the first variable here is the most predictive, with the largest coefficient of 0.58. This means that, on average and with the other variables held equal, a customer who spent 1 dollar more than another customer last month is likely to spend 0.58 dollars more next month.
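The same reading can be written as simple arithmetic: the expected difference in next month's spend is the coefficient times the difference in last month's spend. The helper below is a hypothetical illustration of that one rule, assuming a linear model and all other variables held equal.

```python
# Coefficient of last month's spend, taken from the example above.
COEF_LAST_MONTH_SPEND = 0.58


def expected_difference(delta_last_month_spend, coef=COEF_LAST_MONTH_SPEND):
    """Expected gap in next-month spend between two otherwise similar customers.

    delta_last_month_spend: how many more dollars one customer spent last month.
    """
    return coef * delta_last_month_spend


# Compare customers who differ by 1 dollar and by 100 dollars last month.
for delta in (1.0, 100.0):
    gap = expected_difference(delta)
    print(f"{delta:.0f} dollars more last month: about {gap:.2f} dollars more next month")
```

This is an average tendency across customers, not a guarantee for any individual one.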

10. Let's practice!

Great, let's move to practice these concepts with some exercises!