
Data generating process

1. Data Generating Process

In the last few lessons, we went over the basics of probability. Now let's dive into perhaps the most important aspect of setting up a simulation - the data generating process.

2. Simulation Steps

In the last chapter, we went over the steps required to run a simulation. Here we will zoom in on the first three steps: defining random variables, probabilities, and relationships. Together, these help you set up the statistical model. But first, let's see how to choose which random variables to define. In essence, we need to think about how the data will be generated, that is, to define the data generating process.

3. Data Generating Process

When I start thinking about casting a problem in the simulation framework, the data generating process, or DGP, is usually my first task. When designing the statistical model, start by thinking about which factors influence the data, then identify the sources of uncertainty. Once you have quantified these, think about the relationships between the factors. You will typically iterate a few times before finalizing the DGP. Designing the DGP is as much an art as it is a science, but it gets easier with practice.

4. Cricket

Next, let's look at a very simple example. Suppose you're trying to model the outcome of today's cricket match between two highly competitive teams, India and England. Let's try to model the DGP.

5. Cricket

The outcome of the match could be influenced by multiple variables: 1) the weather - how cloudy or sunny is it? 2) The location - is it in India? England? Or a neutral location? 3) The pitch conditions - does the central strip of the cricket field favor either team? 4) Recent form - how have the teams been faring of late? And 5) player morale. OK, now let's look for uncertainty. The weather and morale are quite uncertain and random. The location is usually pre-determined. There is some variation in the pitch condition, typically to favor the home team. Recent form is known, but certainly varies from game to game.
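To make the split between known and uncertain factors concrete, here is a minimal Python sketch that draws one set of match conditions. The function name `sample_factors`, the numeric scales, and every distribution parameter are illustrative assumptions, not values from the lesson; positive numbers are taken to favor India.

```python
import random

def sample_factors(rng, location="India"):
    """Draw one hypothetical set of match conditions (all scales are assumed)."""
    # Weather and morale are treated as quite uncertain, so both are sampled.
    weather = rng.choice(["sunny", "cloudy"])
    morale = rng.uniform(-1.0, 1.0)
    # Location is usually pre-determined, so it is passed in rather than sampled.
    # Pitch conditions vary somewhat around a value that favors the home team.
    home_tilt = {"India": 0.5, "England": -0.5, "neutral": 0.0}[location]
    pitch = rng.gauss(home_tilt, 0.25)
    # Recent form has a known level (assumed 0.2 here) plus game-to-game noise.
    form = 0.2 + rng.gauss(0.0, 0.1)
    return {
        "weather": weather,
        "morale": morale,
        "location": location,
        "pitch": pitch,
        "form": form,
    }

factors = sample_factors(random.Random(0))
```

Calling `sample_factors` repeatedly with the same `rng` gives different weather, morale, pitch, and form values each time, while the location stays fixed.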

6. Cricket

Here's one possible version of the DGP. There are many relationships here, but it is important to note that one factor could influence the outcome in multiple ways. For example, the location impacts not only the pitch conditions but also the player morale, and both of these could affect the outcome.
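One way such a DGP might be sketched in Python is shown here, with location feeding into the outcome through two separate paths, pitch and morale. The function names, the coefficients, the noise levels, and the logistic link are all assumptions chosen for illustration; recent form is left out to keep the sketch short.

```python
import math
import random

def simulate_match(location, rng):
    """Return True if India wins one simulated match (illustrative DGP)."""
    # Location sets the sign of home advantage: +1 India, -1 England, 0 neutral.
    home = {"India": 1.0, "England": -1.0, "neutral": 0.0}[location]
    # Path 1: location -> pitch conditions -> outcome.
    pitch = 0.5 * home + rng.gauss(0.0, 0.3)
    # Path 2: location -> player morale -> outcome.
    morale = 0.3 * home + rng.gauss(0.0, 0.5)
    # Weather effect is random and independent of location in this sketch.
    weather = rng.gauss(0.0, 0.2)
    # Combine the factors into a win probability via a logistic link (assumed).
    score = pitch + morale + weather
    p_india = 1.0 / (1.0 + math.exp(-score))
    return rng.random() < p_india

def estimate_win_rate(location, n_sims=10_000, seed=42):
    """Estimate India's win rate at a location by repeated simulation."""
    rng = random.Random(seed)
    wins = sum(simulate_match(location, rng) for _ in range(n_sims))
    return wins / n_sims

for loc in ("India", "neutral", "England"):
    print(loc, estimate_win_rate(loc))
```

Because location enters through both pitch and morale, changing it shifts the simulated win rate more than either path would alone.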

7. Let's practice!

In the next few exercises, we will work through some simple DGPs and create statistical models.