
Data & the likelihood

1. Data & the likelihood

Now that you've constructed a prior model of your support in the upcoming election, let's turn to the next important piece of a Bayesian analysis: the data!

2. Polling data

In your quest for election to public office, recall that parameter p denotes the underlying proportion of voters that support you. To gain insight into p, your campaign conducted a small poll and found that X = 6 of n = 10 (or 60% of) voters support you. These *data* provide some evidence about p. For example, you're more likely to observe such poll results if your underlying support were also around 0.6 than, say, if it were below the 0.5 winning threshold. Of course, to *rigorously* quantify the likelihood of the poll results under different election scenarios, we must understand how polling data X depend on your underlying support p.

3. Modeling the dependence of X on p

To this end, you can make two reasonable assumptions about the polling data. First, voters respond independently of one another. Second, the probability that any given voter supports you is p, your underlying support in the population. In turn, you can view X, the number of the n polled voters who support you, as a count of successes in n independent trials, each having probability of success p. This might sound familiar! Under these assumptions, the *conditional* dependence of X on p is modeled by the Binomial distribution with parameters n and p (communicated by the mathematical notation shown below).
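The slide's notation isn't reproduced in this transcript; a minimal rendering of it, assuming the standard Binomial probability mass function, is:

```latex
X \mid p \sim \text{Bin}(n, p), \qquad
P(X = x \mid p) = \binom{n}{x} \, p^{x} (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n
```

In this poll, n = 10.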

4. Dependence of X on p

The Binomial model provides the tools needed to quantify the probability of observing *your* poll result under different election scenarios. This result is represented by the red dot: X = 6 of n = 10 (or 60% of) voters support you.

5. Dependence of X on p

If your underlying support p were only 50%, there's a roughly 20% chance that a poll of 10 voters would produce X = 6.

6. Dependence of X on p

You're less likely to observe such a relatively low poll result if your underlying support p were as high as 80%.

7. Dependence of X on p

Further, it’s possible though unlikely that you would observe such a relatively high poll result if your underlying support were only 30%.
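A short sketch of how you might check these three scenarios yourself; this code is illustrative rather than part of the lesson, and it assumes the SciPy library is available:

```python
from scipy.stats import binom

# Observed poll result: X = 6 supporters out of n = 10 polled voters
n, x = 10, 6

# Probability of observing X = 6 under three hypothetical values of support p
for p in [0.5, 0.8, 0.3]:
    print(f"P(X = 6 | p = {p}) = {binom.pmf(x, n, p):.3f}")

# P(X = 6 | p = 0.5) ≈ 0.205  -> roughly a 20% chance
# P(X = 6 | p = 0.8) ≈ 0.088  -> less likely if support were as high as 80%
# P(X = 6 | p = 0.3) ≈ 0.037  -> possible but unlikely if support were only 30%
```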

8. What's the likelihood?

Similarly, we can calculate the *likelihood* of these poll results for any level of underlying support p between 0 and 1. Connecting the dots, the resulting curve represents the likelihood function.
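One way to picture this "connecting the dots" step is to evaluate the Binomial probability of X = 6 across a fine grid of p values; the sketch below is illustrative and assumes SciPy and matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

# Grid of candidate values for the underlying support p
p_grid = np.linspace(0, 1, 101)

# Likelihood of the observed poll (X = 6 of n = 10) at each candidate value of p
likelihood = binom.pmf(6, 10, p_grid)

# Connecting the dots across the grid traces out the likelihood function
plt.plot(p_grid, likelihood)
plt.xlabel("underlying support p")
plt.ylabel("likelihood of X = 6 in n = 10")
plt.show()
```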

9. Likelihood

The likelihood function summarizes the likelihood of observing polling data X under different values of the underlying support parameter p. Thus, the likelihood is a function of p that depends upon the observed data X. In turn, it provides insight into which parameter values are most compatible with the poll. Here we see that the likelihood function is highest for values of support p between 0.4 and 0.8, so these values are the *most* compatible with the poll. In contrast, small values of support below 0.4 and large values above 0.8 have low likelihoods and are *not* very compatible with the poll.

10. Let's practice!

To conclude, the likelihood function plays an important role in quantifying the insights from our data. Though it's *possible* to calculate the exact Binomial likelihood function, you'll use simulation techniques to approximate and build intuition for the likelihood in the following exercises.
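As a preview of that idea, here is one possible sketch (not the course's exercise code): simulate many polls under many candidate values of p, then keep only the simulations that reproduce the observed result X = 6. This assumes NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(123)

# Simulate 100,000 (p, X) pairs: draw p uniformly, then simulate a poll of n = 10 voters
n_sim = 100_000
p_sim = rng.uniform(0, 1, n_sim)
x_sim = rng.binomial(n=10, p=p_sim)

# Keep only the values of p whose simulated poll matched the observed X = 6
p_matches = p_sim[x_sim == 6]

# Because p was drawn uniformly, a histogram of these retained p values
# approximates the (scaled) likelihood function: highest near p = 0.6
counts, edges = np.histogram(p_matches, bins=20, range=(0, 1))
print(counts)
```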
