Introduction and basetable structure

1. Introduction and basetable structure

Hi! Welcome to the first video of the Foundations of Predictive Analytics course. My name is Nele, I'm a data scientist at Python Predictions. I will introduce you to the fascinating world of predictive analytics, and help you to construct your first predictive models.

2. Predictive analytics in fundraising

First, let's get a better understanding of what predictive analytics entails. Consider the example of a non-profit organization. This organization has a donor base of people who have donated in the past. Assume that the organization wants to send a letter to its donors, asking them to donate to a specific project. One option would be to send the letter to all the candidate donors. However, this would be really expensive. Predictive analytics makes it possible to identify the donors who are most likely to donate. This is exactly what the organization needs: instead of sending a letter to all donors, it can send it to a smaller group of donors who are most likely to donate.

3. The analytical basetable

In general, predictive analytics is the process of predicting an event using historical data. This data is gathered in the analytical basetable. An analytical basetable is typically stored in a `pandas` dataframe. There are three important concepts in the analytical basetable: the population, the candidate predictors and the target. The population is the group of people or objects you want to make a prediction for. In the fundraising example, it consists of the donors that are in scope for receiving a letter. The basetable has one row for each object in the population. You can check the size of your population by applying Python's built-in `len` function to the basetable. The candidate predictors describe the objects in the population. They contain information that can be used to predict the event. For instance, variables like age, gender or previous gifts could be used to predict whether someone will donate to a future project. Finally, the target has information about the event to predict. It is one if the event occurs, and zero otherwise. Because the target is coded as ones and zeros, you can count the number of objects for which the event occurred using the `sum` method on the target column.
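To make the three concepts concrete, here is a minimal sketch of a basetable for the fundraising example. The donors, the column names (`donor_id`, `age`, `gender`, `previous_gifts`, `target`) and all values are made up for illustration; they are assumptions, not data from the course.

```python
import pandas as pd

# Hypothetical basetable: one row per donor in the population.
# All names and values below are illustrative assumptions.
basetable = pd.DataFrame({
    "donor_id": [1, 2, 3, 4, 5],
    "age": [34, 58, 45, 67, 29],             # candidate predictor
    "gender": ["F", "M", "F", "F", "M"],     # candidate predictor
    "previous_gifts": [2, 0, 5, 1, 0],       # candidate predictor
    "target": [1, 0, 1, 0, 0],               # 1 = donated, 0 = did not donate
})

# Size of the population: the number of rows in the basetable.
population_size = len(basetable)

# Number of targets: the target is 0/1, so summing counts the ones.
number_of_targets = basetable["target"].sum()

print(population_size)
print(number_of_targets)
```

In this toy table the population has five donors, two of whom have a target of one.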

4. The timeline

You might have noticed a contradiction in the definition of the basetable: we assume that the target is known, but actually this is exactly the event we want to predict. In fact, the basetable is not constructed on the current data, but on historical data. We look at a similar event, for instance a similar fundraising campaign, in the past, and construct the basetable on the data available at that time. The target is whether the donor donated for the historical campaign, and the candidate predictors are derived from the information that was available at that time. Next, a predictive model is constructed that links the candidate predictors with the target in the basetable. This predictive model can then in turn be used to predict the current event: the candidate predictors are available and serve as input for the model.
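The timeline idea can be sketched in `pandas` as follows. This is only an illustration under stated assumptions: the `gifts` table, its columns, the campaign dates and the choice of predictors are all hypothetical. Gifts made before the historical campaign provide the candidate predictors, and gifts made during the campaign window define the target.

```python
import pandas as pd

# Hypothetical gift history, one row per gift (illustrative values only).
gifts = pd.DataFrame({
    "donor_id": [1, 1, 2, 3, 3, 4],
    "date": pd.to_datetime([
        "2016-03-10", "2017-05-20", "2016-11-02",
        "2015-07-15", "2017-06-01", "2017-05-05",
    ]),
    "amount": [20.0, 50.0, 10.0, 30.0, 25.0, 40.0],
})

# Assumed window of the historical campaign.
campaign_start = pd.Timestamp("2017-05-01")
campaign_end = pd.Timestamp("2017-07-01")

# Split the history: what was known before the campaign,
# and what happened during it.
before = gifts[gifts["date"] < campaign_start]
during = gifts[(gifts["date"] >= campaign_start) & (gifts["date"] < campaign_end)]

# Population and candidate predictors: donors with at least one gift
# before the campaign, described only by information from that period.
basetable = (
    before.groupby("donor_id")
    .agg(previous_gifts=("amount", "count"), total_amount=("amount", "sum"))
    .reset_index()
)

# Target: 1 if the donor gave during the historical campaign, 0 otherwise.
basetable["target"] = basetable["donor_id"].isin(during["donor_id"]).astype(int)

print(basetable)
```

Donor 4 only gave during the campaign window, so under the assumption made here that the population consists of donors known before the campaign, that donor is not part of the basetable. Fitting the predictive model on this basetable is left out of the sketch.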

5. Let's practice!

Now let's rehearse the basetable concepts. You will also gain intuition on how candidate predictors can be used to predict the target!