1. Dating Data!
The final stretch! In the last chapter you will apply all the spreadsheet formulas & new statistical knowledge you learned to a realistic data set. Reviewing these concepts with realistic data will help you internalize how these formulas actually represent important information. Chapter 4 uses online dating profile data from the standpoint of a business analyst. Let’s start with some descriptive statistics about important matching factors like AGE, Drinking habits & self-reported church visits. In this first lesson, focus on averages, quartiles, & counts to understand the profiles of the user base.
2. Mean Age
As you may guess, AGE is an important factor for matching people. So it’s a good idea to describe this variable’s mean average, max & min. Keep in mind, in spreadsheets the formula isn’t MEAN but instead AVERAGE.
3. Maximum & Minimum Age
Remember to identify the maximum & minimum values using only MIN & MAX along with the data points of interest which can be individual numbers, ranges or an array from the sheet. In this data calculate the MAX & MIN along with average to help you understand the range of dating profiles. Looking for love can happen even late in life!
4. Social behavior
You may also suspect that how often someone drinks alcohol or observes religious ceremonies may indicate social behavior. As a result, describing these variables with additional summary statistics could prove to be relevant when learning about the site’s profiles. Using spreadsheet formulas you will identify the percent of non-alcohol drinkers from a sample of profiles. Then you'll calculate the quartiles for annual church visits. Lastly, you will explore the relationship between AGE & annual church visits using correlation.
In the upcoming exercises some of the formulas need more than just data points. To get a percent of non-drinkers, you first need to use COUNTIF. COUNTIF accepts the data to review and only tallies cells when a condition is true. This example counts spreadsheet cells from A1:A10 only when the value is less than 3. Once tallied, this number is divided by the number of observations to arrive at the percent of the total where the condition is true.
Similarly, when using QUARTILE, you need to declare the specific quartile as the second parameter. In this example code, QUARTILE is calculated on an array A1:B10, and will return the value for the 4th quartile.
Lastly, because correlation is exploring the relationship between two variables, you need to define them in the formula separately. Here, the first variable range is A1:A10 & the second is B1:B10
5. Let's practice!
Let's get to it! By the end of this chapter you'll know a lot about the online dating scene.