More complex relationships

1. More complex statistical (& profile) relationships

It's the final push!

2. Ice cream is a good first date

Let's re-introduce linear regression and review hypothesis testing. Previously, you used LINEST with one x-variable; now you will expand the equation to include more than one. Here's an example with ice cream.

3. Predicting Ice Cream Cones

Ice cream sales could be a function of specific inputs. Temperature likely affects cone sales: the hotter it is, the more cones are sold.

4. Temperature as an input

What else could be added to our equation? You won't sell many cones at $1000 each, but you would at $1, so price is another input. Lastly, if you represent Saturday as 1 and every other day as 0, you can measure how the weekend affects sales.
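Turning "is it Saturday?" into a 0/1 input is a small how-to on its own. A minimal Python sketch, assuming each row of sales data carries a calendar date (the dates below are illustrative, not from the course data):

```python
from datetime import date

def saturday_flag(day: date) -> int:
    """Return 1 if the date falls on a Saturday, else 0 (a 0/1 dummy variable)."""
    # date.weekday() numbers Monday as 0, so Saturday is 5.
    return 1 if day.weekday() == 5 else 0

# Four consecutive days: Thursday 2024-05-30 through Sunday 2024-06-02.
days = [date(2024, 5, 30), date(2024, 5, 31), date(2024, 6, 1), date(2024, 6, 2)]
flags = [saturday_flag(d) for d in days]
print(flags)  # [0, 0, 1, 0]
```

Only Saturday gets a 1; Sunday stays 0 because the flag encodes exactly one day.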

5. Cones as a function of temp, price & saturday

Thus, you could represent cone sales with the bottom equation. The funny-looking B's are called betas. The first, beta-naught, is the y-intercept. The next beta is multiplied by the temperature. This is added to the next beta times Saturday (1 or 0), and then to the final beta times the price. The equation also has an error term representing any cone sales the equation didn't correctly predict. The math is easy, and other DataCamp courses go into more detail on linear regression. For our purposes, focus on the calculation and on using more than one input.
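Written out, the equation described above looks like this (the variable names are our own shorthand, since the slide itself isn't reproduced here):

```latex
\text{cones} = \beta_0 + \beta_1\,\text{temp} + \beta_2\,\text{saturday} + \beta_3\,\text{price} + \varepsilon
```

Here $\text{saturday}$ is 1 or 0 and $\varepsilon$ is the error term.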

6. Multiple Regression in Sheets

You're already familiar with the LINEST function. Previously, you passed in the Y (cone sales) and then a single column, the X. Now you will change the formula so that the second parameter accepts a range spanning several columns, one per x-variable.

7. Deconstructing the relationship

Pay close attention to the results so you can calculate the linear estimate from multiple inputs correctly. Here is a truncated table with your 2 x-variables and y. Assuming Temp is column A, the formula is =LINEST(C2:C500, A2:B500), where C2:C500 is the Y and A2:B500 holds the x-variables.
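Outside Sheets, the same multiple regression is an ordinary least squares fit. A minimal NumPy sketch, assuming two x-columns as in the truncated table; which input sits in the second column isn't stated, so a 0/1 Saturday flag is used as a hypothetical stand-in. The data are synthetic and noise-free, generated from known betas, so the fit should recover them exactly. Note this function returns beta-naught first, which is not how LINEST lays out its output.

```python
import numpy as np

def fit_multiple_regression(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares fit of y on the columns of X.

    Returns [beta_0, beta_1, ..., beta_k] with the intercept FIRST
    (LINEST in Sheets orders its output differently).
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    betas, *_ = np.linalg.lstsq(design, y, rcond=None)
    return betas

# Synthetic data (hypothetical): column 0 is temperature, column 1 is a 0/1 Saturday flag.
temp = np.array([60.0, 65.0, 70.0, 75.0, 80.0, 85.0])
saturday = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 0.0])
X = np.column_stack([temp, saturday])
y = 2.0 + 0.5 * temp + 4.0 * saturday  # true betas: 2.0, 0.5, 4.0

betas = fit_multiple_regression(X, y)
print(np.round(betas, 6))
```

With real sales data the fit won't be exact; the leftover differences are the error term.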

8. The tricky Sheets results

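LINEST produces a beta for each input, but in reverse column order: the formula cell gets the beta for the last x-column, the cell to its right gets the beta for the column before that, and so on. At the very end, Sheets adds the beta-naught value. With Temp in column A and the other input in column B, the formula cell holds column B's beta, the next cell holds Temp's beta, and the third holds beta-naught. Notice too that there is no error term. The tricky part is that the output columns aren't labeled, so you must remember that the betas run backwards and beta-naught is at the end.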

9. Math to estimate cone sales

Plugging an example day's inputs into the equation gives estimated cone sales of 30.8.
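The estimate is simply the equation evaluated at one day's inputs, with the error term dropped because it is unknown for a new day. A minimal Python sketch; the transcript doesn't list the slide's betas or inputs, so the numbers below are hypothetical and do not reproduce 30.8.

```python
def estimate_cones(b0: float, b_temp: float, b_sat: float, b_price: float,
                   temp: float, saturday: int, price: float) -> float:
    """Evaluate cones = b0 + b_temp*temp + b_sat*saturday + b_price*price.

    The error term is omitted: it is unknown for a new day, so the
    prediction uses only the fitted betas.
    """
    return b0 + b_temp * temp + b_sat * saturday + b_price * price

# Hypothetical betas and inputs, for illustration only.
# A negative price beta means higher prices sell fewer cones.
estimate = estimate_cones(b0=10.0, b_temp=0.4, b_sat=5.0, b_price=-3.0,
                          temp=80.0, saturday=1, price=2.0)
print(estimate)
```

If you copy betas out of a LINEST result, match each one to its input carefully, since the output row isn't labeled.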

10. On to hypothesis testing

Next, you will revisit hypothesis testing on the profiles data. You will need to calculate expected frequencies from a pivot table; this example crosses height with eye color. To get an expected frequency, multiply a column sum by a row sum, then divide by the total number of records. For example, for tall brown-eyed people, the brown column sums to 25+20=45 and the tall row sums to 20+25=45. So 45*45=2025, and 2025 divided by 100 total records means you should expect 20.25 tall brown-eyed people.
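The same rule (row sum times column sum, divided by the total) can be applied to every cell at once. A minimal Python sketch; only the margins (45 tall, 45 brown-eyed, 100 total) come from the example, so the individual observed counts below are an assumed 2x2 table with those margins, and the "other eyes" and "not tall" categories are hypothetical.

```python
def expected_frequencies(observed: list[list[float]]) -> list[list[float]]:
    """Expected count for each cell under independence: row sum * column sum / total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]

# Assumed observed counts. Rows: tall, not tall. Columns: brown eyes, other eyes.
observed = [[25, 20],
            [20, 35]]
expected = expected_frequencies(observed)
print(expected[0][0])  # tall & brown-eyed: 45 * 45 / 100 = 20.25
```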

11. Chi-Squared Test

With expected frequencies, you can perform a chi-squared hypothesis test to determine whether height is independent of eye color. The null hypothesis is that height is independent of eye color, so any apparent pattern is left to chance. The alternative is that they are not independent, or simply "not H0".

12. Acceptance or rejection for dating profiles

After passing in the actual count data followed by the expected-values table, you get a result (the p-value) of 0.055. Typically, if the p-value is less than 0.05, you REJECT the null hypothesis. Here it is ever so slightly greater than 0.05, so you fail to reject (loosely, "accept") the null hypothesis. With this data sample, there isn't enough evidence to conclude that height and eye color are related.
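To see what the test computes, here is a minimal Python sketch of the Pearson chi-squared test of independence, again using an assumed 2x2 observed table (only its margins are known from the example). A 2x2 table has 1 degree of freedom, and for 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)), so the standard library is enough. No continuity correction is applied. With these assumed counts the p-value comes out at about 0.055, just above the 0.05 cutoff.

```python
import math

def chi_squared_p_value_2x2(observed: list[list[float]]) -> tuple[float, float]:
    """Pearson chi-squared test of independence for a 2x2 table (no continuity correction).

    Returns (statistic, p_value). A 2x2 table has (2-1)*(2-1) = 1 degree of freedom.
    """
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    stat = 0.0
    for i, r in enumerate(row_sums):
        for j, c in enumerate(col_sums):
            exp = r * c / total  # expected frequency for cell (i, j)
            stat += (observed[i][j] - exp) ** 2 / exp
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

observed = [[25, 20],   # tall: brown eyes, other eyes (assumed counts)
            [20, 35]]   # not tall
stat, p = chi_squared_p_value_2x2(observed)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
print("Reject H0" if p < 0.05 else "Fail to reject H0")
```

The closing comparison mirrors the decision rule above: below 0.05 rejects the null; anything else fails to reject it.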

13. Let's practice!

Almost there!