Comparing groups

In this exercise, you will compare distributions of data across groups.

The PlantGrowth dataset from the datasets package contains results from an experiment comparing plant yields under a control and two treatment conditions. The dataset contains two variables:

group - indicates whether the result comes from the control group (ctrl) or one of the two treatment groups (trt1, trt2).
weight - the dried weight of the plants, used to measure yield.

Recall that tapply() can be used to compute a summary statistic separately for each group.

For example,

tapply(df$x, df$grp, FUN = median)

returns the median of x within each level of grp in the df data frame.
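To make the call above concrete, here is a minimal sketch on made-up data. The data frame and its values are hypothetical and exist only to show the shape of the result; df, x, and grp simply mirror the placeholder names in the example.

```r
# Hypothetical toy data; df, x, and grp mirror the placeholder names above
df <- data.frame(
  x   = c(1, 3, 5, 2, 4, 10),
  grp = factor(c("a", "a", "a", "b", "b", "b"))
)

# Median of x computed separately within each level of grp
group_medians <- tapply(df$x, df$grp, FUN = median)
print(group_medians)
```

The result is a named vector-like array with one entry per level of grp, so individual groups can be looked up by name, for example group_medians["a"].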

Your task is to calculate the mean weight for each group and to visualize five common summary statistics of weight by group. The plot will help you compare how the response variable differs across groups.

This exercise is part of the course

Practicing Statistics Interview Questions in R

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Calculate means across groups
tapply(___, PlantGrowth$group, FUN = ___)

# Graphically compare statistics across groups
___(___ ~ group, data = ___)
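One possible way to fill in the blanks is sketched below. It assumes that the "five common statistics" are the ones a box plot displays, so boxplot() is used for the graphical step. Treat this as an illustration rather than the official solution; the name group_means is introduced here only for convenience.

```r
# PlantGrowth ships with the datasets package that is attached by default
data("PlantGrowth", package = "datasets")

# Mean dried weight within each level of group
group_means <- tapply(PlantGrowth$weight, PlantGrowth$group, FUN = mean)
print(group_means)

# Box plot of weight by group (assumed choice for the graphical comparison):
# shows the median, the quartiles, and whiskers for each group
boxplot(weight ~ group, data = PlantGrowth)
```

Note that the whiskers drawn by boxplot() extend to the most extreme points within 1.5 times the interquartile range of the box by default, so they coincide with the minimum and maximum only when a group has no outliers.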

Want to increase your odds of acing your job interview? If so, brush up on your knowledge of probability theory. In this chapter, we'll roll dice and shoot baskets to explain probabilities using real-life examples.

Exercise 1: Discrete distributions
Exercise 2: Probability functions
Exercise 3: Bernoulli trials
Exercise 4: Binomial distribution
Exercise 5: Continuous distributions
Exercise 6: Uniform distribution
Exercise 7: Shape of normal distribution
Exercise 8: Sample from normal distribution
Exercise 9: Central limit theorem
Exercise 10: Law of large numbers
Exercise 11: Simulating central limit theorem

If the job description appeals to you, review descriptive statistics before the interview. In this chapter, you will practice exploratory data analysis (EDA) using natural gas prices and data from a survey analysis.

Exercise 1: Descriptive statistics
Exercise 2: Centrality measures
Exercise 3: Variability measures
Exercise 4: Categorical data
Exercise 5: Survey analysis
Exercise 6: Data encoding
Exercise 7: Time series
Exercise 8: Time series object
Exercise 9: Wrangling time series
Exercise 10: Principal Component Analysis
Exercise 11: PCA - rotation
Exercise 12: PCA - dimension reduction

March confidently into your job interview after reviewing confidence intervals. We'll review the t-test, ANOVA, and normality tests to prepare you for statistics-based coding questions.

Exercise 1: Normality tests
Exercise 2: Shapiro-Wilk test
Exercise 3: Q-Q plot
Exercise 4: Inference for a mean
Exercise 5: Confidence interval
Exercise 6: One-sample t-test
Exercise 7: Comparing two means
Exercise 8: Two-sample t-test
Exercise 9: Paired test
Exercise 10: ANOVA
Exercise 11: Comparing groups (current exercise)
Exercise 12: ANOVA for plant growth

Is your potential employer planning to test your R skills? Make sure you’re prepared and practice model evaluation beforehand. In this chapter, we will fit and evaluate linear and logistic regression models using various biomedical datasets. By the end of this chapter, you’ll be fully prepared to answer any question the interviewer throws your way!

Exercise 1: Covariance and correlation
Exercise 2: Covariance by hand
Exercise 3: Linear relationship
Exercise 4: Nonlinear relationship
Exercise 5: Linear regression model
Exercise 6: Fitting linear models
Exercise 7: Predicting with linear models
Exercise 8: Logistic regression model
Exercise 9: Fitting logistic models
Exercise 10: Predicting with logistic models
Exercise 11: Model evaluation
Exercise 12: Validation set approach
Exercise 13: Regression evaluation
Exercise 14: Classification evaluation
Exercise 15: Wrapping up