Comparing the halves of your dataset
Just as you inspected the features of your full dataset, it's important to examine the halves after you've split the data. You can always use describe() on each dataset, but the psych package also provides some functions to help compare a dataset according to a grouping variable.
In this exercise, you'll use the indices created when splitting the dataset to create a grouping variable and attach it to the gcbs dataset. Once that grouping variable is set up, you can use describeBy() and statsBy() to view basic descriptive statistics as well as between-group statistics.
A word of warning: while the group argument of describeBy() has to be a vector, the group argument of statsBy() has to be the name of a column in your data frame. Plan accordingly!
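To make the difference between the two group arguments concrete, here is a minimal sketch on made-up data. The toy data frame, its columns Q1 to Q3, and the half_indices vector are hypothetical stand-ins for gcbs and for the indices from the earlier split; only describeBy() and statsBy() come from the psych package.

```r
library(psych)

set.seed(42)

# Hypothetical stand-in for gcbs: 20 respondents, 3 Likert-type items
toy <- data.frame(
  Q1 = sample(1:5, 20, replace = TRUE),
  Q2 = sample(1:5, 20, replace = TRUE),
  Q3 = sample(1:5, 20, replace = TRUE)
)

# Hypothetical split: a random half of the row indices
half_indices <- sample(seq_len(nrow(toy)), nrow(toy) / 2)

# Grouping vector: 1 for rows in the sampled half, 2 for all other rows
group_var <- rep(2, nrow(toy))
group_var[half_indices] <- 1

# describeBy() receives the grouping variable as a vector
by_group <- describeBy(toy, group = group_var)

# statsBy() receives a column name, so attach the vector to the data first
toy_grouped <- cbind(toy, group_var)
between <- statsBy(toy_grouped, group = "group_var")
```

Printing by_group shows one block of descriptive statistics per group, and printing between shows the between-group summary. Attaching the vector with cbind() names the new column after the variable, which is why "group_var" can be passed to statsBy() as a string.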
This exercise is part of the course Factor Analysis in R.
Have a go at this exercise by completing this sample code.
# Use the indices from the previous exercise to create a grouping variable
group_var <- vector("numeric", nrow(gcbs))
group_var[___] <- 1
group_var[___] <- 2