1. Determining dimensionality
You now know how to do a single-factor EFA, which is useful for seeing how each item relates to a single hypothesized factor. However, you may be wondering, "Haven't I also heard about EFA as a method for dimensionality reduction?" If so, you're right! In the context of measure development, we think of this as figuring out how many unobservable factors are represented by the items on the measure.
2. How many dimensions does your data have?
In a truly exploratory situation, you may not know which factors the items on your measure are related to - or even how many factors are represented by the items on your measure.
3. The bfi dataset
To switch things up a bit, you'll be using the Big Five Inventory, or bfi, dataset, which contains 2,800 subjects' responses to 25 questions.
4. Big Five personality traits
Five questions measure each of the Big Five personality traits: extraversion, agreeableness, openness, conscientiousness, and neuroticism. These "Big Five" traits have been studied extensively and are widely accepted among personality researchers.
The bfi dataset in the psych package also includes three demographic variables: gender, education, and age. Since your analyses only work with item-response data, I've removed those for you.
Though we clearly already know the scientific theory behind this measure and dataset, we'll pretend we don't have this information for the purpose of learning how to run EFAs.
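As a rough sketch of that setup (assuming the psych package is installed; in recent versions of psych, the bfi data have moved to the companion psychTools package), removing the demographics might look like this:

    # Load the psych package and the bfi data
    library(psych)
    data(bfi)

    # Keep only the 25 item-response columns, dropping the three
    # demographic variables (gender, education, age) at the end
    bfi <- bfi[, 1:25]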
5. The bfi dataset
If you use head() to look at the first six rows of the data, you can see that item responses range from 1 to 6, representing respondents' ratings on a six-point scale ranging from Very Inaccurate to Very Accurate.
You'll also notice that the column names each consist of a letter and a number. These indicate the personality trait that the item is hypothesized to measure and the question number. For example, A1 is the first question for the agreeableness trait.
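For example, assuming the item responses are stored in bfi as above:

    # View the first six rows of the item responses
    head(bfi)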
6. Setup: split your dataset
You'll remember from Chapter 1 that when you are going to use the same dataset for both exploratory and confirmatory analyses, it's important to split the data. Using the same dataset for both can result in overfitting.
We'll use the same splitting strategy as in Chapter 1: one half of the data for the EFA and the other half for the CFA, as sketched below.
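Here is a minimal sketch of that split, assuming the cleaned bfi data from before (the object names are illustrative, and the exact rows selected will depend on the random seed):

    # Establish two sets of row indices
    N <- nrow(bfi)
    indices <- seq(1, N)
    indices_EFA <- sample(indices, floor(0.5 * N))
    indices_CFA <- indices[!(indices %in% indices_EFA)]

    # Split the data into an EFA half and a CFA half
    bfi_EFA <- bfi[indices_EFA, ]
    bfi_CFA <- bfi[indices_CFA, ]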
7. Setup: split your dataset
Checking the output from head() verifies that you've created two distinct halves.
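For example, using the illustrative halves from the split above:

    # Inspect the first rows of each half; the non-overlapping
    # row names confirm the halves contain different subjects
    head(bfi_EFA)
    head(bfi_CFA)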
8. An empirical approach to dimensionality
Though we know the theorized factors for the bfi dataset, you may not always have a theory to guide analysis. In the absence of theory, you can use an empirical approach to quantify dimensionality.
To figure out the number of factors the items represent, you can look at eigenvalues, which are a way of quantifying the unique factors within a correlation matrix. We'll work through an applied example of using eigenvalues next.
9. Calculate the correlation matrix
Eigenvalues are calculated from matrices, so our first step is to calculate the correlation matrix. Note that we are using the half of the dataset we set aside for the EFA. Also, the correlation matrix is calculated using pairwise-complete observations, meaning each correlation is based on all subjects with non-missing responses to that pair of items.
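Continuing the sketch with the illustrative bfi_EFA half from before:

    # Calculate the correlation matrix from the EFA half, using
    # all pairwise-complete observations for each pair of items
    bfi_EFA_cor <- cor(bfi_EFA, use = "pairwise.complete.obs")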
10. Eigenvalues
Once you've got the correlation matrix, you can use it with eigen() to calculate the eigenvalues. The result is a list object containing several pieces of information. Check out the values element to view the eigenvalues.
A general rule, known as the Kaiser criterion, is that eigenvalues greater than 1 represent meaningful factors: each standardized item contributes one unit of variance, so a factor with an eigenvalue above 1 explains more variance than any single item. You can count these values in the results, but there's also a quick way to visualize this information.
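Continuing with the correlation matrix from the previous step:

    # Calculate eigenvalues from the correlation matrix
    eigenvals <- eigen(bfi_EFA_cor)

    # View the eigenvalues themselves
    eigenvals$values

    # Count how many eigenvalues exceed 1
    sum(eigenvals$values > 1)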
11. Scree plots
You can visualize eigenvalues with a scree plot, created by calling the scree() function on a correlation matrix.
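Using the same illustrative correlation matrix as before, the call might look like this:

    # Create a scree plot of the eigenvalues
    scree(bfi_EFA_cor)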
12. Scree plots
You'll get output like this, complete with a horizontal line to help you count the values greater than 1.
13. Let's practice!
Let's go try this out!