1. Model Specification
Hi, my name is Erin Buchanan. I am a statistician who studies the use of statistics and computational linguistics. We are going to begin learning about structural equation models, often abbreviated to SEM, and the lavaan package by starting with some important terminology and defining our first model.
2. SEM Goals
The main goal of SEM is to explore the relationships between variables, similar to regression analyses. Furthermore, SEM allows you to confirm whether previously developed models fit new data well, which often happens after an exploratory analysis, like a factor analysis.
3. Variables
In SEM, there are two types of variables. First, there are manifest variables, which are direct measurements in your dataset. For example, you can directly measure digit span, the longest sequence of digits a person can accurately remember. Manifest variables are represented by squares on this diagram. Second are latent variables, which are represented by circles on this diagram. A latent variable is the abstract underlying phenomenon you are trying to measure indirectly through the manifest variables. The idea is that the manifest variables are indirect measures of the latent variable, and therefore the latent variable is assumed to have a causal relationship to the values obtained on the manifest variables.
4. Example Model
In this example, we will use block design, digit span, and matrix reasoning subtests as manifest variables to indirectly measure the latent variable, intelligence. These three manifest variables are used to measure abstract reasoning, attention, and visual perception, which are believed to be core components of IQ.
5. Set Up Your Model
Let's create a one-factor model of the Holzinger and Swineford (1939) dataset. This dataset contains nine manifest variables measuring the mental ability of children, covering visual tasks, writing and word comprehension tasks, and processing speed tasks. One-factor models have one latent variable with at least three manifest variables. We will create a model measuring a visual-speed factor using six of the nine manifest variables. Variables x1, x2, and x3 are measurements of visual perception, while x7, x8, and x9 are speeded counting and addition tasks.
6. Set Up Your Model (2)
First, you will need to name your model, and a good habit is to include the word model in that name to help you distinguish between the specified and fitted models. Second, you will name your latent variable any name you like, as long as it’s not the name of a column in the dataset you loaded. An important symbol here is equals tilde (=~), which indicates the direction of the prediction: the latent variable goes on the left, and we expect it to predict the scores on the items on the right. Last, you will use the names of the columns in the dataset to define the items related to the latent variable. These item names must match the names in the data frame and be spelled exactly the same, including capitalization. You will use plus signs between them, in a similar fashion to how a regression model is defined. The model is bookended by quotation marks, and you can use either single or double quotation marks.
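Put together, the steps above might look like the following minimal sketch. The object name visual.model and the latent variable name visual_speed are illustrative choices, not names fixed by the course, and the item names assume the lowercase column names x1 through x9 used in the copy of the Holzinger and Swineford data that ships with lavaan.

```r
# Specify a one-factor model as a quoted string (lavaan model syntax).
# 'visual_speed' is a made-up latent variable name; it must not clash
# with any column name in the dataset.
visual.model <- 'visual_speed =~ x1 + x2 + x3 + x7 + x8 + x9'

# The six manifest variables used on the right-hand side of =~
items <- c("x1", "x2", "x3", "x7", "x8", "x9")

# Optional check, run only when lavaan is installed: confirm that every
# item name matches a column in the dataset exactly.
if (requireNamespace("lavaan", quietly = TRUE)) {
  data("HolzingerSwineford1939", package = "lavaan")
  stopifnot(all(items %in% names(HolzingerSwineford1939)))
}
```

At this point the model is only specified, not fitted: visual.model is just a character string. In lavaan, a string like this is typically passed, together with the data, to a fitting function such as cfa().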
7. Let's practice!
Let's set up some models to make sure you've got these key concepts.