1 Regression as Mean- and Covariance-Structure Analysis

This chapter prepares the reader to learn about structural equation modeling (SEM) by reviewing the fundamentals of ordinary least-squares (OLS) regression in the general(ized) linear modeling (GLM) framework. Different types of data are discussed, and some matrix algebra is incorporated into the regression review.

1.1 What Are Mean and Covariance Structures?

Another (older) name for an SEM is covariance structure analysis. What is meant by this?

Covariance quantifies the linear relationship we observe between 2 variables (more on this below). There might be many reasons why 2 variables covary, such as

  • one variable influences the other
  • common causes influence them both
  • both variables appear in a chain or web of causal effects

A covariance can be partitioned into components that are due to these different reasons. The decomposition of covariances will be explained after path analysis is introduced, but recall that many statistical procedures you may already have learned about partition variance into components:

  • Regression: explained (\(R^2\)) vs. unexplained (residual) variance
  • ANOVA: variance between vs. within groups
  • Multilevel modeling: Level-1 vs. Level-2 variance
    • and (un)explained variance at each level
  • Multiple imputation of missing data: between- vs. within-imputation variance

Regression models in the GLM framework primarily analyze how the mean and variance of the outcome are structured.

  • A grand mean (\(\bar{y}\)) is an unconditional expected value, whereas a predicted value from a regression model (\(\hat{y}\)) is an expected value under a particular condition (e.g., \(\hat{y}=\beta_0\) when \(X=0\)). We calculate these conditional expectations as the intercept (\(\beta_0\)) plus each predictor times its slope: \(\sum \beta_j x_j\).
  • Variance is partitioned into (un)explained variance (e.g., \(R^2\)), and its structure can be further explored via unique contributions of each predictor (e.g., partial \(\eta^2\) in AN(C)OVA).

SEM is a multivariate method capable of partitioning the mean and variance of each variable, as well as the covariance between any pair of variables.

1.1.1 Types of Data Used in SEM

SEM can be seen as a generalization of the GLM for multivariate data. But rather than merely allowing multiple predictors and multiple outcomes, any variable can serve as both a predictor and an outcome in the same model. This allows us to model complex relationships among variables, including chains of causation (i.e., mediation).

Estimation of a GLM involves minimizing discrepancies (differences) between observed (\(y\)) and expected (\(\hat{y}\)) casewise values (i.e., an individual subject’s score on an outcome variable \(y\)). These discrepancies are called “residuals” (\(y - \hat{y}\)). In contrast, estimation of an SEM involves minimizing discrepancies between observed and expected summary statistics (i.e., the modeled variables’ means and covariance matrix; see the table below). This is why another (older) name for an SEM is covariance structure analysis (when the model only tries to explain why variables are related to each other), or mean and covariance structure (MACS) analysis when means are also modeled.

|              | Casewise observations (GLM) | Mean vector (SEM)           | Covariance matrix (SEM)           |
|--------------|-----------------------------|-----------------------------|-----------------------------------|
| Observations | \(y_i\)                     | \(\bar{y}\)                 | \(\mathbf{S}\)                    |
| Expectations | \(\hat{y}_i\)               | \(\widehat{\mu}\)           | \(\widehat{\Sigma}\)              |
| Residuals    | \(y_i - \hat{y}_i\)         | \(\bar{y} - \widehat{\mu}\) | \(\mathbf{S} - \widehat{\Sigma}\) |

Summary statistics can be calculated from observed raw data, so raw data can be provided directly to SEM software for analysis. But if data are complete and come from a multivariate normal distribution, summary statistics can also be passed directly to the SEM software, which has advantages. Unlike raw data, summary statistics do not compromise the privacy of the research participants, so summary statistics can be reported in published empirical research. That allows research-consumers (like you) to have access to the same summary statistics to replicate the results or to fit a different model (e.g., that represents a competing theory). Thus, this chapter will show how to import both raw and summary data, as well as how to calculate summary statistics from raw data so that they can be reported along with results. First, we provide some information about interpreting the summary statistics most relevant for SEM.

1.1.2 Interpreting Covariance

Whether in a GLM or as part of a larger SEM, a regression model includes both mean-structure parameters (intercepts) and covariance-structure parameters (slopes, residual variance). Generally speaking, covariance-structure parameters are informative about how variables are related to each other, and even the variances of (and covariances among) predictors are pivotal to estimating regression slopes. In the GLM, this is not immediately apparent from regression output, so later in this chapter we will demonstrate how to manually calculate regression slopes from raw data, revealing the role of summary statistics in estimating slopes. But first, what is covariance?

Recall that a variable’s variance (\(\sigma^2_y\)) quantifies how individual scores vary around \(\bar{y}\): it is essentially the average squared deviation from \(\bar{y}\) (the sample estimator below divides by \(N-1\) rather than \(N\)):

\[\text{Var}(y) = \frac{\sum_{i=1}^N (y_i - \bar{y})^2}{N-1} = \frac{\sum_{i=1}^N (y_i - \bar{y})(y_i - \bar{y})}{N-1}\]

Covariance (\(\sigma_{xy}\)) quantifies how \(x\) and \(y\) are linearly related (covary means “vary together”).

\[\text{Cov}(x,y) = \frac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y})}{N-1}\]

Thus, a variance can be seen as a special case of covariance, when \(x=y\).

The magnitude of a covariance is difficult to interpret because it is in a squared metric. For a variance, we usually overcome this limitation by interpreting its square root: the standard deviation (\(\sigma\), or sample estimate \(SD\)), expressed in the original units of measurement. In contrast, the square root of a covariance does not have a meaningful or practical interpretation.

However, the covariance between 2 variables cannot exceed the product of their \(SD\)s in absolute value (i.e., it lies within \(\pm \sigma_x \sigma_y\)), so we can standardize a covariance by dividing \(\sigma_{xy}\) by \(\sigma_x \sigma_y\). This limits its range to \(\pm 1\), and we call the standardized covariance a correlation coefficient:

\[\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}\]
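
To make these formulas concrete, here is a minimal sketch applying them to two short, made-up vectors (x and y below are arbitrary numbers for illustration, not the example data used later in this chapter), comparing the results to R’s built-in functions.

x <- c(2, 4, 6, 8)   # made-up example variables
y <- c(1, 3, 2, 5)
sum( (x - mean(x))^2 ) / (length(x) - 1)                # variance of x "by hand"
var(x)                                                  # same value from the built-in function
sum( (x - mean(x)) * (y - mean(y)) ) / (length(x) - 1)  # covariance "by hand"
cov(x, y)                                               # same value
cov(x, y) / (sd(x) * sd(y))                             # standardized covariance ...
cor(x, y)                                               # ... is the correlation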

For \(>2\) variables in a data set \(\mathbf{Y}\), we can simultaneously calculate all (co)variances using matrix algebra:

  1. Mean-center the data-matrix
    • \(\mathbf{Y} - \bar{\mathbf{Y}}\)
  2. “Square” it by (pre)multiplying it by its transpose
  3. Divide the result by the sample size minus 1 (i.e., \(N-1\))

\[\Sigma = \frac{1}{N-1} (\mathbf{Y} - \bar{\mathbf{Y}})^{'} (\mathbf{Y} - \bar{\mathbf{Y}})\]

Each off-diagonal cell of \(\Sigma\) contains the covariance between the row-variable and the column-variable. A covariance matrix is symmetric above/below the diagonal because the covariance (or correlation) between \(x\) and \(y\) is the same as between \(y\) and \(x\). On the diagonal of \(\Sigma\), the row and column variables are the same, so each diagonal element is a variable’s covariance with itself: its variance. Some authors refer to \(\Sigma\) as a “variance–covariance matrix”, but that is redundant because a variance is simply the covariance of a variable with itself (see the Cov(\(x,y\)) formula above). So the term “covariance matrix” is sufficiently descriptive.
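
Continuing the made-up x and y from the sketch above, the three steps can be carried out with matrix operations (the %*% operator performs matrix multiplication, as explained later in this chapter) and compared to the cov() function:

Y <- cbind(x, y)                              # bind the made-up vectors into a data matrix
Yc <- scale(Y, center = TRUE, scale = FALSE)  # step 1: mean-center each column
t(Yc) %*% Yc / (nrow(Y) - 1)                  # steps 2-3: premultiply by the transpose, divide by N - 1
cov(Y)                                        # same matrix from the built-in function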

Standardizing a covariance (i.e., scaling by both \(SD\)s) is only one way to facilitate its interpretation. If it makes sense to think of one variable \(x\) affecting another variable \(y\), then another way to scale the covariance would be to divide it by the variance of \(x\):

\[\beta_{yx} = \frac{\sigma_{xy}}{\sigma^2_x} = \frac{\sigma_{xy}}{\sigma_x \sigma_x}\]

This provides a familiar interpretation from simple regression of \(y\) on \(x\): the slope (\(\beta_{yx}\)) is interpreted as the change in \(y\) per unit \(x\). Recall that a standardized simple regression slope (\(\beta^*_{yx}\)) is the correlation \(\rho_{xy}\). The transformation below reveals why:

\[\begin{align*} \beta^*_{yx} &= \beta_{yx} \times \frac{\sigma_x}{\sigma_y} \\ &= \frac{\sigma_{xy}}{\sigma_x \sigma_x} \times \frac{\sigma_x}{\sigma_y} \\ &= \frac{\sigma_{xy}}{\sigma_x \sigma_y} (= \rho_{xy}) \end{align*},\]

where one pair of \(\sigma_x\) terms cancels out. This transformation is how slopes can be standardized even when there are multiple predictors. Partial regression slopes can also be calculated from summary statistics, but the formulas are more complicated because they take into account how the predictors are related to each other.
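
As a quick sketch using the same made-up x and y from above, the simple-regression slope and its standardized counterpart can be recovered from these summary statistics:

cov(x, y) / var(x)                    # unstandardized slope: covariance scaled by Var(x)
coef( lm(y ~ x) )["x"]                # matches the slope estimated by lm()
(cov(x, y) / var(x)) * sd(x) / sd(y)  # standardized slope ...
cor(x, y)                             # ... equals the correlation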

The take-home message is that (un)standardized slopes and (zero-order or partial) correlations are just different ways of scaling a covariance to make it more interpretable. Like covariances, correlations are agnostic about whether one variable affects another, but slopes imply a causal direction (although other conditions must be met to draw a valid causal inference).

1.2 Importing Data for SEM

This section covers importing both raw data and summary statistics, either of which can be analyzed by SEM software. We will utilize some special features in the open-source R package for SEM called lavaan (Rosseel, 2012), which is also the primary software we use for conducting SEM analyses in later chapters.

1.2.1 Installing lavaan

To install the lavaan package you can type the following command in the R console:

install.packages("lavaan", dependencies = TRUE)

Depending on your R(Studio) settings, this might open a window where you have to select a CRAN mirror (select “[your country], [closest city]”) prior to installing the package, including all additional packages that it depends on. You only need to install the package once (on a specific computer), but you must also “load” the library to access its functionality (similar to first installing an app, then opening the app to use it):

library(lavaan)

Every time you start a new R session, you need to run this command to load the lavaan package into the current workspace. Therefore, it is advisable to start every script with this command.

Sometimes, it will be necessary to install the development version of lavaan before the next version is made available on CRAN. For example, if the developer has fixed a bug or introduced a new feature in the source code (hosted on GitHub), the development version can be installed using the remotes package:

install.packages("remotes") # if necessary
remotes::install_github("yrosseel/lavaan")

1.2.2 Importing Raw Data

Typically, data are already stored in an external file with a predefined format, such as SPSS (*.sav) or Excel (*.xlsx), and R has functions that can read data from such formats, although they require installing specific packages. Alternatively, you can store the data in a plain-text file (with a .txt or .csv extension), where each line represents the observed responses of one individual. Spaces, commas, semicolons, or tabs can separate the individual responses. Some programs (like SPSS) can also export data to these file formats.

read.table("data.txt", header = TRUE)
read.spss("data.sav", to.data.frame = TRUE) # requires the package foreign
read.xls("data.xls")    # requires the package gdata
as.data.frame(read_excel("data.xlsx"))  # requires the package readxl

This chapter uses data from a tutorial published on the Social Change Lab’s website.

dat <- foreign::read.spss("demoData/MODMED.sav", to.data.frame = TRUE)[c(2, 3, 5, 7)]
head(dat)
##   MOOD      NFC      POS      ATT
## 1   -1  3.07579   6.2500  18.5455
## 2   -1  2.59489  -5.0671 -16.9684
## 3   -1 -1.00952   0.9346  -5.9578
## 4   -1 -0.43824   2.8668  -7.2256
## 5   -1  0.21788 -16.5601 -26.7000
## 6   -1  0.43842 -13.1365 -24.1241

MOOD indicates the experimental condition:

  • \(+1\) = positive mood induction
  • \(-1\) = neutral mood (control condition)

NFC = need for cognition, POS = positive thoughts, and ATT = attitude

1.2.3 Calculate Summary Statistics from Raw Data

To report summary statistics of your modeled variables, calculate them using the colMeans() function for means, the cov() function for a covariance matrix, and the nrow() function for the sample size.

  • Note: Counting the number of rows assumes you have complete data
(M <- colMeans(dat)) # mean vector
##       MOOD        NFC        POS        ATT 
##  0.0000000 -0.0000002  0.0000040  1.9807230
(S <- cov(dat)) # covariance matrix
##             MOOD         NFC        POS        ATT
## MOOD  1.01010101 -0.03243879  4.3546465   6.841187
## NFC  -0.03243879  1.97289018  0.7953901   2.598788
## POS   4.35464646  0.79539011 69.2629723  87.914121
## ATT   6.84118687  2.59878834 87.9141208 282.059756
(N <- nrow(dat)) # complete-data sample size
## [1] 100

Covariances were explained in more detail in an earlier section, but it is sufficient here to understand that the covariance between variables \(x\) and \(y\) (i.e., \(\sigma_{xy}\)) is their unstandardized correlation coefficient (\(r_{xy}\)). That is, a correlation is a covariance between \(z\) scores. Because correlations are easier to interpret than covariances, it is common to report the means, SDs, and correlations. Using the SDs, readers can rescale correlations to covariances in the original units of measurement (i.e., \(\sigma_{xy} = \sigma_x \times \sigma_y \times r_{xy}\)). The cov2cor() function can standardize a covariance matrix, or you can use the cor() function directly.

(SD <- sapply(dat, sd))
##      MOOD       NFC       POS       ATT 
##  1.005038  1.404596  8.322438 16.794635
sqrt(diag(S)) # SDs == square-roots of variances 
##      MOOD       NFC       POS       ATT 
##  1.005038  1.404596  8.322438 16.794635
cov2cor(S) # standardize the covariance matrix, or use cor()
##             MOOD         NFC        POS       ATT
## MOOD  1.00000000 -0.02297898 0.52061891 0.4053018
## NFC  -0.02297898  1.00000000 0.06804217 0.1101663
## POS   0.52061891  0.06804217 1.00000000 0.6289810
## ATT   0.40530176  0.11016633 0.62898098 1.0000000
(R <- cor(dat))
##             MOOD         NFC        POS       ATT
## MOOD  1.00000000 -0.02297898 0.52061891 0.4053018
## NFC  -0.02297898  1.00000000 0.06804217 0.1101663
## POS   0.52061891  0.06804217 1.00000000 0.6289810
## ATT   0.40530176  0.11016633 0.62898098 1.0000000
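
As a quick check of the rescaling formula above (\(\sigma_{xy} = \sigma_x \times \sigma_y \times r_{xy}\)), a single correlation from R can be converted back into the corresponding covariance in S:

SD["MOOD"] * SD["POS"] * R["MOOD", "POS"]  # rescale a correlation using the SDs
S["MOOD", "POS"]                           # matches the covariance in S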

Because covariance and correlation matrices are symmetric (and diagonal elements of a correlation matrix are 1 by definition), both could be reported in a single table. For example, the upper triangle could be correlations, and the lower triangle (including the diagonal) could be (co)variances.

printS <- S
printS[upper.tri(R)] <- R[upper.tri(R)]
printS # not symmetric, correlations in upper triangle
##             MOOD         NFC         POS         ATT
## MOOD  1.01010101 -0.02297898  0.52061891   0.4053018
## NFC  -0.03243879  1.97289018  0.06804217   0.1101663
## POS   4.35464646  0.79539011 69.26297231   0.6289810
## ATT   6.84118687  2.59878834 87.91412077 282.0597561

1.2.3.1 Calculate Summary Statistics for Multiple Groups

In SEM, it is common to estimate model parameters simultaneously in multiple groups (i.e., multigroup SEM or MG-SEM). In our example data, the MOOD variable is a 2-group variable, so we could create a factor from the numeric codes:

dat$mood.f <- factor(dat$MOOD, levels = c(-1, 1),
                     labels = c("neutral","positive"))

Then we can calculate the summary statistics separately within each group.

modVars <- c("ATT","NFC","POS")   # modeled variables
neuRows <- dat$mood.f == "neutral"  # rows for neutral group
posRows <- dat$mood.f == "positive" # rows for positive group
gM <- list(neutral  = colMeans(dat[neuRows, modVars] ),
           positive = colMeans(dat[posRows, modVars] ))
gS <- list(neutral  = cov(     dat[neuRows, modVars] ),
           positive = cov(     dat[posRows, modVars] ))
gN <- table(dat$mood.f)

To fit a MG-SEM in lavaan, the summary statistics must be provided as lists with one element (a mean vector or covariance matrix) per group, along with a vector of group sample sizes (a sketch of how these objects are passed to lavaan appears after the output below).

gM
## $neutral
##        ATT        NFC        POS 
## -4.7920520  0.0321142 -4.3110960 
## 
## $positive
##        ATT        NFC        POS 
##  8.7534980 -0.0321146  4.3111040
gS
## $neutral
##            ATT       NFC       POS
## ATT 245.020884  4.812796 61.799307
## NFC   4.812796  2.456274 -1.201971
## POS  61.799307 -1.201971 47.663230
## 
## $positive
##             ATT       NFC       POS
## ATT 231.2417228 0.8817017 56.235120
## NFC   0.8817017 1.5276643  3.091531
## POS  56.2351197 3.0915313 54.346483
gN
## 
##  neutral positive 
##       50       50
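
Although no model is fitted until later chapters, the commented sketch below indicates how such lists would be supplied to lavaan’s model-fitting functions; the object mymodel is a hypothetical placeholder for lavaan model syntax.

# Not run: a sketch, assuming `mymodel` contains lavaan model syntax (introduced in later chapters)
# fit <- sem(model = mymodel,
#            sample.cov  = gS,  # list of covariance matrices (one per group)
#            sample.mean = gM,  # list of mean vectors (one per group)
#            sample.nobs = gN)  # group sample sizes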

1.2.4 Importing Summary Data

1.2.4.1 Full Covariance Matrix

If we are importing summary data (e.g., from an article that reported them), we can type the values directly into our R script. Suppose the values above from S were rounded to 3 decimal places:

covVals <- c( 1.010, -0.032,  4.355,   6.841, 
             -0.032,  1.973,  0.795,   2.599, 
              4.355,  0.795, 69.263,  87.914,
              6.841,  2.599, 87.914, 282.060)
covVals
##  [1]   1.010  -0.032   4.355   6.841  -0.032   1.973   0.795   2.599   4.355   0.795  69.263
## [12]  87.914   6.841   2.599  87.914 282.060

If these values were saved as plain text, we could also read the data from a file using the scan() function, which returns a vector that contains the stored values in the file "values.txt".

covVals <- scan("values.txt")

Either way, we can place these values from the full covariance matrix into a matrix using the matrix() function, specifying how many rows and columns there are:

(newS <- matrix(data = covVals, nrow = 4, ncol = 4))
##        [,1]   [,2]   [,3]    [,4]
## [1,]  1.010 -0.032  4.355   6.841
## [2,] -0.032  1.973  0.795   2.599
## [3,]  4.355  0.795 69.263  87.914
## [4,]  6.841  2.599 87.914 282.060

We can give names to the variables in the covariance matrix by using the dimnames= argument of the matrix() function, or by using the dimnames() function after creating the matrix. Let’s save the variable names in an object obsnames, then use them to assign names.

obsnames <- c("MOOD", "NFC", "POS", "ATT")
## providing names when creating matrix
newS <- matrix(covVals, nrow = 4, ncol = 4,
               dimnames = list(obsnames, obsnames))
## or assign afterward, all at once
dimnames(newS) <- list(obsnames, obsnames) # (rows, columns)
## or one dimension at a time (useful for asymmetric matrix)
rownames(newS) <- obsnames
colnames(newS) <- obsnames

1.2.4.1.1 Rescaling a Correlation Matrix

If the imported matrix is instead a full correlation matrix (i.e., assuming all variables have \(SD=1\)), then it can be transformed back into a covariance matrix using the \(SD\)s, so variables will be in their original units. This is important because an SEM fitted to a correlation matrix will have biased \(SE\)s and inflated Type I error rates unless complex constraints are imposed.

## store values from correlation matrix
corVals <- c(  1,    -0.023, 0.521, 0.405,
              -0.023, 1,     0.068, 0.110,
               0.521, 0.068, 1,     0.629, 
               0.405, 0.110, 0.629, 1)
newR <- matrix(data = corVals, nrow = 4, ncol = 4,
               dimnames = list(obsnames, obsnames))
newR
##        MOOD    NFC   POS   ATT
## MOOD  1.000 -0.023 0.521 0.405
## NFC  -0.023  1.000 0.068 0.110
## POS   0.521  0.068 1.000 0.629
## ATT   0.405  0.110 0.629 1.000
## store SDs as a diagonal matrix
(SDs <- diag(SD))
##          [,1]     [,2]     [,3]     [,4]
## [1,] 1.005038 0.000000 0.000000  0.00000
## [2,] 0.000000 1.404596 0.000000  0.00000
## [3,] 0.000000 0.000000 8.322438  0.00000
## [4,] 0.000000 0.000000 0.000000 16.79463
## transform correlations to covariances
scaled.S <- SDs %*% newR %*% SDs
round(scaled.S, 3)
##        [,1]   [,2]   [,3]    [,4]
## [1,]  1.010 -0.032  4.358   6.836
## [2,] -0.032  1.973  0.795   2.595
## [3,]  4.358  0.795 69.263  87.917
## [4,]  6.836  2.595 87.917 282.060
## matches covariance matrix (within rounding error)
newS
##        MOOD    NFC    POS     ATT
## MOOD  1.010 -0.032  4.355   6.841
## NFC  -0.032  1.973  0.795   2.599
## POS   4.355  0.795 69.263  87.914
## ATT   6.841  2.599 87.914 282.060

Note that rescaling the correlation matrix to be a covariance matrix required pre- and postmultiplication by the diagonal matrix of \(SD\)s. The operator for matrix multiplication is %*% (the ordinary scalar multiplication operator * would simply multiply each cell of one matrix by the corresponding cell of another matrix). The table below gives an overview of commonly used matrix-algebra functions in R.

| Symbol              | Function                              | Example in R |
|---------------------|---------------------------------------|--------------|
| \(A+B\)             | Addition                              | A + B        |
| \(A-B\)             | Subtraction                           | A - B        |
| (none)              | Elementwise multiplication            | A * B        |
| (none)              | Elementwise division                  | A / B        |
| \(AB\)              | Matrix multiplication                 | A %*% B      |
| \(A^{-1}\)          | Inverse (enables a “division” analog) | solve(A)     |
| \(A^{'}\)           | Transpose (rows become columns)       | t(A)         |
| \(\lvert A \rvert\) | Determinant (generalized variance)    | det(A)       |

To review basic matrix algebra, consult a linear algebra text or an appendix of some SEM textbooks (e.g., Bollen, 1989; Schumacker & Lomax, 2016).
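
As a small illustration, the sketch below applies these functions to an arbitrary 2 × 2 matrix (A is made up for demonstration only):

A <- matrix(c(2, 1, 1, 3), nrow = 2)  # arbitrary small symmetric matrix
A * A                      # elementwise multiplication
A %*% A                    # matrix multiplication
t(A)                       # transpose
solve(A)                   # inverse
det(A)                     # determinant: 2*3 - 1*1 = 5
round(solve(A) %*% A, 12)  # inverse times original = identity matrix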

1.2.4.2 Upper/Lower Triangle of a Covariance Matrix

Because a covariance matrix is symmetric, we do not need to provide the complete matrix. Instead, we can import only the nonredundant values from the lower (or upper) triangle of the covariance matrix, then use the lavaan function getCov() to create the full matrix. By default, getCov() expects the values from the lower triangle, and it reads values row-by-row (i.e., left-to-right, top-to-bottom), including the diagonal elements (diagonal = TRUE is the default argument).

\[ \begin{bmatrix} 1.010 & & & \\ -0.032 & 1.973 & & \\ 4.355 & 0.795 & 69.263 & \\ 6.841 & 2.599 & 87.914 & 282.060 \end{bmatrix} \]

This is equivalent to reading the upper triangle column-by-column (i.e., top-to-bottom, left-to-right). You can also provide the observed variables’ names via the names= argument. getCov() returns the complete covariance matrix, including row and column names. The following code creates a full matrix equal to S and newS above, using only the values from the lower triangle.

lowerS <- c( 1.010,
            -0.032, 1.973, 
             4.355, 0.795, 69.263, 
             6.841, 2.599, 87.914, 282.060)
getCov(x = lowerS, names = obsnames)
##        MOOD    NFC    POS     ATT
## MOOD  1.010 -0.032  4.355   6.841
## NFC  -0.032  1.973  0.795   2.599
## POS   4.355  0.795 69.263  87.914
## ATT   6.841  2.599 87.914 282.060

Sometimes you will find the upper triangle reported in a published paper instead of the lower triangle. Because getCov() always reads values row-by-row (i.e., left-to-right, top-to-bottom), reading an upper triangle requires setting the argument lower = FALSE.

\[ \begin{bmatrix} 1.010 & -0.032 & 4.355 & 6.841 \\ & 1.973 & 0.795 & 2.599 \\ & & 69.263 & 87.914 \\ & & & 282.060 \end{bmatrix} \]

upperS <- c(1.010, -0.032, 4.355,  6.841, 
                    1.973, 0.795,  2.599, 
                          69.263, 87.914,
                                 282.060)
getCov(x = upperS, names = obsnames, lower = FALSE)
##        MOOD    NFC    POS     ATT
## MOOD  1.010 -0.032  4.355   6.841
## NFC  -0.032  1.973  0.795   2.599
## POS   4.355  0.795 69.263  87.914
## ATT   6.841  2.599 87.914 282.060

1.2.4.2.1 Upper/Lower Triangle of a Correlation Matrix

Quite frequently, published papers do not provide the covariance matrix; instead, a correlation matrix and \(SD\)s are reported separately. The getCov() argument sds= can be used to automatically rescale the correlation matrix. Because the diagonal of a correlation matrix is always 1, it is not necessary to include the diagonal values.

\[ \begin{bmatrix} 1 & & & \\ -0.023 & 1 & & \\ 0.521 & 0.068 & 1 & \\ 0.405 & 0.110 & 0.629 & 1 \end{bmatrix} \]

In this case, tell lavaan that the diagonal entries are omitted using the diagonal=FALSE argument. The following syntax creates the complete covariance matrix that was used in the previous examples from the correlations and \(SD\)s.

getCov(x = c(-0.023, 0.521, 0.068, 0.405, 0.110, 0.629),
       diagonal = FALSE, sds = SD, names = obsnames)
##             MOOD         NFC        POS        ATT
## MOOD  1.01010101 -0.03246846  4.3578341   6.836093
## NFC  -0.03246846  1.97289018  0.7948971   2.594865
## POS   4.35783405  0.79489713 69.2629723  87.916779
## ATT   6.83609342  2.59486462 87.9167795 282.059756

If you are unsure whether the correlations appear in the correct order in the matrix, you can first leave the sds= argument out and check the resulting correlation matrix. If everything looks correct, then add the \(SD\)s. Because covariance and correlation matrices are symmetric, it is also important to check that the lower triangle mirrors the upper triangle (i.e., Row-\(x\) Column-\(y\) of the matrix should contain the same value as Row-\(y\) Column-\(x\)). The R function isSymmetric() can perform this check for you:

isSymmetric(S)
## [1] TRUE

1.3 Regression Using Matrix Algebra

This section is more technical than most remaining chapters will be. The main goal of this section is to demonstrate how GLM parameters can be estimated from raw data and from summary statistics, revealing how these approaches are equivalent. Familiarity with the basics of matrix algebra will remain important throughout later chapters, not only because much of SEM is expressed in matrix notation, but because later chapters will demonstrate certain methods (e.g., standardizing an entire matrix of regression slopes) using matrix algebra.

1.3.1 Linear Regression Models

The GLM expresses an outcome \(y\) as a linear combination (weighted sum) of predictors:

\[\begin{align*} y_i &= \beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip} &+& \varepsilon_i \\ &= \beta_0 + \sum_{j=1}^p \beta_j x_{ij} &+& \varepsilon_i \sim \mathcal{N}(0, \sigma) \end{align*}\]
  • The deterministic component of \(y\) can be predicted or explained by \(X\)
  • The stochastic component of \(y\) includes all other unmodeled effects (omitted variables) and sources of error (\(\varepsilon\))

This is analogous to a recipe for the outcome: How much of \(x_1\) and \(x_2\) do you need to recreate \(y\)?

The notation above uses the subscript \(i\) as an “index variable” for subjects \(i = 1, \dots, N\). There are other notations, such as the full matrix notation below, in which the outcomes form the vector \(\mathbf{y}\) and the values of the predictors \(j = 1, \dots, p\) form the columns of the matrix \(\mathbf{X}\):

\[\begin{equation*} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} 1 & x_{1,1} & x_{1,2} & \dots & x_{1,p} \\ 1 & x_{2,1} & x_{2,2} & \dots & x_{2,p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N,1} & x_{N,2} & \dots & x_{N,p} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix} \end{equation*}\]

The shorthand notation simply uses bolded symbols for the matrices above: \(\mathbf{y} = \mathbf{X} \mathbf{\beta} + \mathbf{\varepsilon}\)

The ordinary least-squares (OLS) estimator of \(\beta\) minimizes the sum of squared residuals (equivalent to maximizing \(R^2\)):

\[\beta = (\mathbf{X}^{'} \mathbf{X})^{-1} \mathbf{X}^{'} \mathbf{y}\]

The OLS estimator operates on the raw data: the matrix of predictors (\(\mathbf{X}\)) and the vector of outcomes (\(\mathbf{y}\)). But recall from the previous section that if the data are mean-centered, then \(\mathbf{X}^{'} \mathbf{X}\) divided by \(N-1\) is the covariance matrix of the predictors. In that case, there would be no vector of ones in the first column of \(\mathbf{X}\), and the first element of \(\beta\) (the intercept) can be omitted because it would be zero whenever \(\mathbf{X}\) and \(\mathbf{y}\) are mean-centered. The syntax examples below calculate \(\mathbf{\Sigma}_\mathbf{X}\) both manually and using the cov() function to demonstrate their equivalence.

## Manual calculation
n <- nrow(dat) - 1   # N - 1, the denominator of the sample covariance
Xc <- scale(dat[c("MOOD","NFC","POS")],
            center = TRUE, scale = FALSE)
t(Xc) %*% Xc / n
##             MOOD         NFC        POS
## MOOD  1.01010101 -0.03243879  4.3546465
## NFC  -0.03243879  1.97289018  0.7953901
## POS   4.35464646  0.79539011 69.2629723
## Automated function
cov(dat[c("MOOD","NFC","POS")])
##             MOOD         NFC        POS
## MOOD  1.01010101 -0.03243879  4.3546465
## NFC  -0.03243879  1.97289018  0.7953901
## POS   4.35464646  0.79539011 69.2629723

Likewise, \(\mathbf{X}^{'} \mathbf{y}\) divided by \(N-1\) contains the covariances of the (mean-centered) predictors with the outcome:

## Manual calculation
t(Xc) %*% dat$ATT  / n
##           [,1]
## MOOD  6.841187
## NFC   2.598788
## POS  87.914121
## Automated function
cov(x = dat[c("MOOD","NFC","POS")],
    y = dat$ATT)
##           [,1]
## MOOD  6.841187
## NFC   2.598788
## POS  87.914121

Thus, even without access to raw data, the OLS estimates of slopes can be obtained using the data’s summary statistics:

Xcov <- cov(dat[c("MOOD","NFC","POS")])
Xy <- cov(x = dat[c("MOOD","NFC","POS")],
          y = dat$ATT)
solve(Xcov) %*% Xy # calculate slopes from (co)variances
##           [,1]
## MOOD 1.8839147
## NFC  0.8883671
## POS  1.1406346

These slopes match the estimates from R’s built-in linear-modeling function:

mod <- lm(ATT ~ MOOD + NFC + POS, data = dat)
coef(mod)
## (Intercept)        MOOD         NFC         POS 
##   1.9807186   1.8839147   0.8883671   1.1406346

1.3.2 Intercepts and Means

The lm() output above also provides an intercept because the data were not centered. In this case, the matrix \(X\) includes a constant (all 1s) in its first column, so \(\mathbf{X}^{'} \mathbf{X}\) no longer resembles the covariance matrix of the predictors alone. However, the covariation among predictors (and with the outcome) is still captured, and the column of 1s effectively “partials out” the means of the predictors and the outcome.

In the simplest case without any predictors (an intercept-only model), it is easier to see how the vector of 1s captures mean-structure information. Recall that the arithmetic mean is the sum of \(y\) scores divided by the number of \(y\) scores: \(\bar{y} = \frac{1}{N} \sum y\). For the intercept-only model, the intercept (\(\beta_0\)) is the mean:

  • \((\mathbf{X}^{'} \mathbf{X})^{-1} = (\mathbf{1}^{'} \mathbf{1})^{-1} = \frac{1}{N}\)
  • \(\mathbf{X}^{'} \mathbf{y} = \sum y\)

The syntax example below demonstrates the equivalence:

## using matrix algebra
y   <- dat$ATT
ONE <- rep(1, times = length(y))
solve(t(ONE) %*% ONE) %*% t(ONE) %*% y
##          [,1]
## [1,] 1.980723
## fitted linear model with only an intercept
coef(lm(y ~ 1))
## (Intercept) 
##    1.980723
## calculate the mean
mean(y)
## [1] 1.980723

Finally, we will apply the OLS formula (\(\beta = (\mathbf{X}^{'} \mathbf{X})^{-1} \mathbf{X}^{'} \mathbf{y}\)) to the full model with uncentered raw data (i.e., intercept included):

X <- cbind(`(Intercept)` = ONE, # first column is constant
           as.matrix(dat[c("MOOD","NFC","POS")]))
solve(t(X) %*% X) %*% t(X) %*% y # beta
##                  [,1]
## (Intercept) 1.9807186
## MOOD        1.8839147
## NFC         0.8883671
## POS         1.1406346

Which again matches results from the lm() function:

coef(mod)
## (Intercept)        MOOD         NFC         POS 
##   1.9807186   1.8839147   0.8883671   1.1406346

When we add a predictor to the model, the intercept becomes a conditional mean (i.e., the expected value when each predictor = 0) rather than the “marginal” or “grand” mean.
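
As a quick check using the mod object fitted above, the intercept reproduces the model’s expected value of ATT when every predictor equals zero:

predict(mod, newdata = data.frame(MOOD = 0, NFC = 0, POS = 0))
coef(mod)["(Intercept)"]  # same value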

1.3.3 Categorical Predictors

Recall also that membership in groups \(g = 1, \ldots, G\) can be represented using numeric codes (very often 0s and 1s), such as dummy codes. For example, a dummy code for Group \(g\) indicates whether someone belongs to Group \(g\) (1) or not (0). Alternative coding strategies include orthogonal contrast codes or effects coding, the latter of which is how our MOOD variable was coded to indicate experimental conditions:

  • \(+1\) = positive mood induction
  • \(-1\) = neutral mood (control condition)
table(dat$MOOD)
## 
## -1  1 
## 50 50

The algebra is identical, capturing how dummy codes covary with each other and with the outcome. However, the set of all \(G\) dummy codes would be perfectly multicollinear with the constant used to estimate the intercept (i.e., the column of 1s is exactly equal to the sum of the dummy-coded columns). This is why we typically estimate slopes for only \(G-1\) dummy codes, with the omitted group serving as the reference group. Thus, each dummy code’s slope represents how that group’s mean differs from the reference group’s mean.
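
As a brief sketch of R’s default behavior (using the mood.f factor created earlier in this chapter), the design matrix built by lm() contains a constant column of 1s plus \(G-1\) dummy codes, and the dummy code’s slope is a difference between group means:

head( model.matrix(~ mood.f, data = dat) )  # constant column of 1s plus G - 1 dummy codes
coef( lm(POS ~ mood.f, data = dat) )        # intercept = neutral (reference) group mean;
                                            # slope = difference between the group means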

1.3.3.1 Regression with Group-Specific Intercepts

Rather than choosing a reference group, we can omit the intercept from the model by dropping the vector of 1s from \(X\). This prevents multicollinearity between the dummy codes and the constant, so lm() will estimate slopes for all \(G\) dummy codes (not \(G-1\)).

dat$mood.f <- factor(dat$MOOD, levels = c(-1, 1),
                     labels = c("neutral","positive"))
coef( lm(POS ~ -1 + mood.f, data = dat) )
##  mood.fneutral mood.fpositive 
##      -4.311096       4.311104

Essentially, this model gives each group its own intercept, which is also the group mean when there are no other predictors in the model:

aggregate(POS ~ mood.f, data = dat, FUN = mean)
##     mood.f       POS
## 1  neutral -4.311096
## 2 positive  4.311104

Later in this course, you will learn about multigroup SEM, in which the same SEM is fitted separately (but simultaneously) to 2 or more groups. Multigroup SEM offers analogous flexibility, but (unlike in the GLM) variances can also differ across groups, so we do not need to assume homoskedasticity when using multigroup SEM.
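
For instance, consistent with the group-specific covariance matrices calculated earlier, the outcome’s variance is not identical across the two mood conditions in these data:

aggregate(POS ~ mood.f, data = dat, FUN = var)  # group variances need not be assumed equal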

Slopes can also be estimated separately per group when the intercept is dropped from the model. This is specified by multiplying the predictor by each dummy code.

## no "main/simple effect" of NFC without a reference group
mod1 <- lm(POS ~ -1 + mood.f + mood.f:NFC, data = dat)
coef(mod1)
##      mood.fneutral     mood.fpositive  mood.fneutral:NFC mood.fpositive:NFC 
##         -4.2953810          4.3760943         -0.4893472          2.0236980

This is equivalent to estimating the effect of NFC separately in each group.

mod.neu <- lm(POS ~ NFC, data = dat,
              subset = dat$mood.f == "neutral")
coef(mod.neu)
## (Intercept)         NFC 
##  -4.2953810  -0.4893472
mod.pos <- lm(POS ~ NFC, data = dat,
              subset = dat$mood.f == "positive")
coef(mod.pos)
## (Intercept)         NFC 
##    4.376094    2.023698

We will discuss (dis)advantages of the multigroup approach in the following chapter, after introducing regression models as SEM.

1.4 Summary

This chapter introduced the idea of analyzing mean and covariance structures, which partition descriptive statistics (means and (co)variances) into meaningful components. This enables researchers to impose a hypothesized structure on why variables are related to each other. The next chapter will introduce the SEM framework, which was designed for researchers to specify such theoretical structures as statistical models, and fit them to data in order to estimate and test parameters of interest.

References

Bollen, K. A. (1989). Structural equations with latent variables. Wiley. https://doi.org/10.1002/9781118619179

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02

Schumacker, R. E., & Lomax, R. G. (2016). A beginner’s guide to structural equation modeling (4th ed.). Lawrence Erlbaum. https://doi.org/10.4324/9781315749105