semEff
automates calculation of effects for structural
equation models (SEM) which use local estimation (‘piecewise’; see Lefcheck (2016)). Here’s a short example using
an SEM testing fire impacts on plant species richness (Grace & Keeley (2006); data provided with the
piecewiseSEM
package).
# install.packages(c("semEff", "piecewiseSEM"))
library(semEff)
data(keeley, package = "piecewiseSEM")
Let’s have a look at the data. Observations represent a range of vegetation parameters measured in 90 post-fire study plots after wildfires in southern California in 1993:
distance | elev | abiotic | age | hetero | firesev | cover | rich |
---|---|---|---|---|---|---|---|
53.40900 | 1225 | 60.67103 | 40 | 0.757065 | 3.50 | 1.0387974 | 51 |
37.03745 | 60 | 40.94291 | 25 | 0.491340 | 4.05 | 0.4775924 | 31 |
53.69565 | 200 | 50.98805 | 15 | 0.844485 | 2.60 | 0.9489357 | 71 |
53.69565 | 200 | 61.15633 | 15 | 0.690847 | 2.90 | 1.1949002 | 64 |
51.95985 | 970 | 46.66807 | 23 | 0.545628 | 4.30 | 1.2981890 | 68 |
51.95985 | 970 | 39.82357 | 24 | 0.652895 | 4.00 | 1.1734866 | 34 |
An SEM fit to these data in Grace & Keeley (2006) tests the influence of landscape vs. local factors in the response of plants to fire, or more specifically, the direct vs. indirect effects of landscape location (distance from coast) on plant species richness via other biotic and abiotic mediators (see the paper for further details). We can reproduce this SEM (roughly) in a piecewise fashion by fitting a separate linear model for each observed response variable:
keeley.sem <- list(
lm(age ~ distance, data = keeley),
lm(hetero ~ distance, data = keeley),
lm(abiotic ~ distance, data = keeley),
lm(firesev ~ age, data = keeley),
lm(cover ~ firesev, data = keeley),
lm(rich ~ distance + hetero + abiotic + cover, data = keeley)
)
And visualise it using piecewiseSEM:::plot.psem()
(wrapper for DiagrammeR::render_graph()
), which helps to
clarify the directions of causal pathways in the system:
piecewiseSEM:::plot.psem(
piecewiseSEM::as.psem(keeley.sem),
node_attrs = data.frame(shape = "rectangle", color = "black", fillcolor = "grey"),
layout = "tree"
)
Now, to calculate effects, we first bootstrap and save the standardised direct effects:
system.time(
keeley.sem.boot <- bootEff(keeley.sem, R = 100, seed = 13, parallel = "no")
)
#> user system elapsed
#> 7.61 0.08 7.69
Note that here we use only 100 reps to save time, where typically we should use a lot more for greater accuracy of confidence intervals (e.g. 1,000–10,000+). Using the bootstrapped estimates we can now calculate direct, indirect, and total effects:
(keeley.sem.eff <- semEff(keeley.sem.boot))
#>
#> Piecewise SEM with:
#> * 1 exogenous vs. 6 endogenous variable(s)
#> * 9 direct vs. 8 indirect effect(s)
#>
#> Variables:
#> Category Predictor Mediator Response Dir. Eff. Ind. Eff.
#> -------- --------- -------- -------- --------- ---------
#> distance | Exog. | x - - | - - |
#> age | Endog. | x x x | 1 0 |
#> hetero | Endog. | x x x | 1 0 |
#> abiotic | Endog. | x x x | 1 0 |
#> firesev | Endog. | x x x | 1 1 |
#> cover | Endog. | x x x | 1 2 |
#> rich | Endog. | - - x | 4 5 |
#>
#> Use summary() for effects and confidence intervals for endogenous variables.
#> See ?getEff() for extracting (unformatted) effects.
Let’s examine effects for the final response variable, plant species richness, which has four direct and five indirect paths leading to it:
summary(keeley.sem.eff, response = "rich")
#>
#> SEM direct, summed indirect, total, and mediator effects:
#>
#> Response 'rich' (6/6):
#> Effect Bias Std. Err. Lower CI Upper CI
#> ------ ------ --------- -------- --------
#> DIRECT: distance | 0.233 | 0.016 | 0.065 | 0.089 0.351 | *
#> hetero | 0.301 | -0.009 | 0.072 | 0.168 0.447 | *
#> abiotic | 0.219 | -0.014 | 0.078 | 0.080 0.401 | *
#> cover | 0.267 | -0.015 | 0.082 | 0.085 0.446 | *
#> | | | | |
#> INDIRECT: distance | 0.220 | -0.012 | 0.047 | 0.139 0.349 | *
#> age | -0.053 | 0.002 | 0.025 | -0.145 -0.012 | *
#> firesev | -0.117 | 0.007 | 0.045 | -0.224 -0.029 | *
#> | | | | |
#> TOTAL: distance | 0.453 | 0.004 | 0.045 | 0.340 0.524 | *
#> age | -0.053 | 0.002 | 0.025 | -0.145 -0.012 | *
#> hetero | 0.301 | -0.009 | 0.072 | 0.168 0.447 | *
#> abiotic | 0.219 | -0.014 | 0.078 | 0.080 0.401 | *
#> firesev | -0.117 | 0.007 | 0.045 | -0.224 -0.029 | *
#> cover | 0.267 | -0.015 | 0.082 | 0.085 0.446 | *
#> | | | | |
#> MEDIATORS: age | 0.015 | -0.001 | 0.009 | 0.004 0.056 | *
#> hetero | 0.104 | -0.003 | 0.031 | 0.046 0.186 | *
#> abiotic | 0.101 | -0.008 | 0.037 | 0.032 0.184 | *
#> firesev | -0.038 | 0.001 | 0.017 | -0.083 -0.008 | *
#> cover | -0.155 | 0.008 | 0.061 | -0.302 -0.037 | *
We can see that the standardised total effect of distance from coast on plant species richness is moderately high at 0.453 (CIs: 0.373, 0.526), and is split almost equally between direct (0.233) and indirect (0.220) effects (involving five mediators). The mediator effects listed here are the sum of all the indirect paths from all predictors which involve each mediator, and can be useful as a rough indicator of the overall importance and net direction of effect of each. hetero, abiotic, and cover appear to be relatively important, with cover having the strongest influence (-0.155). However, this considers all predictors in the system collectively – meaning that mediators are themselves counted as predictors. To focus only on the exogenous variable distance, let’s recalculate:
summary(
semEff(keeley.sem.boot, predictor = "distance"),
response = "rich"
)
#>
#> SEM direct, summed indirect, total, and mediator effects:
#>
#> Response 'rich' (6/6):
#> Effect Bias Std. Err. Lower CI Upper CI
#> ------ ------ --------- -------- --------
#> DIRECT: distance | 0.233 | 0.016 | 0.065 | 0.089 0.351 | *
#> | | | | |
#> INDIRECT: distance | 0.220 | -0.012 | 0.047 | 0.139 0.349 | *
#> | | | | |
#> TOTAL: distance | 0.453 | 0.004 | 0.045 | 0.340 0.524 | *
#> | | | | |
#> MEDIATORS: age | 0.015 | -0.001 | 0.009 | 0.004 0.056 | *
#> hetero | 0.104 | -0.003 | 0.031 | 0.046 0.186 | *
#> abiotic | 0.101 | -0.008 | 0.037 | 0.032 0.184 | *
#> firesev | 0.015 | -0.001 | 0.009 | 0.004 0.056 | *
#> cover | 0.015 | -0.001 | 0.009 | 0.004 0.056 | *
Now we can see that hetero and abiotic are the important mediators for distance, while the relative influence of cover is much lower – weakened due to being part of a single longer chain of effect (involving age and firesev). The total indirect effect of 0.220 can thus be broken down into three individual paths:
distance -> age -> firesev -> cover -> rich (0.015)
distance -> hetero -> rich (0.104)
distance -> abiotic -> rich (0.101)
This is only one potential way of analysing paths in this SEM. Depending on the size of the system and the particular research questions, any number of predictors, mediators and/or response variables can be investigated separately or collectively for direct vs. indirect effects. If there are multiple hypotheses (competing or complementary), these can all be tested in a single SEM by comparing the relative importance of different variables and pathways of effect.
NOTE: All effects presented here are standardised
unique effects (i.e. adjusted for multicollinearity, a.k.a. semipartial
correlations), which is the only way to fully partition effects in the
system. These will usually be a bit smaller than the unadjusted
standardised coefficients, since most variables are correlated to some
degree. If there are large differences between the two, consideration
should be given to the impact (and relevance) of multicollinearity in
the system. In this particular case, it’s minimal, with standard errors
of coefficients less than twice as large as they should be (checked
using RVIF()
for the species richness model):
To use original standardised coefficients instead, specify
unique.eff = FALSE
, which can be passed to stdEff()
,
bootEff()
,
and/or semEff()
.
These can be preferred if prediction is the primary goal rather than
inference, for comparison with the unique effects or with other
standardised coefficients, or for other reasons.