The sommer package was developed to provide R users with a powerful and reliable multivariate mixed model solver for different genetic and non-genetic analyses in diploid and polyploid organisms. This package allows the user to estimate variance components for a mixed model with the advantages of specifying the variance-covariance structure of the random effects, specifying heterogeneous variances, and obtaining other parameters such as BLUPs, BLUEs, residuals, fitted values, variances for fixed and random effects, etc. The core algorithms of the package are coded in C++ using the Armadillo library to optimize dense matrix operations common in the derect-inversion algorithms. Although the vignette shows examples using the mmes function with the default direct inversion algorithm (henderson=FALSE) the Henderson’s approach can be faster when the number of records surpasses the number of coefficients to estimate and setting the henderson argument to TRUE can bring significant speed ups.
The purpose of this vignette is to show how to translate the syntax formula from lme4
models to sommer
models. Feel free to remove the comment marks from the lme4 code so you can compare the results.
This is the simplest model people use when a random effect is desired and the levels of the random effect are considered to have the same intercept.
# install.packages("lme4")
# library(lme4)
library(sommer)
data(DT_sleepstudy)
DT <- DT_sleepstudy
###########
## lme4
###########
# fm1 <- lmer(Reaction ~ Days + (1 | Subject), data=DT)
# summary(fm1) # or vc <- VarCorr(fm1); print(vc,comp=c("Variance"))
# Random effects:
# Groups Name Variance Std.Dev.
# Subject (Intercept) 1378.2 37.12
# Residual 960.5 30.99
# Number of obs: 180, groups: Subject, 18
###########
## sommer
###########
fm2 <- mmes(Reaction ~ Days,
random= ~ Subject,
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
## VarComp VarCompSE Zratio Constraint
## Subject:mu:mu 1378.158 505.6990 2.725254 Positive
## units:mu:mu 960.458 107.0498 8.972063 Positive
This is the a model where you assume that the random effect has different intercepts based on the levels of another variable. In addition the ||
in lme4
assumes that slopes and intercepts have no correlation.
###########
## lme4
###########
# fm1 <- lmer(Reaction ~ Days + (Days || Subject), data=DT)
# summary(fm1) # or vc <- VarCorr(fm1); print(vc,comp=c("Variance"))
# Random effects:
# Groups Name Variance Std.Dev.
# Subject (Intercept) 627.57 25.051
# Subject.1 Days 35.86 5.988
# Residual 653.58 25.565
# Number of obs: 180, groups: Subject, 18
###########
## sommer
###########
fm2 <- mmes(Reaction ~ Days,
random= ~ Subject + vsm(ism(Days), ism(Subject)),
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
## VarComp VarCompSE Zratio Constraint
## Subject:mu:mu 627.54087 283.52939 2.213319 Positive
## Days:Subject:mu:mu 35.86008 14.53187 2.467686 Positive
## units:mu:mu 653.58305 76.72711 8.518281 Positive
Notice that Days is a numerical (not factor) variable.
This is the a model where you assume that the random effect has different intercepts based on the levels of another variable. In addition a single |
in lme4
assumes that slopes and intercepts have a correlation to be estimated.
###########
## lme4
###########
# fm1 <- lmer(Reaction ~ Days + (Days | Subject), data=DT)
# summary(fm1) # or # vc <- VarCorr(fm1); print(vc,comp=c("Variance"))
# Random effects:
# Groups Name Variance Std.Dev. Corr
# Subject (Intercept) 612.10 24.741
# Days 35.07 5.922 0.07
# Residual 654.94 25.592
# Number of obs: 180, groups: Subject, 18
###########
## sommer
###########
fm2 <- mmes(Reaction ~ Days, # henderson=TRUE,
random= ~ covm( vsm(ism(Subject)) , vsm(ism(Days), ism(Subject)) ),
nIters = 200, data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
## VarComp VarCompSE Zratio Constraint
## Subject:Days:ran1:ran1 612.081194 288.75050 2.1197581 Positive
## Subject:Days:ran1:ran2 9.604358 46.68381 0.2057321 Unconstr
## Subject:Days:ran2:ran2 35.073186 14.78720 2.3718604 Positive
## units:mu:mu 654.939143 77.18332 8.4855011 Positive
cov2cor(fm2$theta[[1]])
## [,1] [,2]
## [1,] 1.00000000 0.06555053
## [2,] 0.06555053 1.00000000
Notice that this last model require a new function called covm() which creates the two random effects as before but now they have to be encapsulated in covm() instead of just added.
This is the a model where you assume that the random effect has different intercepts based on the levels of another variable but there’s not a main effect. The 0 in the intercept in lme4 assumes that random slopes interact with an intercept but without a main effect.
###########
## lme4
###########
# fm1 <- lmer(Reaction ~ Days + (0 + Days | Subject), data=DT)
# summary(fm1) # or vc <- VarCorr(fm1); print(vc,comp=c("Variance"))
# Random effects:
# Groups Name Variance Std.Dev.
# Subject Days 52.71 7.26
# Residual 842.03 29.02
# Number of obs: 180, groups: Subject, 18
###########
## sommer
###########
fm2 <- mmes(Reaction ~ Days,
random= ~ vsm(ism(Days), ism(Subject)),
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
## VarComp VarCompSE Zratio Constraint
## Days:Subject:mu:mu 52.70946 19.09984 2.759681 Positive
## units:mu:mu 842.02736 93.84640 8.972399 Positive
One of the strengths of sommer is the availability of other variance covariance structures. In this section we show 4 models available in sommer that are not available in lme4 and might be useful.
library(orthopolynom)
## diagonal model
fm2 <- mmes(Reaction ~ Days,
random= ~ vsm(dsm(Daysf), ism(Subject)),
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
## VarComp VarCompSE Zratio Constraint
## Daysf:Subject:0:0 139.5473 399.5095 0.3492967 Positive
## Daysf:Subject:1:1 196.8544 411.8262 0.4780037 Positive
## Daysf:Subject:2:2 0.0000 365.3178 0.0000000 Positive
## Daysf:Subject:3:3 556.0773 501.2665 1.1093445 Positive
## Daysf:Subject:4:4 855.2104 581.8190 1.4698910 Positive
## Daysf:Subject:5:5 1699.4269 820.4561 2.0713197 Positive
## Daysf:Subject:6:6 2910.8975 1175.7872 2.4757011 Positive
## Daysf:Subject:7:7 1539.6201 779.1437 1.9760413 Positive
## Daysf:Subject:8:8 2597.5337 1089.4522 2.3842568 Positive
## Daysf:Subject:9:9 3472.7108 1351.5702 2.5693899 Positive
## units:mu:mu 879.6958 247.4680 3.5547862 Positive
## unstructured model
fm2 <- mmes(Reaction ~ Days,
random= ~ vsm(usm(Daysf), ism(Subject)),
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
## VarComp VarCompSE Zratio Constraint
## Daysf:Subject:0:0 404.2274 572.9406 0.7055311 Positive
## Daysf:Subject:0:1 1024.8277 395.1858 2.5932810 Unconstr
## Daysf:Subject:1:1 418.9669 288.1527 1.4539753 Positive
## Daysf:Subject:0:2 542.0553 398.5048 1.3602229 Unconstr
## Daysf:Subject:1:2 830.0421 412.4896 2.0122739 Unconstr
## Daysf:Subject:2:2 0.0000 518.4224 0.0000000 Positive
## Daysf:Subject:0:3 800.6213 567.2558 1.4113938 Unconstr
## Daysf:Subject:1:3 1139.4846 491.8231 2.3168586 Unconstr
## Daysf:Subject:2:3 1058.4964 578.5524 1.8295602 Unconstr
## Daysf:Subject:3:3 761.4044 692.1520 1.1000538 Positive
## Daysf:Subject:0:4 759.8438 522.0343 1.4555437 Unconstr
## Daysf:Subject:1:4 1041.4806 326.7026 3.1878552 Unconstr
## Daysf:Subject:2:4 912.2462 445.2663 2.0487653 Unconstr
## Daysf:Subject:3:4 1592.3784 448.6967 3.5488967 Unconstr
## Daysf:Subject:4:4 957.7023 549.5803 1.7426068 Positive
## Daysf:Subject:0:5 935.2038 585.0229 1.5985763 Unconstr
## Daysf:Subject:1:5 1182.0541 494.0087 2.3927801 Unconstr
## Daysf:Subject:2:5 861.0084 594.1198 1.4492167 Unconstr
## Daysf:Subject:3:5 1675.4551 705.2855 2.3755700 Unconstr
## Daysf:Subject:4:5 2005.1568 510.5543 3.9274112 Unconstr
## Daysf:Subject:5:5 2069.9991 386.7921 5.3517102 Positive
## Daysf:Subject:0:6 668.3500 378.6722 1.7649827 Unconstr
## Daysf:Subject:1:6 853.0252 441.5719 1.9317924 Unconstr
## Daysf:Subject:2:6 917.6183 504.8746 1.8175175 Unconstr
## Daysf:Subject:3:6 1787.8367 426.9689 4.1872761 Unconstr
## Daysf:Subject:4:6 2079.1716 487.9486 4.2610462 Unconstr
## Daysf:Subject:5:6 2605.7361 551.2976 4.7265505 Unconstr
## Daysf:Subject:6:6 3124.3139 437.2096 7.1460324 Positive
## Daysf:Subject:0:7 934.8337 567.6481 1.6468543 Unconstr
## Daysf:Subject:1:7 929.1933 665.6637 1.3958900 Unconstr
## Daysf:Subject:2:7 925.8402 752.0828 1.2310348 Unconstr
## Daysf:Subject:3:7 1284.6133 584.4937 2.1978223 Unconstr
## Daysf:Subject:4:7 1551.3176 719.2232 2.1569347 Unconstr
## Daysf:Subject:5:7 1943.7604 799.3245 2.4317539 Unconstr
## Daysf:Subject:6:7 2307.7377 364.2914 6.3348672 Unconstr
## Daysf:Subject:7:7 1670.4190 739.9904 2.2573523 Positive
## Daysf:Subject:0:8 922.9089 823.1582 1.1211806 Unconstr
## Daysf:Subject:1:8 1047.3727 644.6257 1.6247766 Unconstr
## Daysf:Subject:2:8 833.2198 806.9536 1.0325498 Unconstr
## Daysf:Subject:3:8 1609.3713 894.2295 1.7997295 Unconstr
## Daysf:Subject:4:8 2031.1247 554.0902 3.6656935 Unconstr
## Daysf:Subject:5:8 3061.0333 1036.7583 2.9525042 Unconstr
## Daysf:Subject:6:8 2929.9489 812.7286 3.6050765 Unconstr
## Daysf:Subject:7:8 2435.3269 1095.3333 2.2233661 Unconstr
## Daysf:Subject:8:8 2949.0476 1198.3224 2.4609801 Positive
## Daysf:Subject:0:9 1443.6078 1049.7042 1.3752520 Unconstr
## Daysf:Subject:1:9 1517.7329 952.6090 1.5932380 Unconstr
## Daysf:Subject:2:9 969.8886 1179.0801 0.8225808 Unconstr
## Daysf:Subject:3:9 1745.3735 1186.7845 1.4706743 Unconstr
## Daysf:Subject:4:9 2200.7039 612.3551 3.5938363 Unconstr
## Daysf:Subject:5:9 3240.0574 959.0399 3.3784387 Unconstr
## Daysf:Subject:6:9 2213.3201 1029.3177 2.1502788 Unconstr
## Daysf:Subject:7:9 2401.9426 845.6411 2.8403806 Unconstr
## Daysf:Subject:8:9 3850.1122 1393.5787 2.7627519 Unconstr
## Daysf:Subject:9:9 3948.8494 1229.9178 3.2106611 Positive
## units:mu:mu 884.1034 579.1145 1.5266470 Positive
## random regression (legendre polynomials)
fm2 <- mmes(Reaction ~ Days,
random= ~ vsm(dsm(leg(Days,1)), ism(Subject)),
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
## VarComp VarCompSE Zratio Constraint
## Days:Subject:leg0:leg0 2817.4048 1011.23903 2.786092 Positive
## Days:Subject:leg1:leg1 473.4608 199.53635 2.372805 Positive
## units:mu:mu 654.9433 77.18822 8.485016 Positive
## unstructured random regression (legendre)
fm2 <- mmes(Reaction ~ Days,
random= ~ vsm(usm(leg(Days,1)), ism(Subject)),
data=DT, tolParInv = 1e-6, verbose = FALSE)
summary(fm2)$varcomp
## VarComp VarCompSE Zratio Constraint
## Days:Subject:leg0:leg0 2817.4056 1011.24151 2.786086 Positive
## Days:Subject:leg0:leg1 869.9595 381.02552 2.283205 Unconstr
## Days:Subject:leg1:leg1 473.4608 199.53619 2.372807 Positive
## units:mu:mu 654.9429 77.18771 8.485067 Positive
Covarrubias-Pazaran G. 2016. Genome assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11(6):1-15.
Covarrubias-Pazaran G. 2018. Software update: Moving the R package sommer to multivariate mixed models for genome-assisted prediction. doi: https://doi.org/10.1101/354639
Bernardo Rex. 2010. Breeding for quantitative traits in plants. Second edition. Stemma Press. 390 pp.
Gilmour et al. 1995. Average Information REML: An efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4):1440-1450.
Henderson C.R. 1975. Best Linear Unbiased Estimation and Prediction under a Selection Model. Biometrics vol. 31(2):423-447.
Kang et al. 2008. Efficient control of population structure in model organism association mapping. Genetics 178:1709-1723.
Lee, D.-J., Durban, M., and Eilers, P.H.C. (2013). Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Computational Statistics and Data Analysis, 61, 22 - 37.
Lee et al. 2015. MTG2: An efficient algorithm for multivariate linear mixed model analysis based on genomic information. Cold Spring Harbor. doi: http://dx.doi.org/10.1101/027201.
Maier et al. 2015. Joint analysis of psychiatric disorders increases accuracy of risk prediction for schizophrenia, bipolar disorder, and major depressive disorder. Am J Hum Genet; 96(2):283-294.
Rodriguez-Alvarez, Maria Xose, et al. Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics 23 (2018): 52-71.
Searle. 1993. Applying the EM algorithm to calculating ML and REML estimates of variance components. Paper invited for the 1993 American Statistical Association Meeting, San Francisco.
Yu et al. 2006. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Genetics 38:203-208.
Tunnicliffe W. 1989. On the use of marginal likelihood in time series model estimation. JRSS 51(1):15-27.