Autoscaling in lme4

Scaling predictors in a linear model (or in extended linear models such as LMMs or GLMMs) so that all of the predictors are on a similar scale (and thus, in general, so that the estimated parameters have a similar scale) can improve the behaviour of various computational methods (see e.g. the “Remove eccentricity by scaling” section of Bolker et al. (2013)). lmer() and glmer() issue warnings suggesting that users consider re-scaling predictors when the predictor variables are on very different scales. Re-scaling predictors can improve both the computation and the interpretation (Schielzeth 2010) of (extended) linear models. However, when researchers want to scale predictors for computational stability but report parameter estimates on the scale of the original (unscaled) predictors, back-transforming the coefficient estimates, covariance matrix, etc. is tedious. lme4 now allows this scaling to be done automatically, behind the scenes.

An example of a warning message due to disparate predictor scales:

library(lme4)
#> Loading required package: Matrix

set.seed(1)
sleepstudy$var1 <- runif(nrow(sleepstudy), 1e6, 1e7)

fit1 <- lmer(Reaction ~ var1 + Days + (Days | Subject), sleepstudy)
#> Warning: Some predictor variables are on very different scales: consider rescaling. 
#> You may also use (g)lmerControl(autoscale = TRUE) to improve numerical stability.

Instead of re-fitting with scaled predictors, we can use the autoscale argument to (g)lmerControl (new as of lme4 version 1.1.37). With autoscale = TRUE, scale() is applied automatically to the continuous covariates, and the coefficient estimates and covariance matrix returned to the user are back-transformed to the original scale. This preserves the original interpretation of the parameters and minimizes user intervention.

Hence:

fit2 <- lmer(Reaction ~ var1 + Days + (Days | Subject), sleepstudy,
             control = lmerControl(autoscale = TRUE))
all.equal(fixef(fit1), fixef(fit2))
#> [1] TRUE
all.equal(vcov(fit1), vcov(fit2))
#> [1] TRUE

The parameters are in fact very slightly different (relative difference of \(4\times 10^{-11}\) in the fixed effects, and of \(10^{-14}\) in the covariance matrices); these differences will be larger for less numerically stable models (i.e. models that are more complex or more poorly constrained by the data).
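To see how large the discrepancy actually is, we can force all.equal() to report it rather than return TRUE (a minimal check; the exact values will vary slightly across platforms and BLAS libraries):

## tolerance = 0 forces all.equal() to report the mean relative
## difference instead of declaring the objects equal
all.equal(fixef(fit1), fixef(fit2), tolerance = 0)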

Autoscale Mechanism

The base R function scale() is applied to the fixed-effect model matrix X constructed in lFormula() or glFormula() (in R/Modular.R).
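For reference, scale() records the centering and scaling constants as attributes of the returned matrix, and it is these attributes that are kept for the later back-transformation. A minimal illustration, using the var1 covariate created above:

scaled_var1 <- scale(sleepstudy$var1)
## the constants needed to undo the transformation
attr(scaled_var1, "scaled:center")  # column mean
attr(scaled_var1, "scaled:scale")   # column standard deviation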

To back-transform, we use the scaled:center and scaled:scale attributes to reverse the scaling in fixef.merMod(), model.matrix.merMod(), and vcov.merMod(), so that the summary() output reports the coefficients on the original scale.

Back-transforming the \(\widehat{\beta}\) coefficients returned by fixef.merMod() is straightforward. Back-transforming the variance-covariance matrix, however, is a bit more involved. The exact computation can be found in the function scale_vcov() (in R/utilities.R); the derivation is explained below.

Consider a linear model fitted to scaled predictors:
\[ \widehat{Y} = \widehat{\beta}_{0} + \sum_{i=1}^{p} \widehat{\beta}_{i} x_{i}^{*}, \]
where \(x_{i}^{*}\) denotes the scaled version of the \(i\)th predictor \(x_{i}\). That is, if \(c_{i}\) and \(s_{i}\) are the centering and scaling constants for the \(i\)th predictor, then \(x_{i}^{*} = (x_{i} - c_{i})/s_{i}\), so that
\[ \widehat{Y} = \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} c_{i}}{s_{i}} + \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} x_{i}}{s_{i}}. \]
From the above, the intercept on the original scale is
\[ \widehat{\beta}_{0}' = \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} c_{i}}{s_{i}}, \]
and, similarly, the coefficients on the original scale are
\[ \widehat{\beta}_{i}' = \frac{\widehat{\beta}_{i}}{s_{i}}. \]
The variance-covariance matrix on the original scale then follows from the usual rules for covariances of linear combinations. Writing \(\sigma_{ij}^{2} = \textrm{Cov}(\widehat{\beta}_{i}, \widehat{\beta}_{j})\), the coefficient-coefficient entries are
\[ \textrm{Cov}\left( \frac{\widehat{\beta}_{i}}{s_{i}}, \frac{\widehat{\beta}_{j}}{s_{j}} \right) = \frac{1}{s_{i}s_{j}} \textrm{Cov}(\widehat{\beta}_{i}, \widehat{\beta}_{j}) = \frac{\sigma_{ij}^{2}}{s_{i}s_{j}}; \]
the intercept variance is
\[\begin{equation} \begin{split} \textrm{Cov}\left( \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i}c_{i}}{s_{i}}, \; \widehat{\beta}_{0} - \sum_{j=1}^{p} \frac{\widehat{\beta}_{j}c_{j}}{s_{j}} \right) &= \textrm{Cov}(\widehat{\beta}_{0}, \widehat{\beta}_{0} ) - 2 \sum_{i=1}^{p} \frac{c_{i}}{s_{i}} \textrm{Cov}(\widehat{\beta}_{0} , \widehat{\beta}_{i}) + \sum_{i=1}^{p} \sum_{j=1}^{p} \frac{c_{i}c_{j}}{s_{i}s_{j}} \textrm{Cov}(\widehat{\beta}_{i}, \widehat{\beta}_{j}) \\ &= \sigma_{0}^{2} - 2 \sum_{i=1}^{p} \frac{c_{i}}{s_{i}} \sigma_{0i}^{2} + \sum_{i=1}^{p} \sum_{j=1}^{p} \frac{c_{i}c_{j}}{s_{i}s_{j}} \sigma_{ij}^{2}; \end{split} \end{equation}\]
and the intercept-coefficient covariances are
\[\begin{equation} \begin{split} \textrm{Cov}\left( \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i}c_{i}}{s_{i}}, \; \frac{\widehat{\beta}_{j}}{s_{j}} \right) &= \textrm{Cov}\left( \widehat{\beta}_{0}, \frac{\widehat{\beta}_{j}}{s_{j}} \right) - \sum_{i=1}^{p} \frac{c_{i}}{s_{i}s_{j}} \textrm{Cov}\left( \widehat{\beta}_{i}, \widehat{\beta}_{j} \right) \\ &= \frac{\sigma_{0j}^{2}}{s_{j}} - \sum_{i=1}^{p} \frac{c_{i}}{s_{i}s_{j}} \sigma_{ij}^{2}, \end{split} \end{equation}\]
for \(j = 1, 2, \ldots, p\).
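In matrix form, these element-wise formulas amount to a single linear reparameterization \(\widehat{\beta}' = T\widehat{\beta}\), with covariance \(T \, \textrm{Cov}(\widehat{\beta}) \, T^{\top}\). Below is a minimal sketch of that computation; the helper unscale_params() is illustrative only and is not lme4's actual scale_vcov() implementation:

## Back-transform coefficients and their covariance matrix from a
## scaled fit to the original predictor scale. `centers` and `scales`
## play the role of the "scaled:center" and "scaled:scale" attributes.
unscale_params <- function(beta, V, centers, scales) {
  ## Jacobian of the map from scaled-fit to original-scale coefficients:
  ## row 1 builds the new intercept; the remaining rows divide by s_i
  Tmat <- diag(c(1, 1 / scales))
  Tmat[1, -1] <- -centers / scales
  list(beta = drop(Tmat %*% beta),        # beta' = T beta
       V    = Tmat %*% V %*% t(Tmat))     # Cov(T beta) = T V T'
}

## Quick check against an ordinary lm() fit with scaled predictors
Xs <- scale(model.matrix(~ var1 + Days, sleepstudy)[, -1])
fit_scaled   <- lm(sleepstudy$Reaction ~ Xs)
fit_unscaled <- lm(Reaction ~ var1 + Days, sleepstudy)
u <- unscale_params(coef(fit_scaled), vcov(fit_scaled),
                    attr(Xs, "scaled:center"), attr(Xs, "scaled:scale"))
all.equal(u$beta, coef(fit_unscaled), check.attributes = FALSE)
#> [1] TRUE
all.equal(u$V, vcov(fit_unscaled), check.attributes = FALSE)
#> [1] TRUE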

References

Bolker, Benjamin M, Beth Gardner, Mark Maunder, Casper W Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, et al. 2013. “Strategies for Fitting Nonlinear Ecological Models in R, AD Model Builder, and BUGS.” Methods in Ecology and Evolution 4 (6): 501–12.
Schielzeth, Holger. 2010. “Simple Means to Improve the Interpretability of Regression Coefficients.” Methods in Ecology and Evolution 1 (2): 103–13. https://doi.org/10.1111/j.2041-210X.2010.00012.x.