Scaling predictors in a linear model (or in extended linear models such
as LMMs or GLMMs) so that all of the predictors are on a similar scale
(and thus, in general, the estimated parameters are also on a similar
scale) can improve the behaviour of various computational methods (see
e.g. the “Remove eccentricity by scaling” section of Bolker et al. (2013)). lmer() and
glmer() issue warnings suggesting that users consider
re-scaling predictors when the predictor variables are on very different
scales. Re-scaling predictors can improve both computation and
interpretation (Schielzeth 2010) of
(extended) linear models. However, in cases where researchers want to
scale predictors for computational stability but report parameter
estimates on the scale of the original (unscaled) predictors,
back-transforming the coefficient estimates, covariance matrix, etc., by hand is tedious.
lme4 now allows this scaling to be done automatically,
behind the scenes.
An example of a warning message due to disparate predictor scales:
library(lme4)
#> Loading required package: Matrix
set.seed(1)
sleepstudy$var1 <- runif(nrow(sleepstudy), 1e6, 1e7)
fit1 <- lmer(Reaction ~ var1 + Days + (Days | Subject), sleepstudy)
#> Warning: Some predictor variables are on very different scales: consider rescaling.
#> You may also use (g)lmerControl(autoscale = TRUE) to improve numerical stability.

Instead of re-fitting with scaled predictors, we can use a new (as of
lme4 version >= 1.1.37) argument for
(g)lmerControl, called autoscale: scale() is applied
automatically to the continuous covariates, and the coefficient
estimates and covariances reported to the user are back-transformed to
the original scale. This preserves the original
interpretation and minimizes user intervention.
Hence:
fit2 <- lmer(Reaction ~ var1 + Days + (Days | Subject),
control = lmerControl(autoscale = TRUE), sleepstudy)
all.equal(fixef(fit1), fixef(fit2))
#> [1] TRUE
all.equal(vcov(fit1), vcov(fit2))
#> [1] TRUE

The parameters are in fact slightly different (relative difference in the fixed effects of \(4\times 10^{-11}\), and in the covariance matrices of \(10^{-14}\)); these differences will be larger for less numerically stable models (i.e. models that are more complicated or poorly constrained by the data).
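To see the actual size of these differences, you can set the tolerance of all.equal() to zero, which makes it report the mean relative difference instead of TRUE (a quick check only; the exact values will vary across platforms and BLAS libraries):
all.equal(fixef(fit1), fixef(fit2), tolerance = 0)
all.equal(as.matrix(vcov(fit1)), as.matrix(vcov(fit2)), tolerance = 0)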
The base function scale() is applied to the model matrix
X constructed in lFormula() or
glFormula() (see R/Modular.R).
To back-transform, we use the scaled:center and
scaled:scale attributes to reverse the scaling in
fixef.merMod(), model.matrix.merMod(), and
vcov.merMod(), so that the summary() output
shows the coefficients on the original scale.
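For reference, here is a minimal sketch (not taken from the lme4 sources) of how scale() stores the centring and scaling constants as attributes, and how those attributes can be used to undo the scaling:
x <- matrix(rnorm(20, mean = 100, sd = 10), ncol = 2)
xs <- scale(x)                    # centre and scale each column
ctr <- attr(xs, "scaled:center")  # column means used for centring
scl <- attr(xs, "scaled:scale")   # column standard deviations used for scaling
## undo the scaling to recover the original matrix
all.equal(sweep(sweep(xs, 2, scl, "*"), 2, ctr, "+"), x,
          check.attributes = FALSE)
#> [1] TRUE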
Reverting the changes for fixef.merMod(), i.e. for the \(\beta\) coefficients, is not too difficult.
However, reverting them for the variance-covariance matrix is a
bit more involved. The exact modification is implemented in the function
scale_vcov() (in R/utilities.R), and the
derivation is explained below.
Consider the estimation of a linear model fitted on scaled predictors: \[ \widehat{Y} = \widehat{\beta}_{0} + \sum_{i=1}^{p} \widehat{\beta}_{i} x_{i}^{*}, \] where \(x_{i}^{*}\) represents the scaled version of the predictor \(x_{i}\). That is, if we let \(C = (c_{1}, \ldots, c_{p})\) be the vector of centering values and \(S = (s_{1}, \ldots, s_{p})\) the vector of scaling values, we have: \[ \widehat{Y} = \widehat{\beta}_{0} + \sum_{i=1}^{p} \widehat{\beta}_{i} \left( \frac{x_{i} - c_{i}}{s_{i}} \right) = \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} c_{i}}{s_{i}} + \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} x_{i}}{s_{i}}. \]

From the above, it is clear that the intercept on the original scale can be represented as: \[ \widehat{\beta}_{0}' = \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i} c_{i}}{s_{i}}. \] Similarly, the coefficients on the original scale are: \[ \widehat{\beta}_{i}' = \frac{\widehat{\beta}_{i}}{s_{i}}. \]

The variance-covariance matrix on the original scale can then be derived as follows: \[ \textrm{Cov}\left( \frac{\widehat{\beta}_{i}}{s_{i}}, \frac{\widehat{\beta}_{j}}{s_{j}} \right) = \frac{1}{s_{i}s_{j}} \textrm{Cov}(\widehat{\beta}_{i}, \widehat{\beta}_{j}) = \frac{\sigma_{ij}^{2}}{s_{i}s_{j}} \]

\[\begin{equation} \begin{split} \textrm{Cov}\left( \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i}c_{i}}{s_{i}}, \widehat{\beta}_{0} - \sum_{j=1}^{p} \frac{\widehat{\beta}_{j}c_{j}}{s_{j}} \right) &= \textrm{Cov}(\widehat{\beta}_{0}, \widehat{\beta}_{0} ) - 2 \sum_{i=1}^{p} \frac{c_{i}}{s_{i}} \textrm{Cov}(\widehat{\beta}_{0} , \widehat{\beta}_{i}) + \sum_{i=1}^{p} \sum_{j=1}^{p} \frac{c_{i}c_{j}}{s_{i}s_{j}} \textrm{Cov}(\widehat{\beta}_{i}, \widehat{\beta}_{j}) \\ &= \sigma_{0}^{2} - 2 \sum_{i=1}^{p} \frac{c_{i}}{s_{i}} \sigma_{0i}^{2} + \sum_{i=1}^{p} \sum_{j=1}^{p} \frac{c_{i}c_{j}}{s_{i}s_{j}} \sigma_{ij}^{2} \end{split} \end{equation}\]

\[\begin{equation} \begin{split} \textrm{Cov}\left( \widehat{\beta}_{0} - \sum_{i=1}^{p} \frac{\widehat{\beta}_{i}c_{i}}{s_{i}}, \frac{\widehat{\beta}_{j}}{s_{j}} \right) &= \textrm{Cov}\left( \widehat{\beta}_{0}, \frac{\widehat{\beta}_{j}}{s_{j}} \right) - \sum_{i=1}^{p} \frac{c_{i}}{s_{i}s_{j}} \textrm{Cov}\left( \widehat{\beta}_{i}, \widehat{\beta}_{j} \right) \\ &= \frac{\sigma_{0j}^{2}}{s_{j}} - \sum_{i=1}^{p} \frac{c_{i}}{s_{i}s_{j}} \sigma_{ij}^{2}, \end{split} \end{equation}\]

for \(j = 1, 2, \ldots, p\).
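Equivalently, the whole back-transformation can be written as a single linear map \(\widehat{\beta}' = J \widehat{\beta}\), \(V' = J V J^{\top}\), where \(J\) collects the centering and scaling constants. The sketch below illustrates this algebra with a hypothetical helper (it is not the internal scale_vcov() implementation); beta and V are assumed to be the intercept-first coefficient vector and covariance matrix from the scaled fit, and ctr and scl the scaled:center and scaled:scale values:
## hypothetical helper illustrating the algebra above (not lme4's scale_vcov())
unscale_coefs <- function(beta, V, ctr, scl) {
  ## Jacobian of the map from scaled-predictor coefficients to
  ## original-scale coefficients:
  ##   beta0' = beta0 - sum(beta_i * c_i / s_i),  beta_i' = beta_i / s_i
  J <- diag(c(1, 1 / scl))
  J[1, -1] <- -ctr / scl
  list(beta = drop(J %*% beta),   # back-transformed coefficients
       vcov = J %*% V %*% t(J))   # back-transformed covariance matrix
}
Expanding \(J V J^{\top}\) element-wise reproduces the three covariance expressions derived above.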