hypr is a package for easy translation between
experimental (null) hypotheses, hypothesis matrices and contrast
matrices, as used for coding factor contrasts in linear regression
models. The package can be used to derive contrasts from hypotheses and
vice versa. The first step is to define the hypotheses. This step is
independent of the package per se and requires some theoretical
background knowledge in null hypothesis significance testing (NHST).
This vignette shows two examples of deriving contrasts and using them
for statistical analyses.
For a general introduction to hypr, see the hypr-intro vignette:
vignette("hypr-intro", package = "hypr")
For the examples in this vignette, we use a simulated dataset
with one factor X with four levels, mu1, mu2, mu3, and mu4:
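set.seed(123)
M <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40) # condition means
N <- 5   # observations per condition
SD <- 10 # standard deviation within each condition
# Simulate N observations per condition; empirical = TRUE makes the
# sample mean and SD of each condition match M and SD exactly
simdat <- do.call(rbind, lapply(names(M), function(x) {
  data.frame(X = x, DV = as.numeric(MASS::mvrnorm(N, unname(M[x]), SD^2, empirical = TRUE)))
}))
simdat$X <- factor(simdat$X)
simdat$id <- 1:nrow(simdat)
simdat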
## X DV id
## 1 mu1 0.7025204 1
## 2 mu1 4.7751377 2
## 3 mu1 26.8323215 3
## 4 mu1 8.4826319 4
## 5 mu1 9.2073885 5
## 6 mu2 35.1216132 6
## 7 mu2 24.3424125 7
## 8 mu2 9.5079228 8
## 9 mu2 14.4775279 9
## 10 mu2 16.5505235 10
## 11 mu3 24.3273310 11
## 12 mu3 10.8118074 12
## 13 mu3 11.4523075 13
## 14 mu3 6.9158660 14
## 15 mu3 -3.5073119 15
## 16 mu4 51.8888864 16
## 17 mu4 42.7533446 17
## 18 mu4 25.2877490 18
## 19 mu4 44.1955804 19
## 20 mu4 35.8744396 20
Assume we would like to test three treatments against a baseline. In a treatment contrast, we typically test whether each of the treatment conditions \(\mu_2\), \(\mu_3\), and \(\mu_4\) differs significantly from the baseline condition \(\mu_1\). Including the intercept (testing the baseline against zero), this yields four null hypotheses:
\[\begin{align} H_{0_1}:& \; \mu_1 = 0 \\ H_{0_2}:& \; \mu_2 = \mu_1 \\ H_{0_3}:& \; \mu_3 = \mu_1 \\ H_{0_4}:& \; \mu_4 = \mu_1 \end{align}\]
The hypr() function accepts any set of such equations as
comma-separated arguments:
library(hypr)
trtC <- hypr(mu1~0, mu2~mu1, mu3~mu1, mu4~mu1)
Calling this function generates a hypr object named trtC, which
contains all four hypotheses from above as well as the hypothesis and
contrast matrices derived from them. We can display a summary like any
other object in R:
trtC
## hypr object containing 4 null hypotheses:
## H0.1: 0 = mu1 (Intercept)
## H0.2: 0 = mu2 - mu1
## H0.3: 0 = mu3 - mu1
## H0.4: 0 = mu4 - mu1
##
## Call:
## hypr(~mu1, ~mu2 - mu1, ~mu3 - mu1, ~mu4 - mu1, levels = c("mu1",
## "mu2", "mu3", "mu4"))
##
## Hypothesis matrix (transposed):
## [,1] [,2] [,3] [,4]
## mu1 1 -1 -1 -1
## mu2 0 1 0 0
## mu3 0 0 1 0
## mu4 0 0 0 1
##
## Contrast matrix:
## [,1] [,2] [,3] [,4]
## mu1 1 0 0 0
## mu2 1 1 0 0
## mu3 1 0 1 0
## mu4 1 0 0 1
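The relation between the two matrices in this summary can be checked by hand. The following is a minimal sketch in base R, not a description of hypr internals: it assumes that, for a square and invertible hypothesis matrix, the contrast matrix is its ordinary matrix inverse (the general case uses a generalized inverse). The object names H and C are chosen for illustration only.

```r
# Hypothesis matrix: one row per null hypothesis, one column per condition mean
H <- rbind(
  H0.1 = c(mu1 =  1, mu2 = 0, mu3 = 0, mu4 = 0),  # mu1 = 0 (intercept)
  H0.2 = c(-1, 1, 0, 0),                          # mu2 - mu1 = 0
  H0.3 = c(-1, 0, 1, 0),                          # mu3 - mu1 = 0
  H0.4 = c(-1, 0, 0, 1)                           # mu4 - mu1 = 0
)

# Inverting the hypothesis matrix yields the contrast matrix
C <- solve(H)
C

# Same numbers as an intercept column plus base R's treatment contrasts
all.equal(unname(C), unname(cbind(1, contr.treatment(4))))
```

The result reproduces the contrast matrix printed in the summary, and it coincides with R's default treatment coding once the intercept column is added.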
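We can use this object to set the factor contrasts of X in the simdat
dataframe. As the output below shows, the assigned matrix no longer
contains the intercept column, since lm() supplies its own intercept: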
contrasts(simdat$X) <- contr.hypothesis(trtC)
contrasts(simdat$X)
## [,1] [,2] [,3]
## mu1 0 0 0
## mu2 1 0 0
## mu3 0 1 0
## mu4 0 0 1
round(coef(summary(lm(DV ~ X, data=simdat))), 3)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10 4.472 2.236 0.040
## X1 10 6.325 1.581 0.133
## X2 0 6.325 0.000 1.000
## X3 30 6.325 4.743 0.000
The linear regression returns the expected estimates: the intercept is the mean of the baseline condition (10), and the three slopes are the differences between each treatment condition and the baseline (10, 0, and 30).
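As a cross-check that does not depend on hypr, the estimates can be compared with the condition means computed directly from the data. This sketch regenerates the simulated data with the same seed and enters the treatment contrast matrix by hand; the names cell_means, expected and fit are illustrative.

```r
# Recreate the simulated data used in this vignette
set.seed(123)
M <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40)
simdat <- do.call(rbind, lapply(names(M), function(x) {
  data.frame(X = x, DV = as.numeric(MASS::mvrnorm(5, unname(M[x]), 10^2, empirical = TRUE)))
}))
simdat$X <- factor(simdat$X)

# Treatment contrast matrix as printed above (without the intercept column)
contrasts(simdat$X) <- rbind(
  mu1 = c(0, 0, 0),
  mu2 = c(1, 0, 0),
  mu3 = c(0, 1, 0),
  mu4 = c(0, 0, 1)
)

# Condition means computed directly from the data
cell_means <- sapply(split(simdat$DV, simdat$X), mean)

# Baseline mean, followed by each treatment mean minus the baseline mean
expected <- c(cell_means[["mu1"]], unname(cell_means[-1]) - cell_means[["mu1"]])

fit <- lm(DV ~ X, data = simdat)
all.equal(unname(coef(fit)), expected)
```

Because the data were simulated with empirical = TRUE, the sample means equal the condition means in M exactly, which is why the estimates are whole numbers.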
A sum contrast with four levels, such as used for ANOVA, tests condition means against the grand mean and could generate the following null hypotheses:
\[\begin{align} H_{0_1}:& \; \mu_1 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \\ H_{0_2}:& \; \mu_2 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \\ H_{0_3}:& \; \mu_3 = \frac{\mu_1 + \mu_2 + \mu_3 + \mu_4}{4} \end{align}\]
We rewrite them into hypr:
sumC <- hypr(mu1 ~ (mu1+mu2+mu3+mu4)/4, mu2 ~ (mu1+mu2+mu3+mu4)/4, mu3 ~ (mu1+mu2+mu3+mu4)/4)
sumC
## hypr object containing 3 null hypotheses:
## H0.1: 0 = (3*mu1 - mu2 - mu3 - mu4)/4
## H0.2: 0 = (3*mu2 - mu1 - mu3 - mu4)/4
## H0.3: 0 = (3*mu3 - mu1 - mu2 - mu4)/4
##
## Call:
## hypr(~3/4 * mu1 - 1/4 * mu2 - 1/4 * mu3 - 1/4 * mu4, ~3/4 * mu2 -
## 1/4 * mu1 - 1/4 * mu3 - 1/4 * mu4, ~3/4 * mu3 - 1/4 * mu1 -
## 1/4 * mu2 - 1/4 * mu4, levels = c("mu1", "mu2", "mu3", "mu4"
## ))
##
## Hypothesis matrix (transposed):
## [,1] [,2] [,3]
## mu1 3/4 -1/4 -1/4
## mu2 -1/4 3/4 -1/4
## mu3 -1/4 -1/4 3/4
## mu4 -1/4 -1/4 -1/4
##
## Contrast matrix:
## [,1] [,2] [,3]
## mu1 1 0 0
## mu2 0 1 0
## mu3 0 0 1
## mu4 -1 -1 -1
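Here the hypothesis matrix is not square (three hypotheses, four conditions), so an ordinary inverse does not exist. As a hedged sketch, assuming the contrast matrix is the Moore-Penrose generalized inverse of the hypothesis matrix, the result can be reproduced with MASS::ginv() and compared with base R's contr.sum(). The names H and C are illustrative.

```r
# Hypothesis matrix for the three sum-contrast hypotheses (rows = hypotheses)
H <- rbind(
  c(mu1 = 3, mu2 = -1, mu3 = -1, mu4 = -1),
  c(-1,  3, -1, -1),
  c(-1, -1,  3, -1)
) / 4

# Generalized inverse: a 4 x 3 matrix with one row per condition
C <- MASS::ginv(H)
zapsmall(C)

# Same numbers as R's built-in sum contrasts for four levels
all.equal(C, contr.sum(4), check.attributes = FALSE)
```

zapsmall() is used only for display, to suppress floating-point noise around zero.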
We next assign the contrast matrix to the factor X:
contrasts(simdat$X) <- contr.hypothesis(sumC)
contrasts(simdat$X)
## [,1] [,2] [,3]
## mu1 1 0 0
## mu2 0 1 0
## mu3 0 0 1
## mu4 -1 -1 -1
Without creating the intermediate hypr object, you can
also set the contrasts directly like this:
contrasts(simdat$X) <- contr.hypothesis(
  mu1 ~ (mu1+mu2+mu3+mu4)/4,
  mu2 ~ (mu1+mu2+mu3+mu4)/4,
  mu3 ~ (mu1+mu2+mu3+mu4)/4
)
contrasts(simdat$X)
## [,1] [,2] [,3]
## mu1 1 0 0
## mu2 0 1 0
## mu3 0 0 1
## mu4 -1 -1 -1
Finally, we run the linear regression:
round(coef(summary(lm(DV ~ X, data=simdat))),3)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20 2.236 8.944 0.00
## X1 -10 3.873 -2.582 0.02
## X2 0 3.873 0.000 1.00
## X3 -10 3.873 -2.582 0.02
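These estimates can be read as follows: with sum contrasts, the intercept is the grand mean of the four condition means, and each slope is the deviation of one condition mean from that grand mean. The sketch below derives the same numbers from the condition means alone, assuming a balanced design in which the sample means equal the simulated means; X, H_full and beta are illustrative names.

```r
# Condition means used to simulate the data
mu <- c(mu1 = 10, mu2 = 20, mu3 = 10, mu4 = 40)

# Full design coding: intercept column plus the sum contrast matrix
X <- cbind(1, contr.sum(4))

# Inverting it gives the weights that define each coefficient:
# row 1 averages all four means, rows 2 to 4 are the three hypotheses
H_full <- solve(X)

# Coefficients implied by the condition means
beta <- drop(H_full %*% mu)
beta
```

The first element is the grand mean (10 + 20 + 10 + 40) / 4 = 20, and the remaining elements are 10 - 20, 20 - 20, and 10 - 20, matching the regression output.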