vignettes/Multiple_Comparisons.Rmd
Multiple_Comparisons.Rmd
In this vignette we illustrate how to correct p-values for multiple comparisons using the multcomp and emmeans packages.
We start by simulating some data for a binary longitudinal outcome:
set.seed(1234)
n <- 300 # number of subjects
K <- 4 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at K time points
DF <- data.frame(id = rep(seq_len(n), each = K),
time = gl(K, 1, n*K, labels = paste0("Time", 1:K)),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ 1, data = DF)
betas <- c(-2.13, 1, rep(c(1.2, -1.2), K-1)) # fixed effects coefficients
D11 <- 1 # variance of random intercepts
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
We fit a mixed effects logistic regression for y
assuming random intercepts for the random-effects part, and the main
effects of sex
and time
for the fixed-effects
part.
fm <- mixed_model(fixed = y ~ sex + time, random = ~ 1 | id, data = DF,
family = binomial())
The uncorrected p-values for the 4 time points are give by the
summary()
method:
summary(fm)
#>
#> Call:
#> mixed_model(fixed = y ~ sex + time, random = ~1 | id, data = DF,
#> family = binomial())
#>
#> Data Descriptives:
#> Number of Observations: 1200
#> Number of Groups: 300
#>
#> Model:
#> family: binomial
#> link: logit
#>
#> Fit statistics:
#> log.Lik AIC BIC
#> -583.4797 1178.959 1201.182
#>
#> Random effects covariance matrix:
#> StdDev
#> (Intercept) 1.447826
#>
#> Fixed effects:
#> Estimate Std.Err z-value p-value
#> (Intercept) -2.2865 0.2496 -9.1601 < 1e-04
#> sexfemale 0.8860 0.2447 3.6200 0.00029455
#> timeTime2 0.3066 0.2266 1.3534 0.17592327
#> timeTime3 -0.5053 0.2463 -2.0516 0.04020481
#> timeTime4 0.5632 0.2237 2.5181 0.01179888
#>
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#>
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE
To perform the pairwise comparisons and obtain corrected p-values, we
load the multcomp package and use the
glht()
function. Because no specific methods exist for
MixMod
object returned by mixed_model()
, we
need to specify the vcov.
and coef.
arguments
of glht()
, i.e.,
library("multcomp")
#> Loading required package: mvtnorm
#> Loading required package: survival
#> Loading required package: TH.data
#> Loading required package: MASS
#>
#> Attaching package: 'MASS'
#> The following object is masked from 'package:GLMMadaptive':
#>
#> negative.binomial
#>
#> Attaching package: 'TH.data'
#> The following object is masked from 'package:MASS':
#>
#> geyser
fm_mc <- glht(fm, linfct = mcp(time = "Tukey"),
vcov. = vcov(fm, "fixed"), coef. = fixef)
summary(fm_mc)
#>
#> Simultaneous Tests for General Linear Hypotheses
#>
#> Multiple Comparisons of Means: Tukey Contrasts
#>
#>
#> Fit: mixed_model(fixed = y ~ sex + time, random = ~1 | id, data = DF,
#> family = binomial())
#>
#> Linear Hypotheses:
#> Estimate Std. Error z value Pr(>|z|)
#> Time2 - Time1 == 0 0.3066 0.2266 1.353 0.52807
#> Time3 - Time1 == 0 -0.5053 0.2463 -2.052 0.16892
#> Time4 - Time1 == 0 0.5632 0.2237 2.518 0.05695 .
#> Time3 - Time2 == 0 -0.8119 0.2423 -3.351 0.00463 **
#> Time4 - Time2 == 0 0.2566 0.2165 1.185 0.63542
#> Time4 - Time3 == 0 1.0685 0.2406 4.441 < 0.001 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Adjusted p values reported -- single-step method)
We continue our illustration by including the interaction term
between sex
and time
, and we focus on the
difference between males and females for the various time points. We
start by fitting the model:
gm <- mixed_model(fixed = y ~ sex * time, random = ~ 1 | id, data = DF,
family = binomial())
To compute the estimated log odds for males and females at the
different time points we use the emmeans()
functions from
the emmeans package:
library("emmeans")
gm_mc <- emmeans(gm, ~ sex | time)
gm_mc
#> time = Time1:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -2.74 0.342 Inf -3.41 -2.067
#> female -1.21 0.255 Inf -1.71 -0.707
#>
#> time = Time2:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -1.68 0.275 Inf -2.21 -1.135
#> female -1.44 0.263 Inf -1.95 -0.920
#>
#> time = Time3:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -4.40 0.546 Inf -5.47 -3.331
#> female -1.39 0.261 Inf -1.90 -0.876
#>
#> time = Time4:
#> sex emmean SE df asymp.LCL asymp.UCL
#> male -1.34 0.261 Inf -1.85 -0.825
#> female -1.25 0.256 Inf -1.75 -0.749
#>
#> Results are given on the logit (not the response) scale.
#> Confidence level used: 0.95
The corresponding pairwise comparisons are performed by the
pairs()
function:
pairs(gm_mc)
#> time = Time1:
#> contrast estimate SE df z.ratio p.value
#> male - female -1.5305 0.410 Inf -3.735 0.0002
#>
#> time = Time2:
#> contrast estimate SE df z.ratio p.value
#> male - female -0.2400 0.366 Inf -0.655 0.5125
#>
#> time = Time3:
#> contrast estimate SE df z.ratio p.value
#> male - female -3.0126 0.586 Inf -5.145 <.0001
#>
#> time = Time4:
#> contrast estimate SE df z.ratio p.value
#> male - female -0.0851 0.355 Inf -0.240 0.8106
#>
#> Results are given on the log odds ratio (not the response) scale.
For additional examples in testing interactions with the
emmeans package check the vignette:
vignette("interactions", package = "emmeans")
.