In this vignette we illustrate the use of a number of basic generic
functions for models fitted by the mixed_model()
function
of package GLMMadaptive.
We start by simulating some data for a binary longitudinal outcome:
set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)
betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))
We continue by fitting the mixed effects logistic regression for
y
assuming random intercepts and random slopes for the
random-effects part.
fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
family = binomial())
As in the majority of model-fitting functions in R, the
print()
and summary()
methods display a short
and a detailed output of the fitted model, respectively. For
'MixMod'
objects we obtain
fm
#>
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF,
#> family = binomial())
#>
#>
#> Model:
#> family: binomial
#> link: logit
#>
#> Random effects covariance matrix:
#> StdDev Corr
#> (Intercept) 0.6707
#> time 0.2504 0.6773
#>
#> Fixed effects:
#> (Intercept) sexfemale time sexfemale:time
#> -2.054036030 -1.167231806 0.261163648 0.001917382
#>
#> log-Lik: -358.81
and
summary(fm)
#>
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF,
#> family = binomial())
#>
#> Data Descriptives:
#> Number of Observations: 800
#> Number of Groups: 100
#>
#> Model:
#> family: binomial
#> link: logit
#>
#> Fit statistics:
#> log.Lik AIC BIC
#> -358.81 731.62 749.8562
#>
#> Random effects covariance matrix:
#> StdDev Corr
#> (Intercept) 0.6707
#> time 0.2504 0.6773
#>
#> Fixed effects:
#> Estimate Std.Err z-value p-value
#> (Intercept) -2.0540 0.3135 -6.5522 < 1e-04
#> sexfemale -1.1672 0.4673 -2.4976 0.012505
#> time 0.2612 0.0562 4.6463 < 1e-04
#> sexfemale:time 0.0019 0.0782 0.0245 0.980437
#>
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#>
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE
The output is rather self-explanatory. However, just note that the
fixed-effects coefficients are on the linear predictor scale, and hence
are the corresponding log-odds for the intercept and log-odds ratios for
the rest of the parameters. The summary()
only shows the
estimated coefficients, standard errors and p-values, but no confidence
intervals. These can be separately obtained using the
confint()
method, i.e.,
exp(confint(fm))
#> 2.5 % Estimate 97.5 %
#> (Intercept) 0.06935892 0.1282164 0.2370198
#> sexfemale 0.12452870 0.3112273 0.7778321
#> time 1.16299270 1.2984401 1.4496624
#> sexfemale:time 0.85955354 1.0019192 1.1678646
By default the confidence intervals are produced for the fixed effects. Hence, taking the exp we obtain the confidence intervals for the corresponding odds-ratios. In addition, by default, the level of the confidence intervals is 95%. The following piece of code produces 90% confidence intervals for the variances of the random intercepts and slopes, and for their covariance:
confint(fm, parm = "var-cov", level = 0.90)
#> 5 % Estimate 95 %
#> var.(Intercept) 0.05214979 0.44984116 3.8803049
#> cov.(Int)_time -0.03350401 0.11374228 0.9571249
#> var.time 0.02225523 0.06269547 1.8129238
The estimated variance-covariance matrix of the maximum likelihood
estimates of all parameters is returned using the vcov()
method, e.g.,
vcov(fm)
#> (Intercept) sexfemale time sexfemale:time
#> (Intercept) 0.098274483 -0.074499996 -0.008216948 0.0061209157
#> sexfemale -0.074499996 0.218415104 0.005918094 -0.0175394670
#> time -0.008216948 0.005918094 0.003159419 -0.0029625166
#> sexfemale:time 0.006120916 -0.017539467 -0.002962517 0.0061144712
#> D_11 -0.087119578 -0.019450999 0.008223998 0.0018335034
#> D_12 0.016309404 0.003815306 -0.001720134 -0.0004513681
#> D_22 -0.086163936 -0.019241185 0.010700522 0.0001648898
#> D_11 D_12 D_22
#> (Intercept) -0.087119578 0.0163094042 -0.0861639364
#> sexfemale -0.019450999 0.0038153060 -0.0192411853
#> time 0.008223998 -0.0017201337 0.0107005218
#> sexfemale:time 0.001833503 -0.0004513681 0.0001648898
#> D_11 0.429031451 -0.0995215920 0.5054925071
#> D_12 -0.099521592 0.0369781665 -0.2113346392
#> D_22 0.505492507 -0.2113346392 1.3616186478
The elements of this covariance matrix that correspond to the
elements of the covariance matrix of the random effects (i.e., the
elements D_xx
) are on the log-Cholesky scale.
Robust standard errors using the sandwich estimator can be obtained
by setting the logical argument sandwich
to
TRUE
, i.e.,
vcov(fm, sandwich = TRUE)
#> (Intercept) sexfemale time sexfemale:time
#> (Intercept) 0.099375006 -0.070392849 -0.005229360 0.0024372130
#> sexfemale -0.070392849 0.213560962 0.003080019 -0.0183177219
#> time -0.005229360 0.003080019 0.002907846 -0.0026757783
#> sexfemale:time 0.002437213 -0.018317722 -0.002675778 0.0061654496
#> D_11 -0.080682266 -0.015269283 0.006084500 0.0034781044
#> D_12 0.017266666 -0.005005000 -0.001431995 -0.0002375471
#> D_22 -0.081773200 0.032220915 0.005444180 0.0024739539
#> D_11 D_12 D_22
#> (Intercept) -0.080682266 0.0172666661 -0.081773200
#> sexfemale -0.015269283 -0.0050049996 0.032220915
#> time 0.006084500 -0.0014319950 0.005444180
#> sexfemale:time 0.003478104 -0.0002375471 0.002473954
#> D_11 0.257244771 -0.0547470851 0.235884951
#> D_12 -0.054747085 0.0217713869 -0.122326509
#> D_22 0.235884951 -0.1223265091 0.831270814
The use of robust standard errors via the sandwich
argument is also available in the summary()
,
confint()
, anova()
,
marginal_coefs()
, effectPlotData()
,
predict()
, and simulate()
methods.
To extract the estimated fixed effects coefficients from a fitted
mixed model, we can use the fixef()
method. Similarly, the
empirical Bayes estimates of the random effects are extracted using the
ranef()
method, and finally the coef()
method
returns the subject-specific coefficients, i.e., the sum of the fixed
and random effects coefficients:
fixef(fm)
#> (Intercept) sexfemale time sexfemale:time
#> -2.054036030 -1.167231806 0.261163648 0.001917382
head(ranef(fm))
#> (Intercept) time
#> 1 -0.3667910 -0.123026294
#> 2 0.6656132 0.289323835
#> 3 0.4836942 0.215045666
#> 4 -0.1904426 -0.008732796
#> 5 0.2477730 0.160118862
#> 6 -0.3739774 -0.183618169
head(coef(fm))
#> (Intercept) sexfemale time sexfemale:time
#> 1 -2.420827 -1.167232 0.13813735 0.001917382
#> 2 -1.388423 -1.167232 0.55048748 0.001917382
#> 3 -1.570342 -1.167232 0.47620931 0.001917382
#> 4 -2.244479 -1.167232 0.25243085 0.001917382
#> 5 -1.806263 -1.167232 0.42128251 0.001917382
#> 6 -2.428013 -1.167232 0.07754548 0.001917382
The fixed effects estimates in mixed models with nonlinear link
functions have an interpretation conditional on the random effects.
However, often we wish to obtain parameters with a marginal / population
averaged interpretation, which leads many researchers to use generalized
estimating equations, and dealing with potential issues with missing
data. Nonetheless, recently Hedeker et al. have
proposed a nice solution to this problem. Their approach is implemented
in function marginal_coefs()
. For example, for model
fm
we obtain the marginalized coefficients using:
marginal_coefs(fm)
#> (Intercept) sexfemale time sexfemale:time
#> -1.6023 -1.0968 0.1766 0.0508
The function calculates the marginal log odds ratios in our case
(because we have a binary outcome) using a Monte Carlo procedure with
number of samples determined by the M
argument.
Standard errors for the marginalized coefficients are obtained by
setting std_errors = TRUE
in the call to
marginal_coefs()
, and require a double Monte Carlo
procedure for which argument K
comes also into play. To
speed up computations, the outer Monte Carlo procedure is performed in
parallel using package parallel and number of cores
specified in the cores
argument (results not shown):
marginal_coefs(fm, std_errors = TRUE, cores = 5)
The fitted()
method extracts fitted values from the
fitted mixed model. These are always on the scale of the response
variable. The type
argument of fitted()
specifies the type of fitted values computed. The default is
type = "mean_subject"
which corresponds to the fitted
values calculated using only the fixed-effects part of the linear
predictor; hence, for the subject who has random effects values equal to
0, i.e., the “mean subject”:
Setting type = "subject_specific"
will calculate the
fitted values using both the fixed and random effects parts, where for
the latter the empirical Bayes estimates of the random effects are
used:
head(fitted(fm, type = "subject_specific"))
#> 1 2 3 4 5 6
#> 0.08159826 0.08308495 0.10108500 0.23896304 0.24390557 0.24431851
Finally, setting type = "marginal"
will calculate the
fitted values based on the multiplication of the fixed-effects design
matrix with the marginalized coefficients described above (due to the
required computing time, these fitted values are not displayed):
The residuals()
method simply calculates the residuals
by subtracting the fitted values from the observed repeated measurements
outcome. Hence, this method also has a type
argument with
exactly the same options as the fitted()
method.
To display the estimated longitudinal evolution of the binary outcome we can use an effect plot. This is simply predictions from the models with the corresponding 95% pointwise confidence intervals.
As a first step we create a data frame the provides the setting based
on which the plot is to be produced; function expand.grid()
is helpful in this regard:
Next we use the effectPlotData()
function that does the
heavy lifting, i.e., calculates the predictions and confidence intervals
from a fitted mixed model for the data frame provided above, i.e.,
plot_data <- effectPlotData(fm, nDF)
Then we can produce the plot using for example the
xyplot()
function from package lattice,
e.g.,
library("lattice")
xyplot(pred + low + upp ~ time | sex, data = plot_data,
type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
xlab = "Follow-up time", ylab = "log odds")
expit <- function (x) exp(x) / (1 + exp(x))
xyplot(expit(pred) + expit(low) + expit(upp) ~ time | sex, data = plot_data,
type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
xlab = "Follow-up time", ylab = "Subject-Specific Probabilities")
The effectPlotData()
function also allows to compute
marginal predictions using the marginalized coefficients described
above. This is achieved by setting marginal = TRUE
in the
respective call (results not shown):
plot_data_m <- effectPlotData(fm, nDF, marginal = TRUE, cores = 2)
# we put the two groups in the same panel
my.panel.bands <- function(x, y, upper, lower, fill, col, subscripts, ..., font,
fontface) {
upper <- upper[subscripts]
lower <- lower[subscripts]
panel.polygon(c(x, rev(x)), c(upper, rev(lower)), col = fill, border = FALSE, ...)
}
xyplot(expit(pred) ~ time, group = sex, data = plot_data_m,
upper = expit(plot_data_m$upp), low = expit(plot_data_m$low),
type = "l", col = c("blue", "red"),
fill = c("#0000FF80", "#FF000080"),
panel = function (x, y, ...) {
panel.superpose(x, y, panel.groups = my.panel.bands, ...)
panel.xyplot(x, y, lwd = 2, ...)
}, xlab = "Follow-up time", ylab = "Marginal Probabilities")
In addition to using the effectPlotData()
function, the
same type of effect plots can be produced by the effects
package. For example, based on the fm
model we produce the
effect plot for time
and for the two groups using the call
to predictorEffect()
and its plot()
method:
library("effects")
plot(predictorEffect("time", fm), type = "link")
The type
argument controls the scale of the y-axis,
namely, type = "link"
corresponds to the log-odds scale. If
we would like to obtain the figure in the probability scale, we should
set type = "response"
, i.e.,
plot(predictorEffect("time", fm), type = "response")
For the additional functionality provided by the effects
package, check the vignette:
vignette("predictor-effects-gallery", package = "effects")
.
Note: Effects plots via the effects
package are currently only supported for the binomial()
and
poisson()
families.
The anova()
method can be used to compare two fitted
mixed models using a likelihood ratio test. For example, we test if we
can test the null hypothesis that the covariance between the random
intercepts and slopes is equal to zero using:
gm <- mixed_model(fixed = y ~ sex * time, random = ~ time || id, data = DF,
family = binomial())
anova(gm, fm)
#>
#> AIC BIC log.Lik LRT df p.value
#> gm 730.94 746.57 -359.47
#> fm 731.62 749.86 -358.81 1.32 1 0.2499
Using the predict()
method we can calculate predictions
for new subjects. As an example, we treat subject 1 from the
DF
dataset as a new patient (in the code below we change
his id
variable):
pred_DF <- DF[DF$id == 1, ][1:4, ]
pred_DF$id <- paste0("N", as.character(pred_DF$id))
pred_DF
#> id time sex y
#> 1 N1 0.0000000 male 0
#> 2 N1 0.1424363 male 0
#> 3 N1 1.7055512 male 0
#> 4 N1 9.1391210 male 0
We start by computing predictions based only on the fixed-effects part of the model; because of the nonlinear link function, these predictions are of subjects with random effects value equal to zero, which is not to the average predictions:
predict(fm, newdata = pred_DF, type_pred = "response",
type = "mean_subject", se.fit = TRUE)
#> $pred
#> 1 2 3 4
#> 0.1136452 0.1174465 0.1667820 0.5824332
#>
#> $se.fit
#> 1 2 3 4
#> 0.3134876 0.3098351 0.2818441 0.4604009
Population averaged predictions can be obtained by first calculating
the marginalized coefficients (using marginal_coefs()
) and
multiplying them with the fixed effects design matrix; this is achieved
using the option type = "marginal"
(due to the required
computing time, predictions not shown):
predict(fm, newdata = pred_DF, type_pred = "response",
type = "marginal", se.fit = FALSE)
Finally, we calculate subject-specific predictions; the standard errors are calculated with a Monte Carlo scheme (for details check the online help file):
predict(fm, newdata = pred_DF, type_pred = "response",
type = "subject_specific", se.fit = TRUE)
#> $pred
#> 1 2 3 4
#> 0.07937672 0.08042530 0.09278823 0.17716458
#>
#> $se.fit
#> 1 2 3 4
#> 0.02788083 0.02806700 0.03210400 0.11341043
#>
#> $low
#> 1 2 3 4
#> 0.04116652 0.04187969 0.04874345 0.04567313
#>
#> $upp
#> 1 2 3 4
#> 0.1509647 0.1524805 0.1693027 0.4959877
#>
#> $success_rate
#> [1] 0.5
Suppose now that we want predictions at time points at which no
responses y
have been recorded, e.g.,
future_Times <- pred_DF[1:3, c("id", "time", "sex")]
future_Times$time <- c(3, 4, 10)
future_Times
#> id time sex
#> 1 N1 3 male
#> 2 N1 4 male
#> 3 N1 10 male
Predictions at these time points can be calculated by provide this
data frame in the newdata2
argument of
predict()
:
predict(fm, newdata = pred_DF, newdata2 = future_Times, type_pred = "response",
type = "subject_specific", se.fit = TRUE)
#> $pred
#> 1 2 3 4
#> 0.07937672 0.08042530 0.09278823 0.17716458
#>
#> $pred2
#> 1 2 3
#> 0.1042907 0.1140225 0.1900830
#>
#> $se.fit
#> 1 2 3 4
#> 0.02788083 0.02806700 0.03210400 0.11341043
#>
#> $se.fit2
#> 1 2 3
#> 0.03874363 0.04616593 0.12788755
#>
#> $low
#> 1 2 3 4
#> 0.04116652 0.04187969 0.04874345 0.04567313
#>
#> $upp
#> 1 2 3 4
#> 0.1509647 0.1524805 0.1693027 0.4959877
#>
#> $low2
#> 1 2 3
#> 0.05089954 0.05126488 0.04379313
#>
#> $upp2
#> 1 2 3
#> 0.1952552 0.2214955 0.5575331
The simulate()
method can be used to simulate response
outcome data from a fitted mixed model. For example, we simulate two
realization of our dichotomous outcome:
head(simulate(fm, nsim = 2, seed = 123), 10)
#> [,1] [,2]
#> [1,] 0 0
#> [2,] 0 0
#> [3,] 0 0
#> [4,] 1 0
#> [5,] 1 0
#> [6,] 0 1
#> [7,] 0 0
#> [8,] 1 0
#> [9,] 0 0
#> [10,] 1 0
By setting acount_MLEs_var = TRUE
in the call to
simulate()
we also account for the variability in the
maximum likelihood estimates in the simulation of new responses. This is
achieved by simulating each time a new response vector using a
realization for the parameters from a multivariate normal distribution
with mean the MLEs and covariance matrix the covariance matrix of the
MLEs: