MixMod Objects

In this vignette we illustrate the use of a number of basic generic functions for models fitted by the mixed_model() function of package GLMMadaptive.

We start by simulating some data for a binary longitudinal outcome:

set.seed(1234)
n <- 100 # number of subjects
K <- 8 # number of measurements per subject
t_max <- 15 # maximum follow-up time

# we construct a data frame with the design: 
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
                 time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
                 sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))

# design matrices for the fixed and random effects
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ time, data = DF)

betas <- c(-2.13, -0.25, 0.24, -0.05) # fixed effects coefficients
D11 <- 0.48 # variance of random intercepts
D22 <- 0.1 # variance of random slopes

# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, ]))
# we simulate binary longitudinal data
DF$y <- rbinom(n * K, 1, plogis(eta_y))

We continue by fitting the mixed effects logistic regression for y assuming random intercepts and random slopes for the random-effects part.

fm <- mixed_model(fixed = y ~ sex * time, random = ~ time | id, data = DF,
                  family = binomial())

Detailed Output, Confidence Intervals, Covariance Matrix of the MLEs and Robust Standard Errors

As in the majority of model-fitting functions in R, the print() and summary() methods display a short and a detailed output of the fitted model, respectively. For 'MixMod' objects we obtain

fm
#> 
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF, 
#>     family = binomial())
#> 
#> 
#> Model:
#>  family: binomial
#>  link: logit 
#> 
#> Random effects covariance matrix:
#>              StdDev    Corr
#> (Intercept)  0.6707        
#> time         0.2504  0.6773
#> 
#> Fixed effects:
#>    (Intercept)      sexfemale           time sexfemale:time 
#>   -2.054036030   -1.167231806    0.261163648    0.001917382 
#> 
#> log-Lik: -358.81

and

summary(fm)
#> 
#> Call:
#> mixed_model(fixed = y ~ sex * time, random = ~time | id, data = DF, 
#>     family = binomial())
#> 
#> Data Descriptives:
#> Number of Observations: 800
#> Number of Groups: 100 
#> 
#> Model:
#>  family: binomial
#>  link: logit 
#> 
#> Fit statistics:
#>  log.Lik    AIC      BIC
#>  -358.81 731.62 749.8562
#> 
#> Random effects covariance matrix:
#>              StdDev    Corr
#> (Intercept)  0.6707        
#> time         0.2504  0.6773
#> 
#> Fixed effects:
#>                Estimate Std.Err z-value  p-value
#> (Intercept)     -2.0540  0.3135 -6.5522  < 1e-04
#> sexfemale       -1.1672  0.4673 -2.4976 0.012505
#> time             0.2612  0.0562  4.6463  < 1e-04
#> sexfemale:time   0.0019  0.0782  0.0245 0.980437
#> 
#> Integration:
#> method: adaptive Gauss-Hermite quadrature rule
#> quadrature points: 11
#> 
#> Optimization:
#> method: hybrid EM and quasi-Newton
#> converged: TRUE

The output is rather self-explanatory. However, just note that the fixed-effects coefficients are on the linear predictor scale, and hence are the corresponding log-odds for the intercept and log-odds ratios for the rest of the parameters. The summary() only shows the estimated coefficients, standard errors and p-values, but no confidence intervals. These can be separately obtained using the confint() method, i.e.,

exp(confint(fm))
#>                     2.5 %  Estimate    97.5 %
#> (Intercept)    0.06935892 0.1282164 0.2370198
#> sexfemale      0.12452870 0.3112273 0.7778321
#> time           1.16299270 1.2984401 1.4496624
#> sexfemale:time 0.85955354 1.0019192 1.1678646

By default the confidence intervals are produced for the fixed effects. Hence, taking the exp we obtain the confidence intervals for the corresponding odds-ratios. In addition, by default, the level of the confidence intervals is 95%. The following piece of code produces 90% confidence intervals for the variances of the random intercepts and slopes, and for their covariance:

confint(fm, parm = "var-cov", level = 0.90)
#>                         5 %   Estimate      95 %
#> var.(Intercept)  0.05214979 0.44984116 3.8803049
#> cov.(Int)_time  -0.03350401 0.11374228 0.9571249
#> var.time         0.02225523 0.06269547 1.8129238

The estimated variance-covariance matrix of the maximum likelihood estimates of all parameters is returned using the vcov() method, e.g.,

vcov(fm)
#>                 (Intercept)    sexfemale         time sexfemale:time
#> (Intercept)     0.098274483 -0.074499996 -0.008216948   0.0061209157
#> sexfemale      -0.074499996  0.218415104  0.005918094  -0.0175394670
#> time           -0.008216948  0.005918094  0.003159419  -0.0029625166
#> sexfemale:time  0.006120916 -0.017539467 -0.002962517   0.0061144712
#> D_11           -0.087119578 -0.019450999  0.008223998   0.0018335034
#> D_12            0.016309404  0.003815306 -0.001720134  -0.0004513681
#> D_22           -0.086163936 -0.019241185  0.010700522   0.0001648898
#>                        D_11          D_12          D_22
#> (Intercept)    -0.087119578  0.0163094042 -0.0861639364
#> sexfemale      -0.019450999  0.0038153060 -0.0192411853
#> time            0.008223998 -0.0017201337  0.0107005218
#> sexfemale:time  0.001833503 -0.0004513681  0.0001648898
#> D_11            0.429031451 -0.0995215920  0.5054925071
#> D_12           -0.099521592  0.0369781665 -0.2113346392
#> D_22            0.505492507 -0.2113346392  1.3616186478

The elements of this covariance matrix that correspond to the elements of the covariance matrix of the random effects (i.e., the elements D_xx) are on the log-Cholesky scale.

Robust standard errors using the sandwich estimator can be obtained by setting the logical argument sandwich to TRUE, i.e.,

vcov(fm, sandwich = TRUE)
#>                 (Intercept)    sexfemale         time sexfemale:time
#> (Intercept)     0.099375006 -0.070392849 -0.005229360   0.0024372130
#> sexfemale      -0.070392849  0.213560962  0.003080019  -0.0183177219
#> time           -0.005229360  0.003080019  0.002907846  -0.0026757783
#> sexfemale:time  0.002437213 -0.018317722 -0.002675778   0.0061654496
#> D_11           -0.080682266 -0.015269283  0.006084500   0.0034781044
#> D_12            0.017266666 -0.005005000 -0.001431995  -0.0002375471
#> D_22           -0.081773200  0.032220915  0.005444180   0.0024739539
#>                        D_11          D_12         D_22
#> (Intercept)    -0.080682266  0.0172666661 -0.081773200
#> sexfemale      -0.015269283 -0.0050049996  0.032220915
#> time            0.006084500 -0.0014319950  0.005444180
#> sexfemale:time  0.003478104 -0.0002375471  0.002473954
#> D_11            0.257244771 -0.0547470851  0.235884951
#> D_12           -0.054747085  0.0217713869 -0.122326509
#> D_22            0.235884951 -0.1223265091  0.831270814

The use of robust standard errors via the sandwich argument is also available in the summary(), confint(), anova(), marginal_coefs(), effectPlotData(), predict(), and simulate() methods.

Fixed and Random Effects Estimates

To extract the estimated fixed effects coefficients from a fitted mixed model, we can use the fixef() method. Similarly, the empirical Bayes estimates of the random effects are extracted using the ranef() method, and finally the coef() method returns the subject-specific coefficients, i.e., the sum of the fixed and random effects coefficients:

fixef(fm)
#>    (Intercept)      sexfemale           time sexfemale:time 
#>   -2.054036030   -1.167231806    0.261163648    0.001917382
head(ranef(fm))
#>   (Intercept)         time
#> 1  -0.3667910 -0.123026294
#> 2   0.6656132  0.289323835
#> 3   0.4836942  0.215045666
#> 4  -0.1904426 -0.008732796
#> 5   0.2477730  0.160118862
#> 6  -0.3739774 -0.183618169
head(coef(fm))
#>   (Intercept) sexfemale       time sexfemale:time
#> 1   -2.420827 -1.167232 0.13813735    0.001917382
#> 2   -1.388423 -1.167232 0.55048748    0.001917382
#> 3   -1.570342 -1.167232 0.47620931    0.001917382
#> 4   -2.244479 -1.167232 0.25243085    0.001917382
#> 5   -1.806263 -1.167232 0.42128251    0.001917382
#> 6   -2.428013 -1.167232 0.07754548    0.001917382

Marginalized Coefficients

The fixed effects estimates in mixed models with nonlinear link functions have an interpretation conditional on the random effects. However, often we wish to obtain parameters with a marginal / population averaged interpretation, which leads many researchers to use generalized estimating equations, and dealing with potential issues with missing data. Nonetheless, recently Hedeker et al. have proposed a nice solution to this problem. Their approach is implemented in function marginal_coefs(). For example, for model fm we obtain the marginalized coefficients using:

marginal_coefs(fm)
#>    (Intercept)      sexfemale           time sexfemale:time 
#>        -1.6023        -1.0968         0.1766         0.0508

The function calculates the marginal log odds ratios in our case (because we have a binary outcome) using a Monte Carlo procedure with number of samples determined by the M argument.

Standard errors for the marginalized coefficients are obtained by setting std_errors = TRUE in the call to marginal_coefs(), and require a double Monte Carlo procedure for which argument K comes also into play. To speed up computations, the outer Monte Carlo procedure is performed in parallel using package parallel and number of cores specified in the cores argument (results not shown):

marginal_coefs(fm, std_errors = TRUE, cores = 5)

Fitted Values and Residuals

The fitted() method extracts fitted values from the fitted mixed model. These are always on the scale of the response variable. The type argument of fitted() specifies the type of fitted values computed. The default is type = "mean_subject" which corresponds to the fitted values calculated using only the fixed-effects part of the linear predictor; hence, for the subject who has random effects values equal to 0, i.e., the “mean subject”:

head(fitted(fm))
#>         1         2         3         4         5         6 
#> 0.1136452 0.1174465 0.1667820 0.5824332 0.5947878 0.5958071

Setting type = "subject_specific" will calculate the fitted values using both the fixed and random effects parts, where for the latter the empirical Bayes estimates of the random effects are used:

head(fitted(fm, type = "subject_specific"))
#>          1          2          3          4          5          6 
#> 0.08159826 0.08308495 0.10108500 0.23896304 0.24390557 0.24431851

Finally, setting type = "marginal" will calculate the fitted values based on the multiplication of the fixed-effects design matrix with the marginalized coefficients described above (due to the required computing time, these fitted values are not displayed):

head(fitted(fm, type = "marginal"))

The residuals() method simply calculates the residuals by subtracting the fitted values from the observed repeated measurements outcome. Hence, this method also has a type argument with exactly the same options as the fitted() method.

Effect Plots

To display the estimated longitudinal evolution of the binary outcome we can use an effect plot. This is simply predictions from the models with the corresponding 95% pointwise confidence intervals.

As a first step we create a data frame the provides the setting based on which the plot is to be produced; function expand.grid() is helpful in this regard:

nDF <- with(DF, expand.grid(time = seq(min(time), max(time), length.out = 15),
                            sex = levels(sex)))

Next we use the effectPlotData() function that does the heavy lifting, i.e., calculates the predictions and confidence intervals from a fitted mixed model for the data frame provided above, i.e.,

plot_data <- effectPlotData(fm, nDF)

Then we can produce the plot using for example the xyplot() function from package lattice, e.g.,

library("lattice")
xyplot(pred + low + upp ~ time | sex, data = plot_data,
       type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
       xlab = "Follow-up time", ylab = "log odds")

expit <- function (x) exp(x) / (1 + exp(x))
xyplot(expit(pred) + expit(low) + expit(upp) ~ time | sex, data = plot_data,
       type = "l", lty = c(1, 2, 2), col = c(2, 1, 1), lwd = 2,
       xlab = "Follow-up time", ylab = "Subject-Specific Probabilities")

The effectPlotData() function also allows to compute marginal predictions using the marginalized coefficients described above. This is achieved by setting marginal = TRUE in the respective call (results not shown):

plot_data_m <- effectPlotData(fm, nDF, marginal = TRUE, cores = 2)

# we put the two groups in the same panel
my.panel.bands <- function(x, y, upper, lower, fill, col, subscripts, ..., font, 
                           fontface) {
    upper <- upper[subscripts]
    lower <- lower[subscripts]
    panel.polygon(c(x, rev(x)), c(upper, rev(lower)), col = fill, border = FALSE, ...)
}

xyplot(expit(pred) ~ time, group = sex, data = plot_data_m, 
       upper = expit(plot_data_m$upp), low = expit(plot_data_m$low), 
       type = "l", col = c("blue", "red"), 
       fill = c("#0000FF80", "#FF000080"),
       panel = function (x, y, ...) {
           panel.superpose(x, y, panel.groups = my.panel.bands, ...)
           panel.xyplot(x, y, lwd = 2,  ...)
}, xlab = "Follow-up time", ylab = "Marginal Probabilities")

Effect Plots using the effects package

In addition to using the effectPlotData() function, the same type of effect plots can be produced by the effects package. For example, based on the fm model we produce the effect plot for time and for the two groups using the call to predictorEffect() and its plot() method:

library("effects")
plot(predictorEffect("time", fm), type = "link")

The type argument controls the scale of the y-axis, namely, type = "link" corresponds to the log-odds scale. If we would like to obtain the figure in the probability scale, we should set type = "response", i.e.,

plot(predictorEffect("time", fm), type = "response")

For the additional functionality provided by the effects package, check the vignette: vignette("predictor-effects-gallery", package = "effects").

Note: Effects plots via the effects package are currently only supported for the binomial() and poisson() families.

Comparing Two Models

The anova() method can be used to compare two fitted mixed models using a likelihood ratio test. For example, we test if we can test the null hypothesis that the covariance between the random intercepts and slopes is equal to zero using:

gm <- mixed_model(fixed = y ~ sex * time, random = ~ time || id, data = DF,
                  family = binomial())

anova(gm, fm)
#> 
#>       AIC    BIC log.Lik  LRT df p.value
#> gm 730.94 746.57 -359.47                
#> fm 731.62 749.86 -358.81 1.32  1  0.2499

Predictions

Using the predict() method we can calculate predictions for new subjects. As an example, we treat subject 1 from the DF dataset as a new patient (in the code below we change his id variable):

pred_DF <- DF[DF$id == 1, ][1:4, ]
pred_DF$id <- paste0("N", as.character(pred_DF$id))
pred_DF
#>   id      time  sex y
#> 1 N1 0.0000000 male 0
#> 2 N1 0.1424363 male 0
#> 3 N1 1.7055512 male 0
#> 4 N1 9.1391210 male 0

We start by computing predictions based only on the fixed-effects part of the model; because of the nonlinear link function, these predictions are of subjects with random effects value equal to zero, which is not to the average predictions:

predict(fm, newdata = pred_DF, type_pred = "response",
        type = "mean_subject", se.fit = TRUE)
#> $pred
#>         1         2         3         4 
#> 0.1136452 0.1174465 0.1667820 0.5824332 
#> 
#> $se.fit
#>         1         2         3         4 
#> 0.3134876 0.3098351 0.2818441 0.4604009

Population averaged predictions can be obtained by first calculating the marginalized coefficients (using marginal_coefs()) and multiplying them with the fixed effects design matrix; this is achieved using the option type = "marginal" (due to the required computing time, predictions not shown):

predict(fm, newdata = pred_DF, type_pred = "response",
        type = "marginal", se.fit = FALSE)

Finally, we calculate subject-specific predictions; the standard errors are calculated with a Monte Carlo scheme (for details check the online help file):

predict(fm, newdata = pred_DF, type_pred = "response",
        type = "subject_specific", se.fit = TRUE)
#> $pred
#>          1          2          3          4 
#> 0.07937672 0.08042530 0.09278823 0.17716458 
#> 
#> $se.fit
#>          1          2          3          4 
#> 0.02788083 0.02806700 0.03210400 0.11341043 
#> 
#> $low
#>          1          2          3          4 
#> 0.04116652 0.04187969 0.04874345 0.04567313 
#> 
#> $upp
#>         1         2         3         4 
#> 0.1509647 0.1524805 0.1693027 0.4959877 
#> 
#> $success_rate
#> [1] 0.5

Suppose now that we want predictions at time points at which no responses y have been recorded, e.g.,

future_Times <- pred_DF[1:3, c("id", "time", "sex")]
future_Times$time <- c(3, 4, 10)
future_Times
#>   id time  sex
#> 1 N1    3 male
#> 2 N1    4 male
#> 3 N1   10 male

Predictions at these time points can be calculated by provide this data frame in the newdata2 argument of predict():

predict(fm, newdata = pred_DF, newdata2 = future_Times, type_pred = "response",
        type = "subject_specific", se.fit = TRUE)
#> $pred
#>          1          2          3          4 
#> 0.07937672 0.08042530 0.09278823 0.17716458 
#> 
#> $pred2
#>         1         2         3 
#> 0.1042907 0.1140225 0.1900830 
#> 
#> $se.fit
#>          1          2          3          4 
#> 0.02788083 0.02806700 0.03210400 0.11341043 
#> 
#> $se.fit2
#>          1          2          3 
#> 0.03874363 0.04616593 0.12788755 
#> 
#> $low
#>          1          2          3          4 
#> 0.04116652 0.04187969 0.04874345 0.04567313 
#> 
#> $upp
#>         1         2         3         4 
#> 0.1509647 0.1524805 0.1693027 0.4959877 
#> 
#> $low2
#>          1          2          3 
#> 0.05089954 0.05126488 0.04379313 
#> 
#> $upp2
#>         1         2         3 
#> 0.1952552 0.2214955 0.5575331

Simulate

The simulate() method can be used to simulate response outcome data from a fitted mixed model. For example, we simulate two realization of our dichotomous outcome:

head(simulate(fm, nsim = 2, seed = 123), 10)
#>       [,1] [,2]
#>  [1,]    0    0
#>  [2,]    0    0
#>  [3,]    0    0
#>  [4,]    1    0
#>  [5,]    1    0
#>  [6,]    0    1
#>  [7,]    0    0
#>  [8,]    1    0
#>  [9,]    0    0
#> [10,]    1    0

By setting acount_MLEs_var = TRUE in the call to simulate() we also account for the variability in the maximum likelihood estimates in the simulation of new responses. This is achieved by simulating each time a new response vector using a realization for the parameters from a multivariate normal distribution with mean the MLEs and covariance matrix the covariance matrix of the MLEs:

head(simulate(fm, nsim = 2, seed = 123, acount_MLEs_var = TRUE), 10)
#>       [,1] [,2]
#>  [1,]    0    0
#>  [2,]    0    0
#>  [3,]    1    0
#>  [4,]    1    1
#>  [5,]    1    0
#>  [6,]    1    1
#>  [7,]    0    0
#>  [8,]    0    1
#>  [9,]    1    0
#> [10,]    1    0