Fitting Joint Models with JMbayes2

Univariate

The function that fits joint models in JMbayes2 is called jm(). It has three required arguments, Surv_object a Cox model fitted by coxph() or an Accelerated Failure time model fitted by survreg(), Mixed_objects a single or a list of mixed models fitted either by the lme() or mixed_model() functions, and time_var a character string indicating the name of the time variable in the specification of the mixed-effects models. We will illustrate the basic use of the package in the PBC dataset. We start by fitting a Cox model for the composite event transplantation or death, including sex as a baseline covariate:

pbc2.id$status2 <- as.numeric(pbc2.id$status != 'alive')
CoxFit <- coxph(Surv(years, status2) ~ sex, data = pbc2.id)

Our aim is to assess the strength of the association between the risk of the composite event and the levels of serum bilirubin that has been collected during follow-up. We will describe the patient-specific profiles over time for this biomarker using a linear mixed model, with fixed-effects, time, sex, and their interaction, and as random effects random intercepts and random slopes. The syntax to fit this model with lme() is:

fm1 <- lme(log(serBilir) ~ year * sex, data = pbc2, random = ~ year | id)

The joint model that links the survival and longitudinal submodels is fitted with the following call to the jm() function:

jointFit1 <- jm(CoxFit, fm1, time_var = "year")
summary(jointFit1)
#> 
#> Call:
#> jm(Surv_object = CoxFit, Mixed_objects = fm1, time_var = "year")
#> 
#> Data Descriptives:
#> Number of Groups: 312        Number of events: 169 (54.2%)
#> Number of Observations:
#>   log(serBilir): 1945
#> 
#>                  DIC     WAIC      LPML
#> marginal    4773.096 10180.63 -5224.691
#> conditional 5071.195 12233.85 -6366.885
#> 
#> Random-effects covariance matrix:
#>                     
#>        StdDev   Corr
#> (Intr) 1.0098 (Intr)
#> year   0.1811 0.3640
#> 
#> Survival Outcome:
#>                         Mean  StDev    2.5%  97.5%    P   Rhat
#> sexfemale            -0.1011 0.4616 -1.0395 0.8064 0.82 1.0061
#> value(log(serBilir))  1.2117 0.1124  0.9927 1.4355 0.00 1.0117
#> 
#> Longitudinal Outcome: log(serBilir) (family = gaussian, link = identity)
#>                   Mean  StDev    2.5%  97.5%      P   Rhat
#> (Intercept)     0.7262 0.3134  0.1325 1.3570 0.0176 1.0012
#> year            0.2618 0.1014  0.0632 0.4621 0.0162 1.0033
#> sexfemale      -0.2643 0.3288 -0.9138 0.3648 0.4162 1.0015
#> year:sexfemale -0.0866 0.1056 -0.2939 0.1240 0.3949 1.0034
#> sigma           0.4043 0.0233  0.3673 0.4560 0.0000 1.0068
#> 
#> MCMC summary:
#> chains: 3 
#> iterations per chain: 3500 
#> burn-in per chain: 500 
#> thinning: 1 
#> time: 16 sec

The output of the summary() method provides some descriptive statistics of the sample at hand, followed by some fit statistics based on the marginal (random effects are integrated out using the Laplace approximation) and conditional on the random effects log-likelihood functions, followed by the estimated variance-covariance matrix for the random effects, followed by the estimates for the survival submodel, followed by the estimates for the longitudinal submodel(s), and finally some information for the MCMC fitting algorithm.

By default, jm() adds the subject-specific linear predictor of the mixed model as a time-varying covariate in the survival relative risk model. In the output this is named as value(log(serBilir)) to denote that, by default, the current value functional form is used. That is, we assume that the instantaneous risk of an event at a specific time \(t\) is associated with the value of the linear predictor of the longitudinal outcome at the same time point \(t\).

Standard MCMC diagnostics are available to evaluate convergence. For example, the traceplot for the association coefficient value(log(serBilir)) is produced with the following syntax:

ggtraceplot(jointFit1, "alphas")

and the density plot with the call:

ggdensityplot(jointFit1, "alphas")

Multivariate

To fit a joint model with multiple longitudinal outcomes, we simply provide a list of mixed models as the second argument of jm(). In the following example, we extend the joint model we fitted above by also including the prothrombin time and the log odds of the presence or not of ascites as time-varying covariates in the relative risk model for the composite event. Because this model is more complex, we increase the number of MCMC iterations, the number of burn-in iterations and the thinning per chain, using the corresponding control arguments:

fm2 <- lme(prothrombin ~ year * sex, data = pbc2, random = ~ year | id)
fm3 <- mixed_model(ascites ~ year + sex, data = pbc2,
                   random = ~ year | id, family = binomial())

jointFit2 <- jm(CoxFit, list(fm1, fm2, fm3), time_var = "year",
                n_iter = 12000L, n_burnin = 2000L, n_thin = 5L)
summary(jointFit2)
#> 
#> Call:
#> jm(Surv_object = CoxFit, Mixed_objects = list(fm1, fm2, fm3), 
#>     time_var = "year", n_iter = 12000L, n_burnin = 2000L, n_thin = 5L)
#> 
#> Data Descriptives:
#> Number of Groups: 312        Number of events: 169 (54.2%)
#> Number of Observations:
#>   log(serBilir): 1945
#>   prothrombin: 1945
#>   ascites: 1885
#> 
#>                  DIC     WAIC       LPML
#> marginal    12056.71 17219.82  -9938.031
#> conditional 14808.13 18731.64 -11111.223
#> 
#> Random-effects covariance matrix:
#>                                                  
#>        StdDev   Corr                             
#> (Intr) 0.9939 (Intr)   year (Intr)   year (Intr) 
#> year   0.1889 0.4073                             
#> (Intr) 0.8161 0.5326 0.4421                      
#> year   0.3320 0.4024 0.3558 0.0082               
#> (Intr) 3.7032 0.5222 0.4810 0.5353 0.2723        
#> year   0.6266 0.4220 0.6258 0.2384 0.4321 -0.0430
#> 
#> Survival Outcome:
#>                         Mean  StDev    2.5%  97.5%      P   Rhat
#> sexfemale            -0.6074 0.5441 -1.7158 0.4091 0.2593 1.0012
#> value(log(serBilir))  0.6454 0.2001  0.2347 1.0252 0.0080 1.0235
#> value(prothrombin)    0.0620 0.1129 -0.1792 0.2650 0.5347 1.0555
#> value(ascites)        0.3852 0.1114  0.2021 0.6430 0.0000 1.0874
#> 
#> Longitudinal Outcome: log(serBilir) (family = gaussian, link = identity)
#>                   Mean  StDev    2.5%  97.5%      P   Rhat
#> (Intercept)     0.7030 0.2838  0.1469 1.2595 0.0160 0.9996
#> year            0.2704 0.0783  0.1168 0.4274 0.0013 1.0009
#> sexfemale      -0.2428 0.2963 -0.8241 0.3334 0.4017 0.9996
#> year:sexfemale -0.0791 0.0806 -0.2434 0.0778 0.3233 1.0006
#> sigma           0.3921 0.0185  0.3625 0.4339 0.0000 1.0035
#> 
#> Longitudinal Outcome: prothrombin (family = gaussian, link = identity)
#>                   Mean  StDev    2.5%   97.5%      P   Rhat
#> (Intercept)    10.9390 0.4810 10.0012 11.8931 0.0000 1.0003
#> year            0.2503 0.1657 -0.0831  0.5835 0.1223 0.9997
#> sexfemale      -0.3973 0.4995 -1.3729  0.5880 0.4190 1.0001
#> year:sexfemale  0.0497 0.1710 -0.2930  0.3839 0.7590 1.0003
#> sigma           1.0855 0.0282  1.0362  1.1476 0.0000 1.0015
#> 
#> Longitudinal Outcome: ascites (family = binomial, link = logit)
#>                Mean  StDev    2.5%   97.5%     P   Rhat
#> (Intercept) -5.4997 1.3582 -8.4181 -3.0226 0.000 1.0009
#> year         0.7626 0.2051  0.3770  1.1913 0.000 1.0123
#> sexfemale   -0.3870 1.1630 -2.6444  1.9395 0.725 1.0025
#> 
#> MCMC summary:
#> chains: 3 
#> iterations per chain: 12000 
#> burn-in per chain: 2000 
#> thinning: 5 
#> time: 1.8 min

The output for the survival submodel contains now the estimated coefficients for value(prothrombin) and value(ascites), and parameter estimates for all three longitudinal submodels.

Functional forms

As mentioned above, the default call to jm() includes the subject-specific linear predictors of the mixed-effects models as time-varying covariates in the relative risk model. However, this is just one of the many possibilities we have to link the longitudinal and survival outcomes. The argument functional_forms of jm() provides additional options. Based on previous experience, two extra functional forms are provided, namely, the time-varying slope and the time-varying normalized area/cumulative-effect. The time-varying slope is the first order derivative of the subject-specific linear predictor of the mixed-effect model with respect to the (follow-up) time variable. The time-varying normalized area/cumulative-effect is the integral of the subject-specific linear predictor of the mixed-effect model from zero to the current (follow-up) time \(t\) divided by \(t\). The integral is the area under the subject-specific longitudinal profile; by dividing the integral by \(t\) we obtain the average of the subject-specific longitudinal profile over the corresponding period \((0, t)\).

To illustrate how the functional_forms argument can be used to specify these functional forms, we update the joint model jointFit2 by including the time-varying slope of log serum bilirubin instead of the value, and also the interaction of this slope with sex, and for prothrombin we include the normalized cumulative effect. For ascites, we keep the value functional form. The corresponding syntax to fit this model is:

fForms <- list(
  "log(serBilir)" = ~ slope(log(serBilir)) + slope(log(serBilir)):sex,
  "prothrombin" = ~ area(prothrombin)
)

jointFit3 <- update(jointFit2, functional_forms = fForms)
summary(jointFit3)
#> 
#> Call:
#> jm(Surv_object = CoxFit, Mixed_objects = list(fm1, fm2, fm3), 
#>     time_var = "year", functional_forms = fForms, n_iter = 12000L, 
#>     n_burnin = 2000L, n_thin = 5L)
#> 
#> Data Descriptives:
#> Number of Groups: 312        Number of events: 169 (54.2%)
#> Number of Observations:
#>   log(serBilir): 1945
#>   prothrombin: 1945
#>   ascites: 1885
#> 
#>                  DIC     WAIC       LPML
#> marginal    12079.47 15343.01  -7917.041
#> conditional 14782.95 18253.99 -11171.787
#> 
#> Random-effects covariance matrix:
#>                                                  
#>        StdDev   Corr                             
#> (Intr) 0.9785 (Intr)   year (Intr)   year (Intr) 
#> year   0.1928 0.4755                             
#> (Intr) 0.8135 0.5382 0.4659                      
#> year   0.3344 0.4242 0.3622 0.0177               
#> (Intr) 3.7593 0.5308 0.4782 0.5244 0.2728        
#> year   0.7034 0.4977 0.6566 0.2806 0.4367 -0.0132
#> 
#> Survival Outcome:
#>                                   Mean  StDev    2.5%  97.5%      P   Rhat
#> sexfemale                      -1.0087 0.9399 -2.7552 0.9199 0.2837 1.0318
#> slope(log(serBilir))            2.7882 2.0711 -1.1678 6.7593 0.1817 1.0199
#> slope(log(serBilir)):sexfemale  1.0001 2.2810 -3.5238 5.3499 0.6373 1.0698
#> area(prothrombin)               0.1761 0.2310 -0.3084 0.6030 0.4340 1.0045
#> value(ascites)                  0.4916 0.1228  0.2763 0.7486 0.0000 1.0668
#> 
#> Longitudinal Outcome: log(serBilir) (family = gaussian, link = identity)
#>                   Mean  StDev    2.5%  97.5%      P   Rhat
#> (Intercept)     0.6943 0.2723  0.1509 1.2189 0.0123 0.9998
#> year            0.2753 0.0738  0.1320 0.4257 0.0000 1.0018
#> sexfemale      -0.2390 0.2850 -0.8021 0.3181 0.3927 0.9998
#> year:sexfemale -0.0736 0.0761 -0.2255 0.0749 0.3307 1.0016
#> sigma           0.3913 0.0179  0.3623 0.4316 0.0000 1.0004
#> 
#> Longitudinal Outcome: prothrombin (family = gaussian, link = identity)
#>                   Mean  StDev    2.5%   97.5%      P   Rhat
#> (Intercept)    10.9414 0.4916  9.9491 11.9116 0.0000 1.0008
#> year            0.2520 0.1655 -0.0670  0.5905 0.1173 1.0008
#> sexfemale      -0.3993 0.5204 -1.4192  0.6404 0.4357 1.0000
#> year:sexfemale  0.0524 0.1747 -0.2993  0.3996 0.7453 1.0003
#> sigma           1.0853 0.0275  1.0374  1.1456 0.0000 0.9998
#> 
#> Longitudinal Outcome: ascites (family = binomial, link = logit)
#>                Mean  StDev    2.5%   97.5%      P   Rhat
#> (Intercept) -5.6408 1.3835 -8.4997 -3.0889 0.0000 1.0322
#> year         0.8419 0.2180  0.4377  1.3112 0.0000 1.0330
#> sexfemale   -0.3756 1.1981 -2.8091  1.9679 0.7477 1.0061
#> 
#> MCMC summary:
#> chains: 3 
#> iterations per chain: 12000 
#> burn-in per chain: 2000 
#> thinning: 5 
#> time: 1.9 min

As seen above, the functional_forms argument is a named list with elements corresponding to the longitudinal outcomes. If a longitudinal outcome is not specified in this list, then the default, value functional form, is used for that outcome. Each element of the list should be a one-sided R formula in which the functions value(), slope() and area() can be used. Interaction terms between the functional forms and other (baseline) covariates are also allowed.

Penalized Coefficients using Shrinkage Priors

When multiple longitudinal outcomes are considered with possibly different functional forms per outcome, we require to fit a relative risk model containing several terms. Moreover, it is often of scientific interest to select which terms/functional-forms per longitudinal outcome are more strongly associated with the risk of the event of interest. To facilitate this selection, jm() provides the option to penalize the regression coefficients using shrinkage priors. As an example, we refit jointFit3 by assuming a Horseshoe prior for the alphas coefficients (i.e., the coefficients of the longitudinal outcomes in the relative risk model):

jointFit4 <- update(jointFit3, priors = list("penalty_alphas" = "horseshoe"))
cbind("un-penalized" = unlist(coef(jointFit3)), 
      "penalized" = unlist(coef(jointFit4)))
#>                                            un-penalized  penalized
#> gammas.Mean                                  -1.0086870 -1.0628977
#> association.slope(log(serBilir))              2.7881863  2.6240017
#> association.slope(log(serBilir)):sexfemale    1.0000792  1.1611848
#> association.area(prothrombin)                 0.1760512  0.1569630
#> association.value(ascites)                    0.4915701  0.4824727

Apart from the Horseshoe prior, the ridge prior is also provided.