vignettes/Multi_State_Processes.Rmd
Multi_State_Processes.Rmd
A subject may often transition between multiple states, and we are interested in assessing the association of longitudinal marker(s) with each of these transitions. In this vignette, we will illustrate how to achieve this using JMbayes2.
We will consider a simple case with one longitudinal outcome and a three-state (illness-death) model, but this application can be extended for the cases of multiple longitudinal markers and more than three states.
First, we will simulate data from a joint model with a single linear mixed effects model and a multi-state process with three possible states. The multi-state process can be visualized as:
where all subjects start from the state “Healthy” and then can transition to either state “Illness” and then state “Death” or directly to state “Death.” In this case, states “Healthy” and “Illness” are transient states as the subject, when occupying these states, can still transition to other states, whereas “Death” is an absorbing state as when a subject reaches this state then no further transitions can occur. This means that three transitions are possible: \(1 \rightarrow 2\), \(1 \rightarrow 3\) and \(2 \rightarrow 3\) with transition intensities \(h_{12}\left(t\right)\), \(h_{13}\left(t\right)\) and \(h_{23}\left(t\right)\) respectively.
For our example, the default functional form is assumed, i.e., that the linear predictor \(\eta(t)\) of the mixed model is associated with each transition intensity at time \(t\). The following piece of code simulates the data:
set.seed(1234)
# number of subjects
N <- 500
# number of measurements per subject
n <- 20
# vector of ids
id <- rep(1:N, each = n)
# minimum and maximum follow-up times
min.t <- 0.01
max.t <- 16
# sample time-points
time <- replicate(N, c(0, sort(runif(n - 1, min = min.t, max = max.t))), simplify = FALSE)
time <- do.call(c, time)
# sample continuous covariate values
Xcov.s <- rnorm(N, mean = 4.763, sd = 2.8) # wide version
Xcov <- rep(Xcov.s, each = n) # long
# initiate data frame to store results
DF <- data.frame("id" = id, "time" = time, "X" = Xcov)
# design matrices for fixed and random effects
X <- model.matrix(~ 1 + time + X, data = DF)
Z <- model.matrix(~ 1 + time, data = DF)
D11 <- 1.0 # variance of random intercepts
D22 <- 0.5 # variance of random slopes
# we simulate random effects
b <- cbind(rnorm(N, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# fixed effects coefficients
true.betas <- c(-0.482, 0.243, 1.52)
# linear predictor
eta.y <- as.vector(X %*% true.betas + rowSums(Z * b[id, ]))
# residual standard error
sigma.e <- 1.242
# sample longitudinal outcome values
DF$y <- rnorm(N * n, eta.y, sigma.e)
# values for the association parameter per transition
alpha <- c("alpha.01" = 0.8, "alpha.02" = 0.55, "alpha.12" = 1.25)
# shape of Weibull for each transition
phi <- c("phi.01" = 12.325, "phi.02" = 8.216, "phi.12" = 3.243)
# regression coefficients for transition intensities
gammas <- c("(Intercept)1" = -22.25, "X" = 1.2,
"(Intercept)2" = -18.25, "X" = 1.2,
"(Intercept)3" = -19.25, "X" = 1.2)
# design matrix transition intensities
W <- cbind("(Intercept)1"= rep(1, N), Xcov[seq(1, by = n, N*n)],
"(Intercept)2"= rep(1, N), Xcov[seq(1, by = n, N*n)],
"(Intercept)3"= rep(1, N), Xcov[seq(1, by = n, N*n)])
## linear predictor for transition: 0 -> 1
eta.t1 <- as.vector(W[, c(1, 2), drop = FALSE] %*% gammas[1:2])
## linear predictor for transition: 0 -> 2
eta.t2 <- as.vector(W[, c(3, 4), drop = FALSE] %*% gammas[3:4])
## linear predictor for transition: 1 -> 2
eta.t3 <- as.vector(W[, c(5, 6), drop = FALSE] %*% gammas[5:6])
# we simulate event times using inverse transform sampling
invS01 <- function(t, u, i) {
h <- function(s) {
XX <- cbind(1, s, Xcov[i])
ZZ <- cbind(1, s)
f1 <- as.vector(XX %*% true.betas + rowSums(ZZ * b[rep(i, nrow(ZZ)), ]))
exp(log(phi["phi.01"]) + (phi["phi.01"] - 1)*log(s) + eta.t1[i] + f1*alpha["alpha.01"])
}
integrate(h, lower = 0, upper = t, subdivisions = 10000L)$value + log(u)
}
invS02 <- function(t, u, i) {
h <- function(s) {
XX <- cbind(1, s, Xcov[i])
ZZ <- cbind(1, s)
f1 <- as.vector(XX %*% true.betas + rowSums(ZZ * b[rep(i, nrow(ZZ)), ]))
exp(log(phi["phi.02"]) + (phi["phi.02"] - 1)*log(s) + eta.t2[i] + f1*alpha["alpha.02"])
}
integrate(h, lower = 0, upper = t, subdivisions = 10000)$value + log(u)
}
invS12 <- function (t, u, i) {
h <- function (s) {
XX <- cbind(1, s, Xcov[i])
ZZ <- cbind(1, s)
f1 <- as.vector(XX %*% true.betas + rowSums(ZZ * b[rep(i, nrow(ZZ)), ]))
exp(log(phi["phi.12"]) + (phi["phi.12"] - 1) * log(s) +
eta.t3[i] + f1 * alpha["alpha.12"])
}
integrate(h, lower = 0, upper = t, subdivisions = 10000)$value + log(u)
}
# Probability for each transition
u01 <- runif(N, 0, 1)
u02 <- runif(N, 0, 1)
u12 <- runif(N, 0, 1)
# initiate vectors to save true event times
trueT01 <- numeric(N)
trueT02 <- numeric(N)
trueT12 <- numeric(N)
# sample censoring times
mean.Cens <- 9
Ctimes <- runif(N, 0, 2 * mean.Cens)
# simulate time-to-event data
for (i in 1:N) {
Root01 <- NULL
Root02 <- NULL
Root12 <- NULL
Up <- 50
tries <- 5
# Transition 0->1
Up <- 200
Root01 <- try(uniroot(invS01, interval = c(1e-05, Up), u = u01[i], i = i)$root, TRUE)
trueT01[i] <- if (!inherits(Root01, "try-error")) Root01 else 500
# Transition 0->2
Up <- 200
Root02 <- try(uniroot(invS02, interval = c(1e-05, Up), u = u02[i], i = i)$root, TRUE)
trueT02[i] <- if (!inherits(Root02, "try-error")) Root02 else 500
# Transition 1->2
if(as.numeric(trueT01[i]) < as.numeric(trueT02[i]) && as.numeric(trueT01[i]) < Ctimes[i]) {
Up <- Up + 200
Root12 <- try(uniroot(invS12, interval = c(as.numeric(trueT01[i]), Up), u = u12[i], i = i)$root, TRUE)
} else {Root12 <- 500}
trueT12[i] <- if (!inherits(Root12, "try-error")) Root12 else 500
}
# initiate multi-state dataset in wide format
data_mstate <- data.frame('id' = 1:N, 'trueT01' = trueT01, 'trueT02' = trueT02, 'trueT12' = trueT12,
'Ctimes' = Ctimes, 'X' = Xcov.s)
# split by id
data_mstate_split.by.id <- split(data_mstate, data_mstate$id)
# function to pass to lapply to prepare multi-state data per id
ms_arrange <- function (x) {
if (x$Ctimes < min(x$trueT01, x$trueT02)) {
x_new <- data.frame('id' = rep(x$id, 2), 'from_state' = rep(1, 2), 'to_state' = 2:3,
'transition' = 1:2, 'Tstart' = rep(0, 2), 'Tstop' = x$Ctimes, 'status' = rep(0, 2),
'X' = x$X)
} else {
if (x$trueT02 < x$trueT01) {
x_new <- data.frame('id' = rep(x$id, 2), 'from_state' = rep(1, 2), 'to_state' = 2:3,
'transition' = 1:2, 'Tstart' = rep(0, 2), 'Tstop' = x$trueT02, 'status' = c(0, 1),
'X' = x$X)
} else {
if (x$Ctimes < x$trueT12) {
x_new <- data.frame('id' = rep(x$id, 3), 'from_state' = c(1, 1, 2), 'to_state' = c(2:3, 3),
'transition' = 1:3, 'Tstart' = c(rep(0, 2), x$trueT01),
'Tstop' = c(rep(x$trueT01, 2), x$Ctimes), 'status' = c(1, 0, 0),
'X' = x$X)
} else {
x_new <- data.frame('id' = rep(x$id, 3), 'from_state' = c(1, 1, 2), 'to_state' = c(2:3, 3),
'transition' = 1:3, 'Tstart' = c(rep(0, 2), x$trueT01),
'Tstop' = c(rep(x$trueT01, 2), x$trueT12), 'status' = c(1, 0, 1),
'X' = x$X)
}
}
}
}
data_mstate_split.by.id <- lapply(data_mstate_split.by.id, ms_arrange)
data_mstate <- do.call(rbind, data_mstate_split.by.id)
data_mstate$transition <- factor(data_mstate$transition)
Tstop <- tapply(data_mstate$Tstop, data_mstate$id, max)
Tstop <- Tstop[id]
DF <- DF[DF$time <= Tstop, ]
The data for the multi-state process need to be in the appropriate long format:
head(data_mstate, n = 5L)
#> id from_state to_state transition Tstart Tstop status X
#> 1.1 1 1 2 1 0.000000 4.392984 1 1.984455
#> 1.2 1 1 3 2 0.000000 4.392984 0 1.984455
#> 1.3 1 2 3 3 4.392984 9.094447 0 1.984455
#> 2.1 2 1 2 1 0.000000 1.552122 0 7.673231
#> 2.2 2 1 3 2 0.000000 1.552122 0 7.673231
for example, subject 1 experienced the following transition: \(1 \rightarrow 2\) and therefore is
represented in 3 rows, one for each transition, because all of these
transitions were plausible. On the other hand, subject 2 is only
represented by two rows, only for transitions \(1 \rightarrow 2\) and \(1 \rightarrow 3\) since these are the only
transitions possible from state 1. That is since subject 2 never
actually transitioned to state 2, transition \(2 \rightarrow 3\) was never possible, and
therefore, no row for this transition is in the dataset. It is also
important to note that the time in the dataset follows the counting
process formulation with intervals specified by Tstart
and
Tstop
and that there is a variable (in this case
transition
) indicating which transition the row corresponds
to.
When the data in the appropriate format are available, fitting the
model is very straightforward. First we fit a linear mixed model using
the lme()
function from package nlme:
mixedmodel <- lme(y ~ time + X, data = DF, random = ~ time | id)
Then, we fit a multi-state model using function coxph()
from package survival making sure we use the counting
process specification and that we add strata(transition)
to
stratify by the transition indicator variable in the dataset.
Furthermore, we add an interaction between covariate X
and
each transition to allow the effect of this covariate to vary across
transitions.
msmodel <- coxph(Surv(Tstart, Tstop, status) ~ X * strata(transition),
data = data_mstate)
Finally, to fit the joint model, we simply run:
jm_ms_model <- jm(msmodel, mixedmodel, time_var = "time",
functional_forms = ~ value(y):transition, n_iter = 10000L)
summary(jm_ms_model)
#>
#> Call:
#> jm(Surv_object = msmodel, Mixed_objects = mixedmodel, time_var = "time",
#> functional_forms = ~value(y):transition, n_iter = 10000L)
#>
#> Data Descriptives:
#> Number of Groups: 500 Number of events: 514 (39.1%)
#> Number of Observations:
#> y: 4033
#>
#> DIC WAIC LPML
#> marginal 17154.93 17140.12 -8770.862
#> conditional 18528.09 18226.79 -9522.077
#>
#> Random-effects covariance matrix:
#>
#> StdDev Corr
#> (Intr) 0.9430 (Intr)
#> time 0.7031 0.0698
#>
#> Survival Outcome:
#> Mean StDev 2.5% 97.5% P Rhat
#> X -0.1305 0.0703 -0.2670 0.0023 0.0546 1.0134
#> X:strata(transition)2 0.4071 0.1146 0.2022 0.6216 0.0000 1.4069
#> X:strata(transition)3 -0.0004 0.0841 -0.1709 0.1599 0.9884 1.0216
#> value(y):transition1 0.2642 0.0412 0.1854 0.3444 0.0000 1.0206
#> value(y):transition2 0.2251 0.0708 0.1076 0.3769 0.0000 1.5392
#> value(y):transition3 0.0099 0.0232 -0.0382 0.0525 0.6406 1.0066
#>
#> Longitudinal Outcome: y (family = gaussian, link = identity)
#> Mean StDev 2.5% 97.5% P Rhat
#> (Intercept) -0.4320 0.1049 -0.6369 -0.2284 0 1.0029
#> time 0.3299 0.0406 0.2507 0.4095 0 1.0113
#> X 1.5323 0.0200 1.4935 1.5714 0 1.0011
#> sigma 1.2269 0.0151 1.1978 1.2573 0 1.0007
#>
#> MCMC summary:
#> chains: 3
#> iterations per chain: 10000
#> burn-in per chain: 500
#> thinning: 1
#> time: 2.7 min
which differs from a default call to jm()
by the
addition of the functional_forms
argument specifying that
we want an “interaction” between the marker’s value and each transition,
which translates into a separate association parameter for the
longitudinal marker and each transition.