`vignettes/Dynamic_Predictions.Rmd`

`Dynamic_Predictions.Rmd`

In the setting of longitudinal data dynamic individualized predictions are predictions for the longitudinal outcomes that are updated as extra measurements are recorded for a subject.

We start by introducing the notation for this setting. Let \(y_i\) denote the \(n_i \times 1\) longitudinal response vector for the \(i\)-th subject (\(i = 1, \ldots, n\)), with element \(y_{il}\) denoting the value of the longitudinal outcome taken at time point \(t_{il}\), \(l = 1, \ldots, n_i\). We also let \(\mathcal D_n = \{y_i; i = 1, \ldots, n\}\) denote all available longitudinal outcome information in the original sample.

We are interested in deriving predictions for a new subject \(j\) from the same population who has provided is with a set of longitudinal measurements \(\mathcal Y_j(t) = \{ y_j(t_{jl}); 0 \leq t_{jl} \leq t, l = 1, \ldots, n_j \}\). Given the fact that we observed these measurements, it is more relevant to focus on conditional subject-specific predictions. In particular, for any time \(u > t\) we are interested in the conditional mean \[ \omega_j(u \mid t) = E \{ y_j(u) \mid \mathcal Y_j(t), \mathcal D_n\}, \quad u > t. \] The time-dynamic nature of \(\omega_j(u \mid t)\) is evident because when new information is recorded for subject \(j\) at time \(t' > t\), we can update these predictions to obtain \(\omega_j(u \mid t')\), and therefore proceed in a time-dynamic manner.

Calculation of these predictions proceeds more naturally under the (asymptotic) Bayesian paradigm. Namely, \(\omega_j(u \mid t)\) is the expected value of the posterior predictive distribution \[
p\{y_j(u) \mid \mathcal Y_j(t), \mathcal D_n\} = \int p\{y_j(u) \mid \mathcal Y_j(t),
\theta\} \, p(\theta \mid \mathcal D_n) \; d\theta,
\] where \(p(\theta \mid \mathcal D_n)\) denotes the posterior distribution of the parameters obtained from the original sample \(\mathcal D_n\). If we have fitted the model by maximum likelihood, as it is done in the **GLMMadaptive** package, we can approximate this posterior distribution by a multivariate normal distribution centered at the maximum likelihood estimates, and with variance-covariance matrix the inverse of the observed information matrix.

The first term in the integrand can be further expanded into: \[\begin{eqnarray*} p\{y_j(u) \mid \mathcal Y_j(t), \theta\} & = & \int p\{y_j(u) \mid b_j, \mathcal Y_j(t), \theta\} \, p \{b_j \mid \mathcal Y_j(t), \theta\} \; db_j\\ &&\\ & = & \int p\{y_j(u) \mid b_j, \theta\} \, p \{b_j \mid \mathcal Y_j(t), \theta\} \; db_j. \end{eqnarray*}\] By combining these integral equations, we can derive a Monte Carlo scheme to estimate \(\omega_j(u \mid t)\) and its confidence interval.

We start by simulating some data for a zero-inflated count longitudinal outcome from a negative binomial distribution:

set.seed(12345) n <- 500 # number of subjects K <- 10 # number of measurements per subject t_max <- 4 # maximum follow-up time # we constuct a data frame with the design: # everyone has a baseline measurment, and then measurements at random follow-up times DF <- data.frame(id = rep(seq_len(n), each = K), time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))), sex = rep(gl(2, n/2, labels = c("male", "female")), each = K)) # design matrices for the fixed and random effects non-zero part X <- model.matrix(~ sex * time, data = DF) Z <- model.matrix(~ 1, data = DF) # design matrices for the fixed and random effects zero part X_zi <- model.matrix(~ sex, data = DF) Z_zi <- model.matrix(~ 1, data = DF) betas <- c(0.8, -0.5, 0.8, -0.5) # fixed effects coefficients non-zero part shape <- 2 # shape/size parameter of the negative binomial distribution gammas <- c(-2.5, 0.5) # fixed effects coefficients zero part D11 <- 1.0 # variance of random intercepts non-zero part D22 <- 0.8 # variance of random intercepts zero part # we simulate random effects b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22))) # linear predictor non-zero part eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, 1, drop = FALSE])) # linear predictor zero part eta_zi <- as.vector(X_zi %*% gammas + rowSums(Z_zi * b[DF$id, 2, drop = FALSE])) # we simulate negative binomial longitudinal data DF$y <- rnbinom(n * K, size = shape, mu = exp(eta_y)) # we set the extra zeros DF$y[as.logical(rbinom(n * K, size = 1, prob = plogis(eta_zi)))] <- 0

To obtain a more accurate picture of which models predict the data better, we split the original dataset into a training and a test part, each having 250 subjects.

ids_train <- sample(500, 250) DF_train <- DF[DF$id %in% ids_train, ] DF_test <- DF[!DF$id %in% ids_train, ]

We fit two models to the `DF_train`

dataset: (1) a Poisson mixed effects model with fixed effects the `sex`

, `time`

and their interaction, and random intercepts, and (2) a zero-inflated negative binomial mixed effects model with the same fixed- and random-effects structure as in the Poisson model for the negative binomial part, and the fixed effect of `sex`

and a random intercept for the extra zeros part.

fm1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF_train, family = poisson()) fm2 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF_train, family = zi.negative.binomial(), zi_fixed = ~ sex, zi_random = ~ 1 | id)

Based on these fitted mixed models, we calculate predictions for the `DF_test`

. To illustrate the concept of dynamic predictions, we will use for each subject only his/her available measurements before time point 2 to obtain estimates of their random effects. Then using these estimated random effects, we want to obtain the expected counts and their corresponding 95% confidence intervals, for both periods, before and after time 2. We calculate these expected counts using the `predict()`

method. As first argument we give the fitted model, in the `newdata`

argument we provide the data based on which the random effects of each subject will be estimated (i.e., using the observed counts up to time 2), in the `newdata2`

argument we give the data to be used for calculating ‘future’ predictions (i.e., predictions after time 2), in the `type`

argument we specify that we want subject-specific predictions, and with the `se.fit`

argument we request the standard errors and confidence intervals. Finally, the `return_newdata`

specifies that we want to obtain the expected counts and their confidence intervals as extra columns of the `newdata`

and `newdata2`

data.frames.

preds_fm1 <- predict(fm1, newdata = DF_test[DF_test$time < 2, ], newdata2 = DF_test[DF_test$time >= 2, ], type = "subject_specific", se.fit = TRUE, return_newdata = TRUE) preds_fm2 <- predict(fm2, newdata = DF_test[DF_test$time < 2, ], newdata2 = DF_test[DF_test$time >= 2, ], type = "subject_specific", se.fit = TRUE, return_newdata = TRUE)

We will depict the dynamic predictions obtained from zero-inflated negative binomial model `fm2`

. As a first step, for each subject in the test dataset, we combine the predictions in the two periods (i.e., before and after time 2):

splt_data_pre <- split(preds_fm2$newdata, preds_fm2$newdata$id) splt_data_post <- split(preds_fm2$newdata2, preds_fm2$newdata2$id) pred_data <- do.call("rbind", mapply(rbind, splt_data_pre, splt_data_post, SIMPLIFY = FALSE))

Next we produce the figure of the dynamic predictions for 9 randomly selected subjects from the test dataset:

ids <- sample(pred_data$id, 9) key <- list(space = "top", rep = FALSE, adj = 1, text = list(c("Observed Counts")), points = list(pch = 1, col = 1), text = list(c("Expected Counts", "95% Confidence Interval")), lines = list(lty = c(1, 2), col = c(2, 1)), text = list("RE estimates\n info period"), lines = list(lty = 3, col = 1)) xyplot(y + pred + low + upp ~ time | factor(id), data = pred_data, subset = id %in% ids, layout = c(3, 3), type = c("p", rep("l", 3)), distribute.type = TRUE, lty = c(0, 1, 2, 2), col = c(1, 2, 1, 1), key = key, abline = list(v = 2, lty = 3), scales = list(y = list(relation = "free")), xlab = "Follow-up Time", ylab = "Counts")

As mentioned above, the interval shown in the figure of the dynamic predictions is a 95% confidence interval. If we wish a 95% prediction interval instead, a couple of extra steps are required. First, we need the quantile function (i.e., inverse cumulative distribution function) of the zero-inflated negative binomial distribution. Since this distribution is not one of the available distributions in the **stats** package, we write a function to compute it:

qzinbinom <- function (p, mu, shape, zi_probs) { pnew <- (p - zi_probs) / (1 - zi_probs) qnbinom(pmax(pnew, 0), mu = mu, size = shape) }

As we see, to calculate quantiles we need the shape parameter and the probabilities of each measurement being an extra zero. These are given as the attribute `zi_probs`

of the `pred`

column of the data.frame returned by the `predict()`

method, and similarly, the lower and upper limits for these probabilities are given as the attribute `zi_probs`

of the `low`

and `upp`

columns returned by `predict()`

. Using analogous code as above, we extract them and put them as an extra column in the `zi_probs`

data.frame:

zi_probs_data <- data.frame(zi_probs = attr(preds_fm2$newdata$pred, "zi_probs"), zi_probs_low = attr(preds_fm2$newdata$low, "zi_probs"), zi_probs_upp = attr(preds_fm2$newdata$upp, "zi_probs")) zi_probs_data2 <- data.frame(zi_probs = attr(preds_fm2$newdata2$pred, "zi_probs"), zi_probs_low = attr(preds_fm2$newdata2$low, "zi_probs"), zi_probs_upp = attr(preds_fm2$newdata2$upp, "zi_probs")) splt_data_pre <- split(zi_probs_data, preds_fm2$newdata$id) splt_data_post <- split(zi_probs_data2, preds_fm2$newdata2$id) zi_probs <- do.call("rbind", mapply(rbind, splt_data_pre, splt_data_post, SIMPLIFY = FALSE))

For the shape parameter, we extract the 95% confidence interval using the `confint()`

method:

CI_shape <- confint(fm2, "extra")

Now we have all the components to calculate the lower and upper limit of the 95% prediction intervals, i.e.,

pred_data$lowPI <- qzinbinom(0.025, pred_data$low, CI_shape[, "2.5 %"], zi_probs$zi_probs_upp) pred_data$uppPI <- qzinbinom(0.975, pred_data$upp, CI_shape[, "97.5 %"], zi_probs$zi_probs_low)

Using an analogous to the `xyplot()`

function from the **lattice** we produce the plot

key <- list(space = "top", rep = FALSE, adj = 1, text = list(c("Observed Counts")), points = list(pch = 1, col = 1), text = list(c("Expected Counts", "95% Prediction Interval")), lines = list(lty = c(1, 2), col = c(2, 1)), text = list("RE estimates\n info period"), lines = list(lty = 3, col = 1)) xyplot(y + pred + lowPI + uppPI ~ time | factor(id), data = pred_data, subset = id %in% ids, layout = c(3, 3), type = c("p", rep("l", 3)), distribute.type = TRUE, lty = c(0, 1, 2, 2), col = c(1, 2, 1, 1), key = key, abline = list(v = 2, lty = 3), scales = list(y = list(relation = "free")), xlab = "Follow-up Time", ylab = "Counts")

To evaluate the quality of the predictions we utilize the framework of proper scoring rules. The logarithmic, quadratic/Brier and spherical scoring rules can be calculated for the dynamic predictions obtained from a fitted mixed models using the function `scoring_rules()`

. This has a similar syntax as the `predict()`

method that we used above for dynamic predictions. Namely, as first argument we give the fitted mixed model, and then we given the data.frame `newdata`

based on which the random effects will be estimated. If `newdata2`

is not given, then the scoring rules are calculated for the predictions obtained in `newdata`

. When `newdata2`

is also provided, then the scoring rules are only calculated for the predictions obtained in `newdata2`

. In the following piece of code, we illustrate the use of this function for the two fitted mixed models:

scr_fm1 <- scoring_rules(fm1, newdata = DF_test[DF_test$time < 2, ], newdata2 = DF_test[DF_test$time >= 2, ], return_newdata = TRUE, max_count = 3000) scr_fm2 <- scoring_rules(fm2, newdata = DF_test[DF_test$time < 2, ], newdata2 = DF_test[DF_test$time >= 2, ], return_newdata = TRUE, max_count = 3000)

The `return_newdata`

argument specifies that the calculated scoring rules are returned as extra columns of the `newdata2`

data.frame (or of the `newdata`

data.frame if only that one was provided). The `max_count`

argument is relevant for the case of count data we have here, and denotes the positive integer up to which the infinite sample space of the distribution will be cut.

With the next piece of code we combine the values of the scoring rules for the Poisson and zero-inflated negative binomial model into one dataset:

scr_fm2 <- scr_fm2[c("logarithmic", "quadratic", "spherical")] names(scr_fm2) <- paste0(names(scr_fm2), "2") scoring_data <- as.data.frame(c(scr_fm1, scr_fm2))

The following code produces the scatterplot of the spherical rule values for the two models along the period of prediction, i.e., from time 2 to 4. The loess curves are also superimposed.

key <- list(space = "top", text = list(c("Poisson", "ZI Negative Binomial")), lines = list(lty = c(1, 1), col = c(1, 2), lwd = c(2, 2))) xyplot(spherical + spherical2 ~ time, data = scoring_data, type = c("p", "smooth"), pch = ".", col = c(1, 2), xlab = "Follow-up Time", ylab = "Spherical Scoring Rule", key = key)

And with similar code we also produce the scatterplot of the logarithmic rule for the two models and the same period:

xyplot(logarithmic + logarithmic2 ~ time, data = scoring_data, type = c("p", "smooth"), pch = ".", col = c(1, 2), xlab = "Follow-up Time", ylab = "Logarithmic Scoring Rule", key = key, ylim = c(-8, 0.5))

The spherical rule takes values in the interval \([0, 1]\), with values closer to 1 indicating a more accurate prediction model, and the logarithmic rule in the interval \((-\infty, 0]\), with values closer to 0 indicating a more accurate prediction model. We observe that the predictions obtained from the zero-inflated negative binomial mixed model are more accurate for the whole range of future time points from 2 to 4. In addition, we observe that the accuracy of the predictions diminishes the further away we move from time point 2 up to which information we have used for the calculation of the random effects.