Dynamic Predictions
In the setting of longitudinal data, dynamic individualized predictions are predictions for the longitudinal outcome that are updated as extra measurements are recorded for a subject.
We start by introducing the notation for this setting. Let \(y_i\) denote the \(n_i \times 1\) longitudinal response vector for the \(i\)-th subject (\(i = 1, \ldots, n\)), with element \(y_{il}\) denoting the value of the longitudinal outcome taken at time point \(t_{il}\), \(l = 1, \ldots, n_i\). We also let \(\mathcal D_n = \{y_i; i = 1, \ldots, n\}\) denote all available longitudinal outcome information in the original sample.
We are interested in deriving predictions for a new subject \(j\) from the same population who has provided us with a set of longitudinal measurements \(\mathcal Y_j(t) = \{ y_j(t_{jl}); 0 \leq t_{jl} \leq t, l = 1, \ldots, n_j \}\). Given the fact that we observed these measurements, it is more relevant to focus on conditional subject-specific predictions. In particular, for any time \(u > t\) we are interested in the conditional mean \[ \omega_j(u \mid t) = E \{ y_j(u) \mid \mathcal Y_j(t), \mathcal D_n\}, \quad u > t. \] The time-dynamic nature of \(\omega_j(u \mid t)\) is evident because when new information is recorded for subject \(j\) at time \(t' > t\), we can update these predictions to obtain \(\omega_j(u \mid t')\), and therefore proceed in a time-dynamic manner.
Calculation of these predictions proceeds more naturally under the (asymptotic) Bayesian paradigm. Namely, \(\omega_j(u \mid t)\) is the expected value of the posterior predictive distribution \[ p\{y_j(u) \mid \mathcal Y_j(t), \mathcal D_n\} = \int p\{y_j(u) \mid \mathcal Y_j(t), \theta\} \, p(\theta \mid \mathcal D_n) \; d\theta, \] where \(p(\theta \mid \mathcal D_n)\) denotes the posterior distribution of the parameters obtained from the original sample \(\mathcal D_n\). If we have fitted the model by maximum likelihood, as is done in the GLMMadaptive package, we can approximate this posterior distribution by a multivariate normal distribution centered at the maximum likelihood estimates, and with variance-covariance matrix the inverse of the observed information matrix.
The first term in the integrand can be further expanded into: \[\begin{eqnarray*} p\{y_j(u) \mid \mathcal Y_j(t), \theta\} & = & \int p\{y_j(u) \mid b_j, \mathcal Y_j(t), \theta\} \, p \{b_j \mid \mathcal Y_j(t), \theta\} \; db_j\\ &&\\ & = & \int p\{y_j(u) \mid b_j, \theta\} \, p \{b_j \mid \mathcal Y_j(t), \theta\} \; db_j. \end{eqnarray*}\] By combining these integral equations, we can derive a Monte Carlo scheme to estimate \(\omega_j(u \mid t)\) and its confidence interval.
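In particular, the scheme repeatedly draws \(\theta^*\) from the normal approximation to \(p(\theta \mid \mathcal D_n)\), draws \(b_j^*\) from \(p(b_j \mid \mathcal Y_j(t), \theta^*)\), and computes the corresponding mean; the sample mean and percentiles of these realizations then give the point estimate and the confidence interval. A minimal sketch of this logic is shown below; simulate_omega(), draw_b(), and mean_y() are hypothetical placeholders for the individual steps, which the predict() method carries out internally.
# a minimal sketch of the Monte Carlo scheme; draw_b() and mean_y() are
# hypothetical placeholders for the corresponding sampling/computation steps
simulate_omega <- function (L, theta_hat, V_hat, draw_b, mean_y) {
    omega <- replicate(L, {
        # step 1: draw theta* from the normal approximation of the posterior
        theta_star <- MASS::mvrnorm(1, mu = theta_hat, Sigma = V_hat)
        # step 2: draw b_j* from p(b_j | Y_j(t), theta*)
        b_star <- draw_b(theta_star)
        # step 3: compute omega* = E{y_j(u) | b_j*, theta*}
        mean_y(b_star, theta_star)
    })
    # point estimate and 95% confidence interval from the Monte Carlo samples
    c(est = mean(omega), quantile(omega, probs = c(0.025, 0.975)))
}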
We start by simulating some data for a zero-inflated count longitudinal outcome from a negative binomial distribution:
set.seed(12345)
n <- 500 # number of subjects
K <- 10 # number of measurements per subject
t_max <- 4 # maximum follow-up time
# we construct a data frame with the design:
# everyone has a baseline measurement, and then measurements at random follow-up times
DF <- data.frame(id = rep(seq_len(n), each = K),
time = c(replicate(n, c(0, sort(runif(K - 1, 0, t_max))))),
sex = rep(gl(2, n/2, labels = c("male", "female")), each = K))
# design matrices for the fixed and random effects non-zero part
X <- model.matrix(~ sex * time, data = DF)
Z <- model.matrix(~ 1, data = DF)
# design matrices for the fixed and random effects zero part
X_zi <- model.matrix(~ sex, data = DF)
Z_zi <- model.matrix(~ 1, data = DF)
betas <- c(0.8, -0.5, 0.8, -0.5) # fixed effects coefficients non-zero part
shape <- 2 # shape/size parameter of the negative binomial distribution
gammas <- c(-2.5, 0.5) # fixed effects coefficients zero part
D11 <- 1.0 # variance of random intercepts non-zero part
D22 <- 0.8 # variance of random intercepts zero part
# we simulate random effects
b <- cbind(rnorm(n, sd = sqrt(D11)), rnorm(n, sd = sqrt(D22)))
# linear predictor non-zero part
eta_y <- as.vector(X %*% betas + rowSums(Z * b[DF$id, 1, drop = FALSE]))
# linear predictor zero part
eta_zi <- as.vector(X_zi %*% gammas + rowSums(Z_zi * b[DF$id, 2, drop = FALSE]))
# we simulate negative binomial longitudinal data
DF$y <- rnbinom(n * K, size = shape, mu = exp(eta_y))
# we set the extra zeros
DF$y[as.logical(rbinom(n * K, size = 1, prob = plogis(eta_zi)))] <- 0
To obtain a more objective picture of which model predicts the data better, we split the original dataset into a training and a test part, each having 250 subjects.
ids_train <- sample(500, 250)
DF_train <- DF[DF$id %in% ids_train, ]
DF_test <- DF[!DF$id %in% ids_train, ]
We fit two models to the DF_train dataset: (1) a Poisson mixed effects model with fixed effects for sex, time, and their interaction, and random intercepts; and (2) a zero-inflated negative binomial mixed effects model with the same fixed- and random-effects structure as in the Poisson model for the negative binomial part, and the fixed effect of sex and a random intercept for the extra zeros part.
fm1 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF_train,
family = poisson())
fm2 <- mixed_model(y ~ sex * time, random = ~ 1 | id, data = DF_train,
family = zi.negative.binomial(),
zi_fixed = ~ sex, zi_random = ~ 1 | id)
Based on these fitted mixed models, we calculate predictions for the subjects in DF_test. To illustrate the concept of dynamic predictions, we will use for each subject only his/her available measurements before time point 2 to obtain estimates of the random effects. Then, using these estimated random effects, we obtain the expected counts and their corresponding 95% confidence intervals for both periods, before and after time 2. We calculate these expected counts using the predict() method. As first argument we give the fitted model; in the newdata argument we provide the data based on which the random effects of each subject will be estimated (i.e., the observed counts up to time 2); in the newdata2 argument we give the data to be used for calculating ‘future’ predictions (i.e., predictions after time 2); in the type argument we specify that we want subject-specific predictions; and with the se.fit argument we request the standard errors and confidence intervals. Finally, the return_newdata argument specifies that we want the expected counts and their confidence intervals returned as extra columns of the newdata and newdata2 data.frames.
preds_fm1 <- predict(fm1, newdata = DF_test[DF_test$time < 2, ],
newdata2 = DF_test[DF_test$time >= 2, ],
type = "subject_specific",
se.fit = TRUE, return_newdata = TRUE)
preds_fm2 <- predict(fm2, newdata = DF_test[DF_test$time < 2, ],
newdata2 = DF_test[DF_test$time >= 2, ],
type = "subject_specific",
se.fit = TRUE, return_newdata = TRUE)
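The returned data.frames contain the extra columns pred, low, and upp with the expected counts and the limits of the 95% confidence intervals; for example:
# preview of the appended columns (column names as returned by predict())
head(preds_fm2$newdata[c("id", "time", "y", "pred", "low", "upp")])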
We will depict the dynamic predictions obtained from the zero-inflated negative binomial model fm2. As a first step, for each subject in the test dataset, we combine the predictions from the two periods (i.e., before and after time 2):
splt_data_pre <- split(preds_fm2$newdata, preds_fm2$newdata$id)
splt_data_post <- split(preds_fm2$newdata2, preds_fm2$newdata2$id)
pred_data <- do.call("rbind", mapply(rbind, splt_data_pre, splt_data_post,
SIMPLIFY = FALSE))
Next, we produce the figure of the dynamic predictions for nine randomly selected subjects from the test dataset:
ids <- sample(unique(pred_data$id), 9) # nine distinct subjects
key <- list(space = "top", rep = FALSE, adj = 1,
text = list(c("Observed Counts")),
points = list(pch = 1, col = 1),
text = list(c("Expected Counts", "95% Confidence Interval")),
lines = list(lty = c(1, 2), col = c(2, 1)),
text = list("RE estimates\n info period"),
lines = list(lty = 3, col = 1))
xyplot(y + pred + low + upp ~ time | factor(id), data = pred_data,
subset = id %in% ids, layout = c(3, 3),
type = c("p", rep("l", 3)), distribute.type = TRUE,
lty = c(0, 1, 2, 2), col = c(1, 2, 1, 1), key = key,
abline = list(v = 2, lty = 3), scales = list(y = list(relation = "free")),
xlab = "Follow-up Time", ylab = "Counts")
As mentioned above, the interval shown in the figure of the dynamic predictions is a 95% confidence interval. If we wish a 95% prediction interval instead, a couple of extra steps are required. First, we need the quantile function (i.e., the inverse cumulative distribution function) of the zero-inflated negative binomial distribution. Since this distribution is not among those available in the stats package, we write a function to compute it:
qzinbinom <- function (p, mu, shape, zi_probs) {
    # invert the zero-inflated CDF F(y) = zi_probs + (1 - zi_probs) * F_nb(y)
    # by mapping p to the corresponding probability of the negative binomial part
    pnew <- (p - zi_probs) / (1 - zi_probs)
    # pnew is negative when p < zi_probs, in which case the quantile is 0
    qnbinom(pmax(pnew, 0), mu = mu, size = shape)
}
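As a quick sanity check, when there is no zero inflation (i.e., zi_probs = 0) the function should reduce to the quantile function of the plain negative binomial distribution:
# with no extra zeros, qzinbinom() coincides with qnbinom()
qzinbinom(0.5, mu = 2, shape = 2, zi_probs = 0) == qnbinom(0.5, mu = 2, size = 2)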
As the arguments of qzinbinom() show, to calculate quantiles we need the shape parameter and the probabilities of each measurement being an extra zero. The latter are given as the attribute zi_probs of the pred column of the data.frames returned by the predict() method, and similarly, the lower and upper limits for these probabilities are given as the attribute zi_probs of the low and upp columns returned by predict(). Using analogous code as above, we extract them and combine them into the zi_probs data.frame:
zi_probs_data <- data.frame(zi_probs = attr(preds_fm2$newdata$pred, "zi_probs"),
zi_probs_low = attr(preds_fm2$newdata$low, "zi_probs"),
zi_probs_upp = attr(preds_fm2$newdata$upp, "zi_probs"))
zi_probs_data2 <- data.frame(zi_probs = attr(preds_fm2$newdata2$pred, "zi_probs"),
zi_probs_low = attr(preds_fm2$newdata2$low, "zi_probs"),
zi_probs_upp = attr(preds_fm2$newdata2$upp, "zi_probs"))
splt_data_pre <- split(zi_probs_data, preds_fm2$newdata$id)
splt_data_post <- split(zi_probs_data2, preds_fm2$newdata2$id)
zi_probs <- do.call("rbind", mapply(rbind, splt_data_pre, splt_data_post,
SIMPLIFY = FALSE))
For the shape parameter, we extract the 95% confidence interval using the confint() method:
CI_shape <- confint(fm2, "extra")
Now we have all the components to calculate the lower and upper limit of the 95% prediction intervals, i.e.,
pred_data$lowPI <- qzinbinom(0.025, pred_data$low, CI_shape[, "2.5 %"],
zi_probs$zi_probs_upp)
pred_data$uppPI <- qzinbinom(0.975, pred_data$upp, CI_shape[, "97.5 %"],
zi_probs$zi_probs_low)
Using an analogous call to the xyplot() function from the lattice package, we produce the plot:
key <- list(space = "top", rep = FALSE, adj = 1,
text = list(c("Observed Counts")),
points = list(pch = 1, col = 1),
text = list(c("Expected Counts", "95% Prediction Interval")),
lines = list(lty = c(1, 2), col = c(2, 1)),
text = list("RE estimates\n info period"),
lines = list(lty = 3, col = 1))
xyplot(y + pred + lowPI + uppPI ~ time | factor(id), data = pred_data,
subset = id %in% ids, layout = c(3, 3),
type = c("p", rep("l", 3)), distribute.type = TRUE,
lty = c(0, 1, 2, 2), col = c(1, 2, 1, 1), key = key,
abline = list(v = 2, lty = 3), scales = list(y = list(relation = "free")),
xlab = "Follow-up Time", ylab = "Counts")
To evaluate the quality of the predictions we utilize the framework of proper scoring rules. The logarithmic, quadratic/Brier, and spherical scoring rules can be calculated for the dynamic predictions obtained from a fitted mixed model using the function scoring_rules(). This has a similar syntax as the predict() method that we used above for dynamic predictions. Namely, as first argument we give the fitted mixed model, and then we give the data.frame newdata based on which the random effects will be estimated. If newdata2 is not given, the scoring rules are calculated for the predictions obtained in newdata. When newdata2 is also provided, the scoring rules are only calculated for the predictions obtained in newdata2. In the following piece of code, we illustrate the use of this function for the two fitted mixed models:
scr_fm1 <- scoring_rules(fm1, newdata = DF_test[DF_test$time < 2, ],
newdata2 = DF_test[DF_test$time >= 2, ],
return_newdata = TRUE, max_count = 3000)
scr_fm2 <- scoring_rules(fm2, newdata = DF_test[DF_test$time < 2, ],
newdata2 = DF_test[DF_test$time >= 2, ],
return_newdata = TRUE, max_count = 3000)
The return_newdata argument specifies that the calculated scoring rules are returned as extra columns of the newdata2 data.frame (or of the newdata data.frame, if only that one was provided). The max_count argument is relevant for count data, as we have here, and denotes the positive integer at which the infinite sample space of the count distribution is truncated when computing the rules.
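For intuition, these rules have standard definitions for a discrete predictive distribution: letting \(p_k\) denote the predictive probability of count \(k\) and \(y\) the observed count, the logarithmic rule is \(\log p_y\), the quadratic/Brier rule is \(2 p_y - \sum_k p_k^2\), and the spherical rule is \(p_y / \sqrt{\sum_k p_k^2}\). The following sketch illustrates their computation for a single observation (for illustration only; scoring_rules() performs the actual calculation):
# scoring rules for one observed count y, given the predictive probabilities
# pmf of the counts 0, 1, ..., max_count
scores_one <- function (y, pmf) {
    c(logarithmic = log(pmf[y + 1]),
      quadratic = 2 * pmf[y + 1] - sum(pmf^2),
      spherical = pmf[y + 1] / sqrt(sum(pmf^2)))
}
# e.g., for a Poisson(2) predictive distribution truncated at 30
pmf <- dpois(0:30, lambda = 2)
scores_one(3, pmf / sum(pmf))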
With the next piece of code we combine the values of the scoring rules for the Poisson and zero-inflated negative binomial model into one dataset:
scr_fm2 <- scr_fm2[c("logarithmic", "quadratic", "spherical")]
names(scr_fm2) <- paste0(names(scr_fm2), "2")
scoring_data <- as.data.frame(c(scr_fm1, scr_fm2))
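The combined data.frame now contains the rule values of the two models side by side, together with the variables of newdata2; for example:
# preview the combined scoring rule values for the two models
head(scoring_data[c("time", "logarithmic", "logarithmic2", "spherical", "spherical2")])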
The following code produces the scatterplot of the spherical rule values for the two models along the period of prediction, i.e., from time 2 to 4. The loess curves are also superimposed.
key <- list(space = "top",
text = list(c("Poisson", "ZI Negative Binomial")),
lines = list(lty = c(1, 1), col = c(1, 2), lwd = c(2, 2)))
xyplot(spherical + spherical2 ~ time, data = scoring_data,
type = c("p", "smooth"), pch = ".", col = c(1, 2),
xlab = "Follow-up Time", ylab = "Spherical Scoring Rule",
key = key)
And with similar code we also produce the scatterplot of the logarithmic rule for the two models and the same period:
xyplot(logarithmic + logarithmic2 ~ time, data = scoring_data,
type = c("p", "smooth"), pch = ".", col = c(1, 2),
xlab = "Follow-up Time", ylab = "Logarithmic Scoring Rule",
key = key, ylim = c(-8, 0.5))
The spherical rule takes values in the interval \([0, 1]\), with values closer to 1 indicating a more accurate prediction model, and the logarithmic rule takes values in the interval \((-\infty, 0]\), with values closer to 0 indicating a more accurate prediction model. We observe that the predictions obtained from the zero-inflated negative binomial mixed model are more accurate over the whole range of future time points from 2 to 4. In addition, we observe that the accuracy of the predictions diminishes the further we move past time point 2, the last time point up to which information was used to estimate the random effects.
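Besides the graphical comparison, a quick numerical summary of the same information can be obtained by averaging the rule values over the prediction period, e.g.,
# average scoring rule values over the prediction period
# (for all three rules, higher values indicate more accurate predictions)
colMeans(scoring_data[c("logarithmic", "spherical", "logarithmic2", "spherical2")],
         na.rm = TRUE)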