GLMMadaptive: Mixed Models for Non-Normal Data in R

February 6

Motivation

Mixed models

Mixed effects models have numerous applications
- Longitudinal data
- Clustered data
- Multilevel data
- …

The basic idea

We have a grouped outcome \({\color{blue} Y}\)
- measurements in the same group(s) are correlated

Correlations are modeled via random effects \({\color{red} b}\)
- measurements in the same group share the same random effect

Mathematical definition - 1

A general definition of mixed models: \[ \left \{ \begin{array}{ccl} {\color{blue} Y} \mid {\color{red} b} & \sim & \mathcal F(\theta_y)\\&&\\ {\color{red} b} & \sim & \mathcal N(0, D)\\ \end{array} \right. \]
- the distribution \(\mathcal F\) depends on the outcome
- the random effects most often are assumed normally distributed
- \(\theta = (\theta_y^\top, \mbox{vech}(D)^\top)^\top\) are the model parameters

Mathematical definition - 2

To estimate mixed models under maximum likelihood we need the marginal log-likelihood function \[ \ell(\theta) = \sum_{i = 1}^n \log \int p(y_i \mid b_i; \theta) \, p(b_i; \theta) \; db_i \]

To estimate mixed models we need to solve the integral

Normal vs the rest - 1

The random effects \(b\) are assumed to have a (multivariate) normal distribution
When
- the outcome \(Y\) also has a normal distribution,
- the random effects enter linearly in the linear predictor

the integral can be solved analytically

\({\color{blue} \Rightarrow}\) Easy to program \({\color{blue} \Rightarrow}\) Stable algorithms available in most software

Normal vs the rest - 2

However, in all other cases these integrals cannot be solved analytically

\({\color{red} \Rightarrow}\) Much more difficult to program

\({\color{red} \Rightarrow}\) Several solutions available in software, however, many not optimal!!

Estimation

How to solve the integrals

Two general solutions

Approximation of the integrand
- make it look like a normal for which we know the solution
Approximation of the integral
- finite sum over a grid

Popular available options

Approximation of the integrand
- Penalized Quasi Likelihood
- Laplace approximation

Approximation of the integral
- adaptive Gaussian quadrature
- Monte Carlo (rejection & importance sampling)

Which to choose

Penalized Quasi Likelihood: generally not good

Laplace: needs many repeated measurements per subject
- it does not work well with binary data and Poisson with low counts

Adaptive Gaussian quadrature: Gold standard
- computationally intensive

Monte Carlo: requires fine tuning
- no clear winner algorithm

What is available in R

Mixed model packages in R - 1

lme4::glmer()
- Laplace approximation as default
- adaptive Gaussian quadrature only for scalar random effects
glmmTMB::glmmTMB() more flexible than lme4
- Laplace approximation
glmmsr::glmm()
- Monte Carlo importance sampling
MASS::glmmPQL()
- PQL based on nlme::lme()

Does it matter? - 1

The Toenail dataset data("toenail", package = "HSAUR2")
- mixed effects logistic regression
- PQL vs Laplace vs adaptive Gaussian quadrature

fm_PQL <- glmmPQL(outcome ~ time * treatment, random = ~ 1 | patientID, 
                  data = toenail, family = binomial())

fm_Laplace <- glmer(outcome ~ time * treatment + (1 | patientID), 
                  data = toenail, family = binomial())

fm_AGQ15 <-  glmer(outcome ~ time * treatment + (1 | patientID), 
                  data = toenail, family = binomial(), nAGQ = 15)

Does it matter? - 2

The Toenail dataset - results

cbind(PQL = fixef(fm_PQL), Laplace = fixef(fm_Laplace), AGQ_15 = fixef(fm_AGQ15))

##                                   PQL    Laplace     AGQ_15
## (Intercept)               -0.74324709 -2.5098642 -1.6473652
## time                      -0.29469102 -0.3997340 -0.3924231
## treatmentterbinafine      -0.03480231 -0.3048303 -0.1663130
## time:treatmentterbinafine -0.10017515 -0.1371352 -0.1371755

Does it matter? - 3

The Salamanders dataset data("Salamanders", package = "glmmTMB")
- mixed effects truncated Poisson
- (spoiler alert: Code)

gm_Laplace <- glmmTMB(count ~ spp + mined + (1 | site), ziformula = ~ spp + mined, 
                      family = truncated_poisson(), data = Salamanders)

gm_AGQ25 <- mixed_model(count ~ spp + mined, random = ~ 1 | site, 
                        zi_fixed = ~ spp + mined, 
                        family = hurdle.poisson(), data = Salamanders, nAGQ = 25)

Does it matter? - 4

The Salamanders dataset - results

cbind(Laplace = fixef(gm_Laplace)$cond, AGQ_25 = fixef(gm_AGQ25))

##                 Laplace     AGQ_25
## (Intercept) -0.06702286 -3.1900355
## sppPR       -0.52092708  0.1976119
## sppDM        0.22457540  0.9137361
## sppEC-A     -0.19548416  0.6538284
## sppEC-L      0.64672238  1.4999425
## sppDES-L     0.60513701  0.9589269
## sppDF        0.04602476  0.3426018
## minedno      1.01446593  0.7137289

Mixed model packages in R - 2

Need for adaptive Gaussian quadrature for \({\color{red} > 1}\) random effects

A new package

GLMMadaptive has been written to fill this gap

Principles
- User-friendly
- Satisfy the needs of most users
- Give advanced users the option to extend it
- Well documented

What GLMMadaptive offers

Model fitting

The basic model-fitting function is GLMMadaptive::mixed_model()

Required arguments
- fixed a formula for the fixed effects
- random a formula for the random effects
- data a data frame containing the variables to use
- family a family object specifying the model

Available models: Standard GLMMs

It fits the most frequently-used GLMMs
- mixed effects logistic regression family = binomial()
- mixed effects Poisson regression family = poisson()

A basic example:

mixed_model(fixed = y ~ time * group, random = ~ time | id, data = DF,
        family = binomial(), nAGQ = 15)

vignette("GLMMadaptive_basics", package = "GLMMadaptive")

Available models: Specialized GLMMs - 1

It fits some non-standard GLMMs
- negative binomial mixed model family = negative.binomial()
- Student's-t mixed model family = students.t()
- Beta mixed model family = beta.fam()

Available models: Specialized GLMMs - 2

Zero-inflated mixed models
- zero-inflated Poisson family = zi.poisson()
- zero-inflated negative binomial family = zi.negative.binomial()
Hurdle mixed models
- hurdle Poisson family = hurdle.poisson()
- hurdle negative binomial family = hurdle.negative.binomial()
- hurdle log-normal family = hurdle.lognormal()
- hurdle Beta family = hurdle.beta.fam()

vignette("ZeroInflated_and_TwoPart_Models", package = "GLMMadaptive")

User-defined models

The user can define its own mixed model using a user-defined family object
- specify log-density function
- specify derivatives (optional)

vignette("Custom_Models", package = "GLMMadaptive")

What the user needs - 1

summary(): Estimated coefficients, standard errors & p-values

anova(): Hypothesis testing
- Wald tests
- likelihood ratio tests
- AIC/BIC

What the user needs - 2

Effects Plots
- build-in effectPlotData()
- support for effects
- support for ggeffects

vignette("Methods_MixMod", package = "GLMMadaptive")

What the user needs - 3

Goodness of Fit
- support for DHARMa
- caveat: MAR missing data

vignette("Goodness_of_Fit", package = "GLMMadaptive")

What the user needs - 4

Multiple comparisons
- support for emmeans
- support for multcomp

vignette("Multiple_Comparisons", package = "GLMMadaptive")

What the user needs - 5

Predictions via predict() method
- average subject
- marginal
- subject-specific
- dynamic subject-specific
- confidence & prediction intervals
- proper scoring rules via scoring_rules()

vignette("Methods_MixMod", package = "GLMMadaptive")

vignette("Dynamic_Predictions", package = "GLMMadaptive")

What the user needs - 6

Methods for standard generics
- confint(), vcov() including the sandwich estimator
- coef(), fixef(), ranef()
- logLik(), nobs()
- fitted(), residuals()
- simulate()
For developers
- model.matrix(), model.frame()
- terms(), formula(), family()

A Few extras - 1

Interpretation
- with nonlinear link functions fixed effects have an interpretation conditional on the random effects [1]
- (very) often not what we want

[1] an explanation can be found at: https://stats.stackexchange.com/questions/365907/interpretation-of-fixed-effects-from-mixed-effect-logistic-regression

A Few extras - 2

Marginalized coefficients
- Heagerty et al. have proposed marginalized mixed models
  \({\color{red} \Rightarrow}\) Computationally intensive to fit
- Hedeker et al. recently proposed an easier alternative solution
- function marginal_coefs() implements Hedeker's et al. solution with standard errors

A Few extras - 3

Separation issues
- in logistic, Poisson and negative binomial data we may encounter (complete) separation [2]
- penalized argument of mixed_model() places a Student's t penalty on the fixed effects
- the user can control scale (default 1) and df (default 3)

[2] an example at: https://stats.stackexchange.com/questions/386553/linear-mixed-model-for-placement-of-nuclear-stress-in-10-word-turns

Documentation

Up-to-date help files

Dedicated website: https://drizopoulos.github.io/GLMMadaptive/

What the future holds…

Future plans

Implement nested random effects
- using Matrix sparse matrices classes

More options for the covariance matrix of the random effects
- auto-regressive
- compound symmetry

Extra models
- Conway-Maxwell-Poisson mixed model

Thank you for your attention!

http://www.drizopoulos.com/

These slides are available at:

https://drizopoulos.github.io/GLMMadaptive_presentations/