The historical analysis framework for trial-based economic evaluations focusses on the modelling of individual-level effectiveness and cost variables $(e_i,c_i)$ collected at a given time or computed over some intervals, i.e. QALYs or Total costs. This because: 1) key quantities of interest, i.e. the mean parameters $(\mu_{e},\mu_{c})$ and related incremental quantities, can be easily derived from a bivariate model fitted to $(e_i,c_i)$; 2) when health economic outcomes are derived from longitudinal data $(e_{ij},c_{ij})$ collected at multiple occasions $j=1,\ldots,J$, the modelling of $(e_i,c_i)$ is considerably simplified compared to a multivariate model applied to $(e_{ij},c_{ij})$ .

However, when the longitudinal responses are subject to missingness, i.e. $e_{ij}=(e^{\text{obs}}_{ij},e^{\text{mis}}_{ij})$ and $c_{ij}=(e^{\text{obs}}_{ij},c^{\text{mis}}_{ij})$, a cross-sectional model is often inadequate since information from partially-observed responses will be ignored in the analysis. As a result, the specification of a joint model for $(e_{ij},c_{ij})$ provides an intuitive approach which, within a Bayesian framework, is a simple extension of the standard cross-sectional framework In addition, the model can be easily customised to: retrieve the estimates of interest while taking into account the evidence from any observed responses; conduct sensitivity analysis to alternative missingness assumptions.

When partially-observed longitudinal data are available, missingHE allows the user to implement a joint longitudinal model through the dedicated function lmdm, which mimics the modelling strategy of a longitudinal selection model. The function requires the specification of a model for the effectiveness (model.eff) and cost (model.cost) variables as well as a conditional model (i.e. mechanism) for the missing effectiveness (model.me) and costs (model.mc). The main difference compared to standard selection models is that, rather than directly modelling the missingness indicators, lmdm specifies a model for the longitudinal missing data pattern indicators $\boldsymbol r_{ie}=(m_{ije},\ldots,m_{iJe})$ and $\boldsymbol r_{ic}=(m_{ijc},\ldots,m_{iJc})$, where $m_{ije}$ and $m_{ije}$ denote the binary missingness indicators for the responses at time $j$. As an example, consider a partially-observed effectiveness variable $e_{ij}$ and let $\boldsymbol r_{ie}$ denote the corresponding longitudinal missingness patterns indicator. Although $r_{ie}$ could theory be associated with different values depending on the specific pattern combinations based on the values of $m_{ije}$, a possible way to simplify the specification of the model is to reduce the number of modelled patterns by aggregating some of the observed patterns together. This is the strategy implemented by missingHE to facilitate the implementation of a longitudinal missingness approach based on a restricted number of aggregated patterns. In particular, missingHE specifies the model using three distinguished types of missingness patterns and assigns each of them with a specific indicator value: 1) $\boldsymbol r_{ie}=1$ for the complete cases, i.e. $m_{ije}=0$ for any $j$; 2) $\boldsymbol r_{i2}=2$ for intermittent missingness, i.e. $m_{ije}=1$ and $m_{ile}=0$ for at least some time $l>j$; 3) $\boldsymbol r_{ie}=3$ for dropout, i.e. $m_{ije}=1$ and $m_{ile}=0$ for only $l<j$.

This tutorial shows how it to specify longitudinal models using the dedicated function lmdm in the package, while also showing the available customisation options to allow a flexible modelling specification. Throughout, we will use the built-in dataset called PBS as a toy example, which is directly available when installing and loading missingHE in your R workspace. See Introduction to missingHE for an introductory tutorial of each type of missingness model in missingHE.

If you would like to have more information on the package, or would like to point out potential issues in the current version, feel free to contact the maintainer at a.gabrio@maastrichtuniversity.nl. Suggestions on how to improve the package are also very welcome. A development version of the package is also available from the author’s main GitHub repository at https://github.com/AnGabrio/missingHE.

1 Modelling Framework

Trial-based health economic disaggregated data usually consist in the longitudinal cost-effectiveness pair of variables $(e_{ijt},c_{ijt})$, collected on the $i$-th patient at time $j$ assigned to the $t$-th treatment in the study, for $i=1,\ldots,N$ patients, $j=1,\ldots,J$ time points and $t=1,\ldots,T$ treatment options. In many jurisdictions the recommended effectiveness variable $e_{ijt}$ is a Health-Related Quality of Life (HRQoL) measure, typically in the form of utility scores computed from the patients’ answers to generic health-related quality of life questionnaires (e.g. EQ-5D) collected at given time intervals in the study. The cost variable $c_{ijt}$ denotes the sum of all relevant patient-level costs (e.g. direct/indirect medical costs, productivity losses, etc.), collected from the patients’ answers to different types of healthcare-resource utilisation questionnaires or medical records at different times in the study.

In missingHE, the observed variables $(e_{ijt},c_{ijt})$ are modelled using a joint bivariate probability distribution with density $p(e_{ijt},c_{ijt}\mid \boldsymbol \theta_{jt})$, as a function of a vector of relevant parameters $\boldsymbol \theta_{jt}$, and implemented using a series of conditional model factorisations. In particular, at $j>1$, the model can be represented as:

\[\begin{equation} p(e_{ijt},c_{ijt} \mid \boldsymbol \theta_{jt}) \quad = \quad p(e_{ijt} \mid e_{ij-1t},c_{ij-1t}, \boldsymbol \theta_{ejt})p(c_{ijt} \mid c_{ij-1t}, e_{ij-1t}, e_{ijt}, \boldsymbol \theta_{cjt}), \tag{1.1} \end{equation}\]

where $p(e_{ijt} \mid e_{ij-1t},c_{ij-1t}, \boldsymbol \theta_{ejt})$ denotes the conditional effectiveness distribution at time $j$ given past responses ($e_{ij-1t},c_{ij-1t}$) and $p(c_{ijt} \mid c_{ij-1t}, e_{ij-1t}, e_{ijt}, \boldsymbol \theta_{cjt})$ the conditional cost distribution at $j$ given current ($e_{ijt}$) and past responses ($e_{ij-1t},c_{ij-1t}$), with $\theta_{ejt}$ and $\theta_{cjt}$ being the treatment- and time-specific set of parameters indexing the effectiveness and cost distributions, respectively. To simplify notation, we now drop the treatment index $t$ from model parameters.

Both outcome densities in Equation (1.1) are specified using some parametric forms for the probability distributions to describe the underlying uncertainty in the cost-effectiveness data. The set of model parameters can be generally represented as $\boldsymbol \theta_j=(\boldsymbol \phi_j(x),\boldsymbol \psi_j(x))$, where: $\boldsymbol \phi_j(x)=(\phi_{je}(x),\phi_{jc}(x))$ denotes the set of location parameters (e.g. mean) and $\boldsymbol \psi_j(x)=(\psi_{je}(x),\psi_{jc}(x))$ the set of ancillary parameters (e.g. variance) indexing the outcome distributions, both depending on some vector of potential baseline covariates $x$ (e.g. age, sex, etc…). While it is possible for both $\boldsymbol \phi_j$ and $\boldsymbol \psi_j$ to explicitly depend on $x$, the model specification is usually simplified assuming that these only affect the location parameters through a generalised linear formulation. For example, the location parameter of the effectiveness model can be specified as:

\[\begin{equation} g_e(\phi_{ije}) \quad = \quad \alpha_{j0} + \sum_{k=1}^K \alpha_{jk}x_{ik} + \alpha_{je} e_{ij-1} + \alpha_{jc} c_{ij-1} \,\,[+ \ldots], \tag{1.2} \end{equation}\]

and the location parameter of the cost model as:

\[\begin{equation} g_c(\phi_{ijc}) \quad = \quad \beta_{j0} + \sum_{k=1}^K \beta_{jk}x_{ik} + \beta_{jf} e_{ij} + \beta_{je} e_{ij-1} + \beta_{jc} c_{ij-1} , \,\,[+ \ldots]. \tag{1.3} \end{equation}\]

where the regression coefficients $(\alpha_{je},\alpha_{jc})$ and $(\beta_{jf}, \beta_{je},\beta_{jc})$ are included to capture the possible time dependence between $c_{ij}$ and $e_{ij}$ at the location level. Note that missingHE, through the argument time_dep inside the lmdm function, allows the user to choose among three alternative specifications of Equation (1.2) and Equation (1.3) based on different assumptions about the response time dependence: 1) "none" requires no time dependence by setting $\alpha_{je}=\alpha_{jc}=\beta_{je}=\beta_{jc}=\beta_{jf}=0$; 2) "biv" requires dependence only between outcomes at the same time by setting $\alpha_{je}=\alpha_{jc}=\beta_{je}=\beta_{jc}=0$;l 3) "AR1" requires an autoregressive of order one structure by not imposing any further restrictions on the regression parameters.

Covariates may be included into the model at different scales using the link functions $g_e(\cdot)$ and $g_c(\cdot)$, being either identity, logarithmic or logit, depending on the type of outcome and distribution specified. Note that missingHE expands any categorical covariates to a set of dummy variables. In addition, Equation (1.2) and Equation (1.3) can be extended, as indicated by the term $[+ \ldots]$, to include additional terms into the model, e.g. clustering effects to account for the possible multilevel structure of the data. The objective of the statistical analysis is the estimation of population average time-specific effectiveness and cost parameters $\boldsymbol \mu=(\mu_{je},\mu_{jc})$ in each treatment group and to quantify the (posterior) uncertainty around them. Estimates of these parameters can be obtained from Equation (1.2) and Equation (1.3) either from their respective regression parameters after mean centring all covariates or through repeatedly sampling from the predictive distribution of the outcomes using Monte Carlo integration. Finally, estimates of aggregated mean parameters computed over the entire study period are usually obtained through linear combination of the time-specific average estimates. For instance, in the case of QALYs, aggregated mean response estimates can be typically derived by applying an Area Under the Curve (AUC) approach:

\[\begin{equation} \mu_e = \sum_{j=1}^J \mu_{je} \times w_{je}, \tag{1.4} \end{equation}\]

where $w_{je}=\frac{\gamma_j}{2}$ denotes the weight used in the AUC calculation, with $\gamma_j$ being the percentage of the time unit (often $1$ year) covered between the time interval $j-1$ and $j$. As an example, if responses were collected at equally-spaced time intervals, say every $6$ months, over a one-year study period, then $w_{je}=\frac{\gamma_j}{2}=\frac{6/12}{2}=0.25$ for any $j$. In the case of Total costs, aggregated mean response estimates can be typically derived as:

\[\begin{equation} \mu_c = \sum_{j=1}^J \mu_{jc} \times w_{jc}, \tag{1.5} \end{equation}\]

where setting $w_{jc}=1$ would correspond to a standard unweighted sum. Note that, after missingHE derives estimates for $(\mu_{je},\mu_{jc})$ from Equation (1.2) and Equation (1.3), it automatically computes the aggregated estimates $(\mu_e,\mu_c)$ using the approaches described in Equation (1.4) and Equation (1.5) assuming default values of $w_{je}=0.5$ and $w_{jc}=1$ for all $j$. However, the function lmdm allows the user to modify these values through the optional arguments w_e and w_c, which can be provided either as single values (applied to each $j$) or as vectors of length $J$. This level of customisation allows the user to directly derive aggregated mean cost-effectiveness estimates and use them in the computation of all CEA-specific output when using lmdm.

Assuming that the location parameters are modelled using a linear predictor form as in Equation (1.2) and Equation (1.3), priors on the corresponding sets of regression coefficients can be specified as $\boldsymbol \alpha_j=(\alpha_{j0},\ldots,\alpha_{jK},\alpha_{je},\alpha_{jc}) \overset{\text{iid}}{\sim} \text{Normal}(\mu_{j\alpha},\sigma_{j\alpha})$ and $\boldsymbol \beta=(\beta_{j0},\ldots,\beta_{jK},\beta_{jf},\beta_{je},\beta_{jc}) \overset{\text{iid}}{\sim} \text{Normal}(\mu_{j\beta},\sigma_{j\beta})$, where the notation $\mu_{j\cdot}$ and $\sigma_{j\cdot}$ denotes the time-specific prior mean and standard deviation, respectively. By default, missingHE assumes the “minimally informative” prior values of $\mu_{j\cdot}=0$ and $\sigma_{j\cdot}=1000$ for all regression coefficients. When genuine prior knowledge on specific parameters is available, it is possible to modify the default priors and elicit the corresponding information into the model. As for the ancillary parameters, the choice of the prior values depends on the form of the probability distribution selected according to the modelling choices available in missingHE.

1.1 Missingness Approach

missingHE specifies the model for the aggregated missingness patterns ($\boldsymbol r_{ie}, \boldsymbol r_{ic}$) through a selection model approach, where the pattern-specific probabilities $(\pi^{r}_{ie},\pi^{r}_{ic})$ are estimated using a multinomial (log) regression, conditional on possibly fully-observed baseline covariates $X_i$ and partially-observed outcomes ($e_{ij},c_{ij}$). For example, the missingness effectiveness model is specified as

\[\begin{aligned} \boldsymbol r_{ie} & \sim \text{Multinomial}(\pi_{ie}, 1), \\ \pi^r_{ie} =& \frac{\iota^{r}_{ie}}{\sum_{l=1}^3 \iota^{l}_{ie}}, \\ \log(\iota^{r}_{ie})&=\gamma^r_{j0e} + \sum_{k=1}^K \gamma^r_{jke} x_{ik} + \delta^r_{je} e_{ij} \,\,[+ \ldots], \\ \tag{1.6} \end{aligned}\]

where the upperscript $r$ indicates the pattern category (1=completers, 2=intermittent, 3=dropout), $\boldsymbol r_{ie}$ is a vector with entries set to one if $m^r_{ije}=1$ and zero otherwise, $\boldsymbol \gamma^r_j = (\gamma^r_{j0e}, \ldots, \gamma^r_{jKe})$ is the set of time- and pattern-specific regression coefficients, while $\delta^r_{je}$ is the time- and pattern-specific MNAR parameters. Additional terms, e.g. clustering effects, may also be included in the model as indicated by the term $[+ \ldots]$. Note that an equivalent specification to Equation (1.6) for the missingness model of the longitudinal costs can be specified by missingHE through the function lmdm.

According to Equation (1.6), it is clear to see that alternative assumptions about the missingness mechanism can be encoded into the model based on the type of variables included. In particular, we have: a Missing Completely At Random (MCAR) assumption when all regression coefficients, except $\gamma^r_{j0e}$, are zero; a Missing At Random (MAR) assumption when $\delta^r_{je}=0$ and one or more coefficients among $(\gamma^r_{j1e}, \ldots, \gamma^r_{jKe})$ is/are not zero; a Missing Not At Random (MNAR) when $\delta^r_{je} \neq 0$. Under MNAR, due to the missing values in $e_{ij}$, the model can only be identified by imposing some restrictions. These are implicitly defined from a combination of parametric assumptions about $p(e_{ij} \mid e_{ij-1}, c_{ij-1}, \boldsymbol \theta_j)$ and dependence structure assumed for $p(\boldsymbol r_{ie} \mid e_{ij},\boldsymbol\eta_j)$. As a result, sensitivity analysis to assess the impact of alternative missingness assumptions on the inferences can be done by: varying the distributional assumption of $p(e_{ij} \mid e_{ij-1}, c_{ij-1}, \boldsymbol \theta_j)$; varying the modelling structure in Equation (1.6); and, within a Bayesian setting, varying the prior distribution on $\delta^r_{je}$.

1.2 The PBS trial

In the following, we use data from a real clinical study as a running example to illustrate how a longitundal missingness model for trial-based economic evaluations can be specified using the function lmdm from missingHE. The PBS was a multicentre randomised controlled trial conducted on people suffering from intellectual disability and challenging behaviour. A total of $244$ individuals were enrolled in the trial, $136$ in the Treatment As Usual (TAU) and $108$ in the treatment of care plus the Positive Behavioural Support intervention (PBS). For all individuals, key health economic outcomes were collected via self-reported questionnaires (i.e. EQ-5D and CSRI) at baseline, $6$ and $12$ months follow-ups.

After loading the package missingHE, for example by typing library(missingHE), the PBS dataset (PBS) is directly imported into the R workspace and can be inspected using the command

> rbind(head(PBS), tail(PBS))

to show the first and last few rows:

#>   id time           e       c age gender ethnicity living carer marital disability site trt
#>    1    1  0.17300001  9214.0  31      0         1      3     0       0          3    7   1
#>    2    1  0.85000002  2492.5  51      0         1      1     1       0          2    7   1
#>    3    1 -0.16599999 15283.0  53      1         1      1     1       0          1    5   1
#>    4    1  0.25800008  1858.0  30      0         1      3     0       0          2    5   1
#>    5    1  0.03800001   797.0  46      0         1      3     1       0          2    5   1
#>    6    1  1.00000000   358.0  44      0         0      3     1       0          1    5   1
#>  239    3          NA      NA  33      1         1      2     0       0          3    4   1
#>  240    3  0.81500006   150.0  33      0         0      3     1       0          2   15   1
#>  241    3  0.45200008  1740.0  52      0         0      3     1       0          3   22   2
#>  242    3  0.88300002   105.0  32      0         1      3     1       0          3   23   1
#>  243    3  0.81500006  2372.0  50      0         0      1     1       0          3    2   2
#>  244    3  0.81500006  1630.0  27      1         0      2     0       0          2    2   2

The dataset consists of $244$ individuals grouped in two arms, where trt$=1$ indicates the TAU and trt$=2$ indicates the MenSS intervention, with id and time denoting the participant identification number and measurement occasion, respectively.

Key health economic variables are e and c, which denote the HRQoL utility and cost longitudinal outcomes, respectively. All baseline variables are fully-observed, whereas all outcome variables are partially-observed in both treatment arms. There is a total of 679 (93%) outcome observations, of which 371 in the TAU and 308 in the PBS arm, while the remaining 53 observations are missing. Figure 1.1 and Figure 1.2 show the histograms of the observed distributions for the utility and cost variables in PBS, separately by treatment arm and time point.

Figure 1.1: Histograms of the observed data distributions for the utilities in the TAU and PBS arm by time point.

Figure 1.2: Histograms of the observed data distributions for the costs in the TAU and PBS arm by time point.

1.3 Model fitting

missingHE relies on JAGS through the dedicated function lmdm to implement the longitudinal missingness approach while providing a series of customisation options. In particular, key options include: choice of the probability distribution for the effectiveness and cost variables, choice of the regression model for each outcome, and type of missingness mechanism assumed.

A set of pre-defined choices in terms of probability distributions, with built-in parameterisations, for the effectiveness (e) and/or cost (c) variables are available. The user may select a given distributional choice by passing the corresponding missingHE name to the arguments dist_e and dist_c of the desired function. The outcome model $p(e,c)$ requires a separate specification of the effectiveness and cost model formula using the notation:

model.eff <- e ~ trt + ...
model.cost <- c ~ trt + ...

where e, c and trt indicate the effectiveness, cost and treatment variable names stored in the data set, while ... is a suitable form for the combination of covariates to be used in each model. Dependence between the two models can be specified by adding the term e as an additional covariate in the cost model and by specifying the type of time dependence structure assumed for the model through the argument time_dep. Currently available choices for this argument are: 1) "none" for complete independence, which also requires e to not be included in model.cost; 2) "biv for bivariate dependence at each time, where dependence between $c_j$ and $e_j$ is only allowed at the same time $j$; 3) "AR1 for order one autoregressive structure as specified in Equation (1.2) and Equation (1.3). Note that the treatment indicator trt must be included into both regression formulae and should be coded as a factor variable in the data set, while the time indicator time should not be included in the formulae and should be coded as a numeric (integer) variable. The type of missingness assumption for each outcome is specified through the argument type, by setting type = "MAR" or type = "MNAR" to select between a MAR or MNAR mechanism, respectively.

As an example, a suitable call to lmdm is the following.

fit.model <- lmdm(data = data, model.eff = model.eff, 
  model.cost = model.cost, dist_e = dist_e, dist_c = dist_c, 
  type = type, time_dep = time_dep, ...)

where data is a dataframe containing the data, model.eff and model.cost are the formulae for the effectiveness and cost regression models, dist_e, dist_c, type and time_dep are character values indicating the chosen outcome distributions, type of missingness mechanism and time dependence, while ... indicates additional optional arguments. In practice, unless there is a clear understanding of the implications of any change to the default model structure, the user is not required to modify the default values of these optional arguments. The only exception is related to: number of MCMC chains (n.chains), number of iterations (n.iter) and values of the parameters of the prior distributions (prior). As an example, consider the default priors for the linear predictor coefficients in the effectiveness and cost model, namely $\boldsymbol \beta_j \overset{\text{iid}}{\sim}\text{Normal}(\mu_{j\beta},\sigma_{j\beta})$ and $\boldsymbol \alpha_j \overset{\text{iid}}{\sim}\text{Normal}(\mu_{j\alpha},\sigma_{j\alpha})$. Suppose the user wants to select different prior mean and standard deviation values for these parameters. This can be done by typing

myprior <- list("beta.prior" = c("norm", 0, 0.01), "alpha.prior" = c("norm", 0, 0.01))

which instructs missingHE to separately assign normal distributions to each regression coefficient in the models with prior mean of $0$ and precision (inverse of variance) of $0.01$. The list myprior can be then passed to the optional prior argument in the chosen model fitting function to implement the model using the updated priors.

fit.model <- lmdm(data = data, model.eff = model.eff, 
  model.cost = model.cost, dist_e = dist_e, dist_c = dist_c, 
  type = type, time_dep = time_dep, prior = myprior)

Note that the list and its elements containing the custom prior specification need to be named using the missingHE accepted terminology for parameter and distribution names. If the option save_model is set to TRUE, the JAGS model template is saved in the current working directory and can be used to further modify the model structure with a higher degree of flexibility. This, however, will require a new compilation of the model and, at present, the new model has to be run using dedicated commands, such as rjags from the R2jags package.

1.4 Fitting models under MAR

To ease presentation, we consider the following common model specification: e and c are both normally distributed with no covariates included in their models; a MAR mechanism is assumed for both outcomes, and a bivariate structure is assumed for the time dependence. For demonstrative purposes and to ease presentation of results, in this tutorial we only consider models fitted under MAR. See Fitting MNAR models in missingHE for an overview about how informative missingness models can be fitted in missingHE.

The model is implemented using the function lmdm using the following command.

> lmdm1_mar <- lmdm(data = PBS, dist_e = "norm", dist_c = "norm",
+              model.eff = e ~ trt, model.cost = c ~ trt + e,
+              model.me = me ~ 1, model.mc = mc ~ 1, time_dep = "biv",
+              type = "MAR", n.iter = 1000, ref = 2)

The arguments model.me and model.mc denote the missingness mechanism as in Equation (1.6) for both outcomes. Since no covariates are included in model.me and model.mc, the utility and cost mechanisms are assumed MAR (type="MAR") conditional on the observed outcome values. The argument ref = 2 is used to specify which study arm should be used as the reference group in the comparison, i.e. incremental results are reported here as trt=2 ("PBS") vs trt=1 ("TAU").

1.5 Model assessment

Prior to inspecting the posterior results, it is important to assess model performance and convergence of the MCMC algorithm. missingHE provides three functions to compute and implement different types of popular Bayesian model assessment measures and plots, namely diagnostic, ppc and pic.

1.5.1 Convergence diagnostics

The function diagnostic provides a range of standard diagnostic tools and plots for assessing convergence based on the posterior distribution of a Bayesian model fitted in missingHE. Consider the output of the model stored in the object lmdm1_mar; we may use the command

> diagnostic(x = lmdm1_mar, type = "denplot", param = "beta.f")

to produce “density plots” for the regression parameters associated with the effectiveness variable within the cost models. In this case, as displayed in Figure 1.3, no major issues in convergence are identified for the parameters monitored at each time point.

Checking convergence using the diagnostic function for a model fitted in missingHE, for example through inspection of the density plots.

Figure 1.3: Checking convergence using the diagnostic function for a model fitted in missingHE, for example through inspection of the density plots.

With the exception of the argument x, which must be an object of class "missingHE", all other arguments can be used to customise the output of the function. A full description of the choices for each argument, including the available types of diagnostic tools and families of parameters, is provided in the help file of the package. Note that the output produced by diagnostic is effectively a ggplot2 object, and can therefore be manipulated by the user using functions from packages which allow to customise such objects.

1.5.2 Posterior predictive checks

The function ppc provides a range of graphical posterior predictive checks for assessing the fit of a Bayesian model implemented in missingHE. Consider the output of the model fitted in Section 1.4 and stored in the object lmdm1_mar; we may use the command

> ppc(x = lmdm1_mar, type = "histogram", outcome = "effects", 
+     ndisplay = 2, time.plot = 3, trt = "2")

to plot the densities of ndisplay$=5$ replications of the effectiveness data (light blue lines) with respect to the density of the observed data (dark blue line) in the PBS (trt=2) arm. The argument time.plot indicates for which times the comparison between observed and replicated data should be plotted, here the third time point is selected (default$=1$). Figure 1.4 shows some discrepancies between the replicated and observed data histograms, likely due to the relatively poor fit of the current model (i.e. the normality assumption of e) to the observed data in both arms.

Posterior predictive graphical checks using the ppc function for a model fitted in missingHE, for example through inspection of histograms.

Figure 1.4: Posterior predictive graphical checks using the ppc function for a model fitted in missingHE, for example through inspection of histograms.

A full description of the choices for each argument in ppc, including the available types of posterior predictive checks, is provided in the package help file. Note that the arguments in ppc may also be used to generate graphs for any combination of treatment arms and time points available in the data set: for example, graphs corresponding to those shown in Figure 1.4 for only the PBS arm at all time points may be requested by setting trt=2 and time.plot="all".

1.5.3 Predictive information criteria

The function pic allows to assess the relative predictive accuracy of models fitted in missingHE by computing alternative Bayesian information criteria. As an example, consider again the model results stored in the object lmdm1_mar; we may use the command

> pic_lmdm1_mar <- pic(x = lmdm1_mar, criterion = "looic", cases = "cc")

to return a named list containing a series of components used in the calculation of the selected information criterion. In this case, leave-one-out information criterion (LOOIC) was selected as specified in criterion = "looic" based only on the complete cases in the data set (cases = "cc"), and its value may be inspected by typing:

> pic_lmdm1_mar$looic
#> [1] 14260.39

Note that all information criteria in missingHE can only be used as relative measures of fit to compare nested models, with lower values typically indicating better fit. In addition, predictive information criteria are ideally evaluated based on the observed data alone (or after integrating out missingness under MAR). As a result, computing these measures based on the complete cases provides a “safe” choice when comparing the fit of alternative models. pic allows alternative choices for the argument cases, whose appropriateness should be considered by the user based on the context at hand (e.g. amount and type of missing data). Current choices are: complete cases (cc), available cases for the effectiveness (ac_e), available cases for the costs ("ac_c") or all cases (all).

Finally, pic allows to compute information criteria for multiple missingHE objects by simply replacing the argument x with a list of models generated using the functions in the package. For example, suppose an additional longitudinal model was fitted using lmdm and its output stored in an object named lmdm2_mar, which additionally includes some covariates into the effectiveness and cost models. Then, the desired predictive information criteria may be computed for both models using pic by typing:

> pic_compare_mar <- pic(x = list(lmdm1_mar, lmdm2_mar), 
+                        criterion = "looic", cases = "cc")

A full description of the different types of output associated and input choices is provided in the package help file.

1.6 Results inspection

1.6.1 Summarise parameter estimates

Summary statistics computed based on posterior samples of these parameters, directly generated from JAGS, may be accessed using the print function by typing

> print(lmdm1_mar, display = "fixed")

which returns the following output

#>                         mean                sd            2.5%             50%            97.5% Rhat n.eff
#> alpha[1,1]      0.4774668421     0.03380072292    0.4110906197    0.4770403924     0.5465300005 1.00   500
#> alpha[2,1]      0.0823233443     0.04819621634   -0.0027014364    0.0826185540     0.1767845944 1.01   620
#> alpha[1,2]      0.4972297353     0.03180654387    0.4371215472    0.4973842840     0.5580172326 1.00  1000
#> alpha[2,2]      0.1407143544     0.04704184630    0.0512524414    0.1407080422     0.2336346719 1.00  1000
#> alpha[1,3]      0.4815649635     0.02839279600    0.4284286963    0.4808657016     0.5396731139 1.00  1000
#> alpha[2,3]      0.1340213198     0.04291335036    0.0496195939    0.1347396022     0.2153194104 1.00  1000
#> beta[1,1]      55.8801232096    30.93797510432   -3.5083567248   55.1339194620   118.9292774609 1.00   410
#> beta[2,1]      31.8525165271    32.79161522491  -34.6641161004   32.0659359662    94.7879942718 1.00  1000
#> beta[1,2]      54.1026243962    30.82668600636   -3.9985523628   53.0629597418   116.0963264010 1.01   250
#> beta[2,2]      32.4573961314    32.43437566851  -31.2618949904   32.7979283996    95.9260302043 1.00  1000
#> beta[1,3]      33.5200670968    32.96955496410  -28.9057654214   33.4589265091    98.3417450972 1.00   470
#> beta[2,3]      20.9058572513    31.36909924259  -42.1009544941   21.3479370921    82.5852314087 1.00   890
#> beta_f[1]      -5.7891325531    31.49652565848  -65.4214939266   -5.0856971717    57.3903603120 1.00  1000
#> beta_f[2]      -4.8216195738    31.76781481050  -66.8637831107   -3.7259816613    58.4801359963 1.00  1000
#> beta_f[3]      -1.7033256676    32.18699504673  -63.9998491558   -1.0847655263    62.2647961685 1.00  1000
#> gamma_c[1,1]    0.3892001375     4.80566027292   -7.0289400317   -0.0842289872     8.1938062478 5.44     2
#> gamma_c[2,1]    1.7357543820     3.05687857590   -4.7223788144    1.9558154909     7.8444565221 2.05     4
#> gamma_c[3,1]    6.7842290179     1.68799125337    3.7667998222    6.6789926021    10.1135695099 2.08     4
#> gamma_c[1,2]   -3.5281665679     4.85820548964  -11.1930217734   -3.9439654599     4.2925042251 5.33     2
#> gamma_c[2,2]   -2.1982841776     3.05567186294   -8.6530217312   -1.9080739798     3.7906949663 2.02     4
#> gamma_c[3,2]    2.8523286132     1.71421721196   -0.2915445735    2.8401813299     6.1365560590 2.01     4
#> gamma_c[1,3]   -2.8024048971     4.79671556271  -10.2581423325   -3.1480443413     4.9508789234 5.41     2
#> gamma_c[2,3]   -1.4228341578     3.07602922063   -7.9847199576   -1.2199940209     4.7386811113 2.07     4
#> gamma_c[3,3]    3.6047612796     1.70267111353    0.5149599796    3.5054274160     6.8232404139 2.11     4
#> gamma_e[1,1]    0.8150196578     2.16972513038   -2.6427374326    0.8276232820     4.2511462979 4.37     2
#> gamma_e[2,1]    2.4227408196     5.31335071885   -6.4992830836    2.7930650798     9.7891078125 6.02     2
#> gamma_e[3,1]    0.6300398583     1.54766408810   -1.9859534233    0.4148844451     3.7290779723 1.19    21
#> gamma_e[1,2]   -1.3460382381     2.17608676088   -4.8331382883   -1.3733031833     2.1643234720 4.35     2
#> gamma_e[2,2]    0.2587079759     5.31727278384   -8.5835569429    0.5532729047     7.6054500782 6.02     2
#> gamma_e[3,2]   -1.5354436976     1.54703145428   -4.1513881918   -1.7482063226     1.5639189690 1.19    20
#> gamma_e[1,3]   -1.7517462037     2.18515791352   -5.1906234911   -1.7476365488     1.7490969828 4.36     2
#> gamma_e[2,3]   -0.1336904469     5.30205292333   -9.0241878338    0.3343639043     7.1452839135 6.02     2
#> gamma_e[3,3]   -1.9259072805     1.55474682498   -4.6368336989   -2.1480462627     1.1660792849 1.18    21
#> p_c[1,1]      321.9998105963  1187.91814385731    0.0008858712    0.9456631638  3618.9688442115 5.44     2
#> p_c[2,1]      223.1159658911   752.53977783569    0.0088939972    7.0696830786  2551.5661106564 2.05     4
#> p_c[3,1]     3695.4605467947 10003.16181877702   43.2418123845  795.5228446557 24678.8945006288 2.12     4
#> p_c[1,2]        6.9718580242    26.67014461876    0.0000137700    0.0193997403    73.1532498774 5.33     2
#> p_c[2,2]        4.0933919221    13.78750579011    0.0001745985    0.1483658799    44.2871694509 2.02     4
#> p_c[3,2]       76.6826372324   235.52573440843    0.7471278844   17.1188759260   462.4582472510 2.01     4
#> p_c[1,3]       12.8281203862    44.92816670619    0.0000350708    0.0430168983   141.3000176937 5.41     2
#> p_c[2,3]        9.8802803557    32.74618426726    0.0003406282    0.2952340097   114.2833932172 2.07     4
#> p_c[3,3]      153.8114143843   422.26466292816    1.6735991703   33.2957079634   918.9884049152 2.11     4
#> p_e[1,1]       12.3712285224    20.95564815264    0.0711666058    2.2878800169    70.1858386627 4.37     2
#> p_e[2,1]     2234.4101245845  4737.51187633998    0.0015045189   19.2790190764 17838.3846678721 6.02     2
#> p_e[3,1]        5.9898856194    10.92409792715    0.1372496947    1.5141958146    41.6419673420 1.19    21
#> p_e[1,2]        1.4546690222     2.56525235470    0.0079615531    0.2532690331     8.7089487981 4.35     2
#> p_e[2,2]      257.0214869707   544.66726939203    0.0001871613    2.1045256915  2009.1554876828 6.02     2
#> p_e[3,2]        0.6920613605     1.27838632632    0.0157425488    0.1740860507     4.7775198572 1.19    20
#> p_e[1,3]        0.9596366297     1.61927945156    0.0055685339    0.1741858590     5.7494085202 4.36     2
#> p_e[2,3]      173.1343685322   376.16944150669    0.0001204620    1.5596856287  1268.1170076141 6.02     2
#> p_e[3,3]        0.4649941874     0.83831325783    0.0096883253    0.1167122142     3.2093893698 1.18    21
#> s_c[1]       3032.2358408369   151.61139922695 2751.0897921107 3030.3756706778  3348.3245175302 1.09    23
#> s_c[2]       2900.2780430513   122.23800864867 2670.6775442697 2892.2381888697  3139.2258470041 1.03  1000
#> s_c[3]       3817.2790502038   191.70221737580 3472.1514922716 3798.4022265394  4203.1241447786 1.11    21
#> s_e[1]          0.3795054371     0.01752650093    0.3491170430    0.3786510285     0.4149049768 1.00  1000
#> s_e[2]          0.3485407089     0.01707298529    0.3168235384    0.3477132747     0.3834513214 1.00  1000
#> s_e[3]          0.3271617017     0.01513705029    0.2992196969    0.3264994261     0.3574091839 1.00  1000
#> tau_c[1]        0.0000001096     0.00000001095    0.0000000892    0.0000001089     0.0000001321 1.09    23
#> tau_c[2]        0.0000001195     0.00000001007    0.0000001015    0.0000001195     0.0000001402 1.03  1000
#> tau_c[3]        0.0000000691     0.00000000686    0.0000000566    0.0000000693     0.0000000829 1.11    21
#> tau_e[1]        6.9872541162     0.63805469384    5.8090196013    6.9746391440     8.2046105584 1.00  1000
#> tau_e[2]        8.2907499199     0.80860138077    6.8011057004    8.2709893443     9.9624264864 1.00  1000
#> tau_e[3]        9.4026316121     0.86805169537    7.8283202978    9.3806980431    11.1691376455 1.00  1000
#> tmu_c[1]       69.9787780658    33.70257585769    8.3082827074   68.8925749738   137.2325626584 1.00   510
#> tmu_c[2]       68.4690128478    34.44991287038    3.7765987194   68.0787671060   138.4154810449 1.01   240
#> tmu_c[3]       42.7734793228    35.99748880516  -26.3739312365   42.6236504362   115.1700218174 1.00   330
#> tmu_e[1]        0.5139050437     0.02542791230    0.4648940689    0.5139139375     0.5636534298 1.00  1000
#> tmu_e[2]        0.5595131381     0.02333281507    0.5126808390    0.5594694086     0.6055472449 1.00  1000
#> tmu_e[3]        0.5408858755     0.02167806546    0.5001998168    0.5400838666     0.5837127184 1.00  1000

missingHE standardises the format of the output to report summary quantities related to key model parameters, by default the mean effectiveness and costs at each time point across treatment arms on their natural scale. Estimates of posterior mean (mean), standard deviation (sd) and $95\%$ credible interval boundaries (2.5% and 95.5%) can be used to summarise the posterior distribution of the mean parameters. In addition, the MCMC diagnostics Rhat and n.eff, corresponding to the potential scale reduction factor and effective sample size numeric diagnostics, are provided and can be used to assess convergence for these parameters.

Summary statistics for the regression coefficients on the scale defined by the linear predictor of the effectiveness and cost models $(\boldsymbol \alpha, \boldsymbol \beta)$, as defined in Section 1, may also be displayed using the coef function by typing

> coef(lmdm1_mar)

which returns the following output

#> $Effects
#>                     Mean    SD     QL    QU
#> (Intercept).time.1 0.477 0.034  0.411 0.547
#> trt2.time.1        0.082 0.048 -0.003 0.177
#> (Intercept).time.2 0.497 0.032  0.437 0.558
#> trt2.time.2        0.141 0.047  0.051 0.234
#> (Intercept).time.3 0.482 0.028  0.428 0.540
#> trt2.time.3        0.134 0.043  0.050 0.215
#> 
#> $Costs
#>                      Mean     SD      QL      QU
#> (Intercept).time.1 55.880 30.938  -3.508 118.929
#> trt2.time.1        31.853 32.792 -34.664  94.788
#> (Intercept).time.2 54.103 30.827  -3.999 116.096
#> trt2.time.2        32.457 32.434 -31.262  95.926
#> (Intercept).time.3 33.520 32.970 -28.906  98.342
#> trt2.time.3        20.906 31.369 -42.101  82.585
#> e.time.1           -5.789 31.497 -65.421  57.390
#> e.time.2           -4.822 31.768 -66.864  58.480
#> e.time.3           -1.703 32.187 -64.000  62.265

The regression output is separately reported by outcome. For the above output, the following parameter estimates are reported at each time point in the model: the intercepts $\alpha_{j0}=$(Intercept) in the effectiveness model; the intercepts $\beta_{j0}=$(Intercept) and effectiveness coefficients $\beta_{jf}=$e in the cost model.

For each regression parameter, estimates are summarised using posterior means (mean), standard deviations (sd) and credible interval boundaries (lower and upper), the latter being calculated assuming by default a credible level of $95\%$.

1.6.2 Inspect imputed values

The function plot allows to visually inspect how missing outcome data have been imputed from a model stored in an object of class "missingHE", using ggplot2 as the graphical engine. For example, the command

> plot(lmdm1_mar, class = "boxplot", outcome = "costs", trt = "1", time.plot = 2)

displays boxplots of the observed (black dots and boxes) and imputed (red dots and boxes) effectiveness variables in the "TAU" (trt=1) arm at time $2$ based on the model results stored in lmdm1_mar, as shown in Figure 1.5. Imputed values are represented in terms of posterior mean values, and their uncertainty by means of corresponding credible intervals. The arguments class="boxplot", outcome="costs", trt="1" and time.plot=2 instruct missingHE to display boxplots for the cost outcome in the TAU arm at time $2$.

Figure 1.5: Boxplots of observed and imputed effectiveness data in the TAU arm at time 2 under normal distributions using a longitudinal model.

Note that, similarly to diagnostic, all graphs produced via plot can be manipulated using ggplot2 functions, e.g. to modify the graphics’ aesthetics, text and disposition.

1.6.3 Summarise health economic results

Incremental cost-effectiveness results between the reference arm (trt$=2$) and each of the other study arms (trt$=1$) can be obtained by using the summary function and setting the argument incremental to TRUE.

> summary(lmdm1_mar, incremental = TRUE)

Note that the health economic results displayed by summary are based on aggregated mean effectiveness and cost parameters ($\mu_{e},\mu_{c}$), which are implicitly derived from the model estimates at each time ($\mu_{je},\mu_{jc}$) using Equation (1.4) and Equation (1.5) and the weight quantities (w_e and w_c) specified in lmdm.

The incremental results produced by summary include:

the mean incremental effectiveness and cost parameters ($\Delta_{e}=\mu_{e2}-\mu_{e1},\Delta_{c}=\mu_{c2}-\mu_{c1}$)
the incremental net monetary benefit (INMB) ($\Delta_{e}\times k - \Delta_{c}$), calculated at the given $k$ value (willingness to pay threshold).
the estimated mean of the Incremental Cost-Effectiveness Ratio (ICER), which quantifies the cost per incremental unit of effectiveness, computed based on posterior samples as $\text{ICER}=\frac{\Delta_{c}}{\Delta_{e}}$.

The above output compares the cost-effectiveness of the two arms in the MenSS data set based on the results of the model stored in the object lmdm1_mar. For example, in this case, incremental results suggest that "PBS" is on average associated with higher QALYs and Total costs with respect to "TAU", leading to a positive estimated ICER.

1.6.4 Probabilistic sensitivity analysis

A series of built-in functions are also available from the package BCEA to visually display Probabilistic Sensitivity Analysis (PSA) results derived from a model fitted in missingHE. For example, the following commands may be used

> BCEA::ceplane.plot(lmdm1_mar$cea, graph = "ggplot2")
> BCEA::ceac.plot(lmdm1_mar$cea, graph = "ggplot2")

to generate Figure 1.6 and Figure 1.7, which combines cost-effectiveness plane and acceptability curve plots obtained from the PSA results stored in lmdm1_mar$cea.

Figure 1.6: Inspecting probabilistic sensitivity analysis using BCEA built-in functions, for example in terms of cost-effectiveness plane.

Inspecting probabilistic sensitivity analysis using BCEA built-in functions, for example in terms of cost-effectiveness acceptability curve.

Figure 1.7: Inspecting probabilistic sensitivity analysis using BCEA built-in functions, for example in terms of cost-effectiveness acceptability curve.

In the cost-effectiveness plane, shown in Figure 1.6, all incremental estimates fall in the top-right quadrant (where "PBS" is more expensive and more effective than "TAU") and only a few of them are associated with ICER estimates that are below $k=25,000$ (shaded area). In the cost-effectiveness acceptability curve, shown in Figure 1.7, a positive cost-effectiveness assessment of "PBS" vs "TAU" is shown for only willingness to pay threshold values $k$ above $40,000$.

Longitudinal models in missingHE