The historical analysis framework for trial-based economic evaluations focusses on the modelling of individual-level effectiveness and cost variables \((e_i,c_i)\) collected at a given time or computed over some intervals, i.e. QALYs or Total costs. This because: 1) key quantities of interest, i.e. the mean parameters \((\mu_{e},\mu_{c})\) and related incremental quantities, can be easily derived from a bivariate model fitted to \((e_i,c_i)\); 2) when health economic outcomes are derived from longitudinal data \((e_{ij},c_{ij})\) collected at multiple occasions \(j=1,\ldots,J\), the modelling of \((e_i,c_i)\) is considerably simplified compared to a multivariate model applied to \((e_{ij},c_{ij})\) .
However, when the longitudinal responses are subject to missingness, i.e. \(e_{ij}=(e^{\text{obs}}_{ij},e^{\text{mis}}_{ij})\) and \(c_{ij}=(e^{\text{obs}}_{ij},c^{\text{mis}}_{ij})\), a cross-sectional model is often inadequate since information from partially-observed responses will be ignored in the analysis. As a result, the specification of a joint model for \((e_{ij},c_{ij})\) provides an intuitive approach which, within a Bayesian framework, is a simple extension of the standard cross-sectional framework In addition, the model can be easily customised to: retrieve the estimates of interest while taking into account the evidence from any observed responses; conduct sensitivity analysis to alternative missingness assumptions.
When partially-observed longitudinal data are available, missingHE allows the user to implement a joint longitudinal model through the dedicated function lmdm, which mimics the modelling strategy of a longitudinal selection model. The function requires the specification of a model for the effectiveness (model.eff) and cost (model.cost) variables as well as a conditional model (i.e. mechanism) for the missing effectiveness (model.me) and costs (model.mc). The main difference compared to standard selection models is that, rather than directly modelling the missingness indicators, lmdm specifies a model for the longitudinal missing data pattern indicators \(\boldsymbol r_{ie}=(m_{ije},\ldots,m_{iJe})\) and \(\boldsymbol r_{ic}=(m_{ijc},\ldots,m_{iJc})\), where \(m_{ije}\) and \(m_{ije}\) denote the binary missingness indicators for the responses at time \(j\). As an example, consider a partially-observed effectiveness variable \(e_{ij}\) and let \(\boldsymbol r_{ie}\) denote the corresponding longitudinal missingness patterns indicator. Although \(r_{ie}\) could theory be associated with different values depending on the specific pattern combinations based on the values of \(m_{ije}\), a possible way to simplify the specification of the model is to reduce the number of modelled patterns by aggregating some of the observed patterns together. This is the strategy implemented by missingHE to facilitate the implementation of a longitudinal missingness approach based on a restricted number of aggregated patterns. In particular, missingHE specifies the model using three distinguished types of missingness patterns and assigns each of them with a specific indicator value: 1) \(\boldsymbol r_{ie}=1\) for the complete cases, i.e. \(m_{ije}=0\) for any \(j\); 2) \(\boldsymbol r_{i2}=2\) for intermittent missingness, i.e. \(m_{ije}=1\) and \(m_{ile}=0\) for at least some time \(l>j\); 3) \(\boldsymbol r_{ie}=3\) for dropout, i.e. \(m_{ije}=1\) and \(m_{ile}=0\) for only \(l<j\).
This tutorial shows how it to specify longitudinal models using the dedicated function lmdm in the package, while also showing the available customisation options to allow a flexible modelling specification. Throughout, we will use the built-in dataset called PBS as a toy example, which is directly available when installing and loading missingHE in your R workspace. See Introduction to missingHE for an introductory tutorial of each type of missingness model in missingHE.
If you would like to have more information on the package, or would like to point out potential issues in the current version, feel free to contact the maintainer at a.gabrio@maastrichtuniversity.nl. Suggestions on how to improve the package are also very welcome. A development version of the package is also available from the author’s main GitHub repository at https://github.com/AnGabrio/missingHE.
Trial-based health economic disaggregated data usually consist in the longitudinal cost-effectiveness pair of variables \((e_{ijt},c_{ijt})\), collected on the \(i\)-th patient at time \(j\) assigned to the \(t\)-th treatment in the study, for \(i=1,\ldots,N\) patients, \(j=1,\ldots,J\) time points and \(t=1,\ldots,T\) treatment options. In many jurisdictions the recommended effectiveness variable \(e_{ijt}\) is a Health-Related Quality of Life (HRQoL) measure, typically in the form of utility scores computed from the patients’ answers to generic health-related quality of life questionnaires (e.g. EQ-5D) collected at given time intervals in the study. The cost variable \(c_{ijt}\) denotes the sum of all relevant patient-level costs (e.g. direct/indirect medical costs, productivity losses, etc.), collected from the patients’ answers to different types of healthcare-resource utilisation questionnaires or medical records at different times in the study.
In missingHE, the observed variables \((e_{ijt},c_{ijt})\) are modelled using a joint bivariate probability distribution with density \(p(e_{ijt},c_{ijt}\mid \boldsymbol \theta_{jt})\), as a function of a vector of relevant parameters \(\boldsymbol \theta_{jt}\), and implemented using a series of conditional model factorisations. In particular, at \(j>1\), the model can be represented as:
\[\begin{equation} p(e_{ijt},c_{ijt} \mid \boldsymbol \theta_{jt}) \quad = \quad p(e_{ijt} \mid e_{ij-1t},c_{ij-1t}, \boldsymbol \theta_{ejt})p(c_{ijt} \mid c_{ij-1t}, e_{ij-1t}, e_{ijt}, \boldsymbol \theta_{cjt}), \tag{1.1} \end{equation}\]
where \(p(e_{ijt} \mid e_{ij-1t},c_{ij-1t}, \boldsymbol \theta_{ejt})\) denotes the conditional effectiveness distribution at time \(j\) given past responses (\(e_{ij-1t},c_{ij-1t}\)) and \(p(c_{ijt} \mid c_{ij-1t}, e_{ij-1t}, e_{ijt}, \boldsymbol \theta_{cjt})\) the conditional cost distribution at \(j\) given current (\(e_{ijt}\)) and past responses (\(e_{ij-1t},c_{ij-1t}\)), with \(\theta_{ejt}\) and \(\theta_{cjt}\) being the treatment- and time-specific set of parameters indexing the effectiveness and cost distributions, respectively. To simplify notation, we now drop the treatment index \(t\) from model parameters.
Both outcome densities in Equation (1.1) are specified using some parametric forms for the probability distributions to describe the underlying uncertainty in the cost-effectiveness data. The set of model parameters can be generally represented as \(\boldsymbol \theta_j=(\boldsymbol \phi_j(x),\boldsymbol \psi_j(x))\), where: \(\boldsymbol \phi_j(x)=(\phi_{je}(x),\phi_{jc}(x))\) denotes the set of location parameters (e.g. mean) and \(\boldsymbol \psi_j(x)=(\psi_{je}(x),\psi_{jc}(x))\) the set of ancillary parameters (e.g. variance) indexing the outcome distributions, both depending on some vector of potential baseline covariates \(x\) (e.g. age, sex, etc…). While it is possible for both \(\boldsymbol \phi_j\) and \(\boldsymbol \psi_j\) to explicitly depend on \(x\), the model specification is usually simplified assuming that these only affect the location parameters through a generalised linear formulation. For example, the location parameter of the effectiveness model can be specified as:
\[\begin{equation} g_e(\phi_{ije}) \quad = \quad \alpha_{j0} + \sum_{k=1}^K \alpha_{jk}x_{ik} + \alpha_{je} e_{ij-1} + \alpha_{jc} c_{ij-1} \,\,[+ \ldots], \tag{1.2} \end{equation}\]
and the location parameter of the cost model as:
\[\begin{equation} g_c(\phi_{ijc}) \quad = \quad \beta_{j0} + \sum_{k=1}^K \beta_{jk}x_{ik} + \beta_{jf} e_{ij} + \beta_{je} e_{ij-1} + \beta_{jc} c_{ij-1} , \,\,[+ \ldots]. \tag{1.3} \end{equation}\]
where the regression coefficients \((\alpha_{je},\alpha_{jc})\) and \((\beta_{jf}, \beta_{je},\beta_{jc})\) are included to capture the possible time dependence between \(c_{ij}\) and \(e_{ij}\) at the location level. Note that missingHE, through the argument time_dep inside the lmdm function, allows the user to choose among three alternative specifications of Equation (1.2) and Equation (1.3) based on different assumptions about the response time dependence: 1) "none" requires no time dependence by setting \(\alpha_{je}=\alpha_{jc}=\beta_{je}=\beta_{jc}=\beta_{jf}=0\); 2) "biv" requires dependence only between outcomes at the same time by setting \(\alpha_{je}=\alpha_{jc}=\beta_{je}=\beta_{jc}=0\);l 3) "AR1" requires an autoregressive of order one structure by not imposing any further restrictions on the regression parameters.
Covariates may be included into the model at different scales using the link functions \(g_e(\cdot)\) and \(g_c(\cdot)\), being either identity, logarithmic or logit, depending on the type of outcome and distribution specified. Note that missingHE expands any categorical covariates to a set of dummy variables. In addition, Equation (1.2) and Equation (1.3) can be extended, as indicated by the term \([+ \ldots]\), to include additional terms into the model, e.g. clustering effects to account for the possible multilevel structure of the data. The objective of the statistical analysis is the estimation of population average time-specific effectiveness and cost parameters \(\boldsymbol \mu=(\mu_{je},\mu_{jc})\) in each treatment group and to quantify the (posterior) uncertainty around them. Estimates of these parameters can be obtained from Equation (1.2) and Equation (1.3) either from their respective regression parameters after mean centring all covariates or through repeatedly sampling from the predictive distribution of the outcomes using Monte Carlo integration. Finally, estimates of aggregated mean parameters computed over the entire study period are usually obtained through linear combination of the time-specific average estimates. For instance, in the case of QALYs, aggregated mean response estimates can be typically derived by applying an Area Under the Curve (AUC) approach:
\[\begin{equation} \mu_e = \sum_{j=1}^J \mu_{je} \times w_{je}, \tag{1.4} \end{equation}\]
where \(w_{je}=\frac{\gamma_j}{2}\) denotes the weight used in the AUC calculation, with \(\gamma_j\) being the percentage of the time unit (often \(1\) year) covered between the time interval \(j-1\) and \(j\). As an example, if responses were collected at equally-spaced time intervals, say every \(6\) months, over a one-year study period, then \(w_{je}=\frac{\gamma_j}{2}=\frac{6/12}{2}=0.25\) for any \(j\). In the case of Total costs, aggregated mean response estimates can be typically derived as:
\[\begin{equation} \mu_c = \sum_{j=1}^J \mu_{jc} \times w_{jc}, \tag{1.5} \end{equation}\]
where setting \(w_{jc}=1\) would correspond to a standard unweighted sum. Note that, after missingHE derives estimates for \((\mu_{je},\mu_{jc})\) from Equation (1.2) and Equation (1.3), it automatically computes the aggregated estimates \((\mu_e,\mu_c)\) using the approaches described in Equation (1.4) and Equation (1.5) assuming default values of \(w_{je}=0.5\) and \(w_{jc}=1\) for all \(j\). However, the function lmdm allows the user to modify these values through the optional arguments w_e and w_c, which can be provided either as single values (applied to each \(j\)) or as vectors of length \(J\). This level of customisation allows the user to directly derive aggregated mean cost-effectiveness estimates and use them in the computation of all CEA-specific output when using lmdm.
Assuming that the location parameters are modelled using a linear predictor form as in Equation (1.2) and Equation (1.3), priors on the corresponding sets of regression coefficients can be specified as \(\boldsymbol \alpha_j=(\alpha_{j0},\ldots,\alpha_{jK},\alpha_{je},\alpha_{jc}) \overset{\text{iid}}{\sim} \text{Normal}(\mu_{j\alpha},\sigma_{j\alpha})\) and \(\boldsymbol \beta=(\beta_{j0},\ldots,\beta_{jK},\beta_{jf},\beta_{je},\beta_{jc}) \overset{\text{iid}}{\sim} \text{Normal}(\mu_{j\beta},\sigma_{j\beta})\), where the notation \(\mu_{j\cdot}\) and \(\sigma_{j\cdot}\) denotes the time-specific prior mean and standard deviation, respectively. By default, missingHE assumes the “minimally informative” prior values of \(\mu_{j\cdot}=0\) and \(\sigma_{j\cdot}=1000\) for all regression coefficients. When genuine prior knowledge on specific parameters is available, it is possible to modify the default priors and elicit the corresponding information into the model. As for the ancillary parameters, the choice of the prior values depends on the form of the probability distribution selected according to the modelling choices available in missingHE.
missingHE specifies the model for the aggregated missingness patterns (\(\boldsymbol r_{ie}, \boldsymbol r_{ic}\)) through a selection model approach, where the pattern-specific probabilities \((\pi^{r}_{ie},\pi^{r}_{ic})\) are estimated using a multinomial (log) regression, conditional on possibly fully-observed baseline covariates \(X_i\) and partially-observed outcomes (\(e_{ij},c_{ij}\)). For example, the missingness effectiveness model is specified as
where the upperscript \(r\) indicates the pattern category (1=completers, 2=intermittent, 3=dropout), \(\boldsymbol r_{ie}\) is a vector with entries set to one if \(m^r_{ije}=1\) and zero otherwise, \(\boldsymbol \gamma^r_j = (\gamma^r_{j0e}, \ldots, \gamma^r_{jKe})\) is the set of time- and pattern-specific regression coefficients, while \(\delta^r_{je}\) is the time- and pattern-specific MNAR parameters. Additional terms, e.g. clustering effects, may also be included in the model as indicated by the term \([+ \ldots]\). Note that an equivalent specification to Equation (1.6) for the missingness model of the longitudinal costs can be specified by missingHE through the function lmdm.
According to Equation (1.6), it is clear to see that alternative assumptions about the missingness mechanism can be encoded into the model based on the type of variables included. In particular, we have: a Missing Completely At Random (MCAR) assumption when all regression coefficients, except \(\gamma^r_{j0e}\), are zero; a Missing At Random (MAR) assumption when \(\delta^r_{je}=0\) and one or more coefficients among \((\gamma^r_{j1e}, \ldots, \gamma^r_{jKe})\) is/are not zero; a Missing Not At Random (MNAR) when \(\delta^r_{je} \neq 0\). Under MNAR, due to the missing values in \(e_{ij}\), the model can only be identified by imposing some restrictions. These are implicitly defined from a combination of parametric assumptions about \(p(e_{ij} \mid e_{ij-1}, c_{ij-1}, \boldsymbol \theta_j)\) and dependence structure assumed for \(p(\boldsymbol r_{ie} \mid e_{ij},\boldsymbol\eta_j)\). As a result, sensitivity analysis to assess the impact of alternative missingness assumptions on the inferences can be done by: varying the distributional assumption of \(p(e_{ij} \mid e_{ij-1}, c_{ij-1}, \boldsymbol \theta_j)\); varying the modelling structure in Equation (1.6); and, within a Bayesian setting, varying the prior distribution on \(\delta^r_{je}\).
In the following, we use data from a real clinical study as a running example to illustrate how a longitundal missingness model for trial-based economic evaluations can be specified using the function lmdm from missingHE. The PBS was a multicentre randomised controlled trial conducted on people suffering from intellectual disability and challenging behaviour. A total of \(244\) individuals were enrolled in the trial, \(136\) in the Treatment As Usual (TAU) and \(108\) in the treatment of care plus the Positive Behavioural Support intervention (PBS). For all individuals, key health economic outcomes were collected via self-reported questionnaires (i.e. EQ-5D and CSRI) at baseline, \(6\) and \(12\) months follow-ups.
After loading the package missingHE, for example by typing library(missingHE), the PBS dataset (PBS) is directly imported into the R workspace and can be inspected using the command
> rbind(head(PBS), tail(PBS))
to show the first and last few rows:
#> id time e c age gender ethnicity living carer marital disability site trt
#> 1 1 0.17300001 9214.0 31 0 1 3 0 0 3 7 1
#> 2 1 0.85000002 2492.5 51 0 1 1 1 0 2 7 1
#> 3 1 -0.16599999 15283.0 53 1 1 1 1 0 1 5 1
#> 4 1 0.25800008 1858.0 30 0 1 3 0 0 2 5 1
#> 5 1 0.03800001 797.0 46 0 1 3 1 0 2 5 1
#> 6 1 1.00000000 358.0 44 0 0 3 1 0 1 5 1
#> 239 3 NA NA 33 1 1 2 0 0 3 4 1
#> 240 3 0.81500006 150.0 33 0 0 3 1 0 2 15 1
#> 241 3 0.45200008 1740.0 52 0 0 3 1 0 3 22 2
#> 242 3 0.88300002 105.0 32 0 1 3 1 0 3 23 1
#> 243 3 0.81500006 2372.0 50 0 0 1 1 0 3 2 2
#> 244 3 0.81500006 1630.0 27 1 0 2 0 0 2 2 2
The dataset consists of \(244\) individuals grouped in two arms, where trt\(=1\) indicates the TAU and trt\(=2\) indicates the MenSS intervention, with id and time denoting the participant identification number and measurement occasion, respectively.
Key health economic variables are e and c, which denote the HRQoL utility and cost longitudinal outcomes, respectively. All baseline variables are fully-observed, whereas all outcome variables are partially-observed in both treatment arms. There is a total of 679 (93%) outcome observations, of which 371 in the TAU and 308 in the PBS arm, while the remaining 53 observations are missing. Figure 1.1 and Figure 1.2 show the histograms of the observed distributions for the utility and cost variables in PBS, separately by treatment arm and time point.
Figure 1.1: Histograms of the observed data distributions for the utilities in the TAU and PBS arm by time point.
Figure 1.2: Histograms of the observed data distributions for the costs in the TAU and PBS arm by time point.
missingHE relies on JAGS through the dedicated function lmdm to implement the longitudinal missingness approach while providing a series of customisation options. In particular, key options include: choice of the probability distribution for the effectiveness and cost variables, choice of the regression model for each outcome, and type of missingness mechanism assumed.
A set of pre-defined choices in terms of probability distributions, with built-in parameterisations, for the effectiveness (e) and/or cost (c) variables are available. The user may select a given distributional choice by passing the corresponding missingHE name to the arguments dist_e and dist_c of the desired function. The outcome model \(p(e,c)\) requires a separate specification of the effectiveness and cost model formula using the notation:
where e, c and trt indicate the effectiveness, cost and treatment variable names stored in the data set, while ... is a suitable form for the combination of covariates to be used in each model. Dependence between the two models can be specified by adding the term e as an additional covariate in the cost model and by specifying the type of time dependence structure assumed for the model through the argument time_dep. Currently available choices for this argument are: 1) "none" for complete independence, which also requires e to not be included in model.cost; 2) "biv for bivariate dependence at each time, where dependence between \(c_j\) and \(e_j\) is only allowed at the same time \(j\); 3) "AR1 for order one autoregressive structure as specified in Equation (1.2) and Equation (1.3). Note that the treatment indicator trt must be included into both regression formulae and should be coded as a factor variable in the data set, while the time indicator time should not be included in the formulae and should be coded as a numeric (integer) variable. The type of missingness assumption for each outcome is specified through the argument type, by setting type = "MAR" or type = "MNAR" to select between a MAR or MNAR mechanism, respectively.
As an example, a suitable call to lmdm is the following.
fit.model <- lmdm(data = data, model.eff = model.eff,
model.cost = model.cost, dist_e = dist_e, dist_c = dist_c,
type = type, time_dep = time_dep, ...)where data is a dataframe containing the data, model.eff and model.cost are the formulae for the effectiveness and cost regression models, dist_e, dist_c, type and time_dep are character values indicating the chosen outcome distributions, type of missingness mechanism and time dependence, while ... indicates additional optional arguments. In practice, unless there is a clear understanding of the implications of any change to the default model structure, the user is not required to modify the default values of these optional arguments. The only exception is related to: number of MCMC chains (n.chains), number of iterations (n.iter) and values of the parameters of the prior distributions (prior). As an example, consider the default priors for the linear predictor coefficients in the effectiveness and cost model, namely \(\boldsymbol \beta_j \overset{\text{iid}}{\sim}\text{Normal}(\mu_{j\beta},\sigma_{j\beta})\) and \(\boldsymbol \alpha_j \overset{\text{iid}}{\sim}\text{Normal}(\mu_{j\alpha},\sigma_{j\alpha})\). Suppose the user wants to select different prior mean and standard deviation values for these parameters. This can be done by typing
which instructs missingHE to separately assign normal distributions to each regression coefficient in the models with prior mean of \(0\) and precision (inverse of variance) of \(0.01\). The list myprior can be then passed to the optional prior argument in the chosen model fitting function to implement the model using the updated priors.
fit.model <- lmdm(data = data, model.eff = model.eff,
model.cost = model.cost, dist_e = dist_e, dist_c = dist_c,
type = type, time_dep = time_dep, prior = myprior)Note that the list and its elements containing the custom prior specification need to be named using the missingHE accepted terminology for parameter and distribution names. If the option save_model is set to TRUE, the JAGS model template is saved in the current working directory and can be used to further modify the model structure with a higher degree of flexibility. This, however, will require a new compilation of the model and, at present, the new model has to be run using dedicated commands, such as rjags from the R2jags package.
To ease presentation, we consider the following common model specification: e and c are both normally distributed with no covariates included in their models; a MAR mechanism is assumed for both outcomes, and a bivariate structure is assumed for the time dependence. For demonstrative purposes and to ease presentation of results, in this tutorial we only consider models fitted under MAR. See Fitting MNAR models in missingHE for an overview about how informative missingness models can be fitted in missingHE.
The model is implemented using the function lmdm using the following command.
> lmdm1_mar <- lmdm(data = PBS, dist_e = "norm", dist_c = "norm",
+ model.eff = e ~ trt, model.cost = c ~ trt + e,
+ model.me = me ~ 1, model.mc = mc ~ 1, time_dep = "biv",
+ type = "MAR", n.iter = 1000, ref = 2)
The arguments model.me and model.mc denote the missingness mechanism as in Equation (1.6) for both outcomes. Since no covariates are included in model.me and model.mc, the utility and cost mechanisms are assumed MAR (type="MAR") conditional on the observed outcome values. The argument ref = 2 is used to specify which study arm should be used as the reference group in the comparison, i.e. incremental results are reported here as trt=2 ("PBS") vs trt=1 ("TAU").
Prior to inspecting the posterior results, it is important to assess model performance and convergence of the MCMC algorithm. missingHE provides three functions to compute and implement different types of popular Bayesian model assessment measures and plots, namely diagnostic, ppc and pic.
The function diagnostic provides a range of standard diagnostic tools and plots for assessing convergence based on the posterior distribution of a Bayesian model fitted in missingHE. Consider the output of the model stored in the object lmdm1_mar; we may use the command
> diagnostic(x = lmdm1_mar, type = "denplot", param = "beta.f")
to produce “density plots” for the regression parameters associated with the effectiveness variable within the cost models. In this case, as displayed in Figure 1.3, no major issues in convergence are identified for the parameters monitored at each time point.
Figure 1.3: Checking convergence using the diagnostic function for a model fitted in missingHE, for example through inspection of the density plots.
With the exception of the argument x, which must be an object of class "missingHE", all other arguments can be used to customise the output of the function. A full description of the choices for each argument, including the available types of diagnostic tools and families of parameters, is provided in the help file of the package. Note that the output produced by diagnostic is effectively a ggplot2 object, and can therefore be manipulated by the user using functions from packages which allow to customise such objects.
The function ppc provides a range of graphical posterior predictive checks for assessing the fit of a Bayesian model implemented in missingHE. Consider the output of the model fitted in Section 1.4 and stored in the object lmdm1_mar; we may use the command
> ppc(x = lmdm1_mar, type = "histogram", outcome = "effects",
+ ndisplay = 2, time.plot = 3, trt = "2")
to plot the densities of ndisplay\(=5\) replications of the effectiveness data (light blue lines) with respect to the density of the observed data (dark blue line) in the PBS (trt=2) arm. The argument time.plot indicates for which times the comparison between observed and replicated data should be plotted, here the third time point is selected (default\(=1\)). Figure 1.4 shows some discrepancies between the replicated and observed data histograms, likely due to the relatively poor fit of the current model (i.e. the normality assumption of e) to the observed data in both arms.
Figure 1.4: Posterior predictive graphical checks using the ppc function for a model fitted in missingHE, for example through inspection of histograms.
A full description of the choices for each argument in ppc, including the available types of posterior predictive checks, is provided in the package help file. Note that the arguments in ppc may also be used to generate graphs for any combination of treatment arms and time points available in the data set: for example, graphs corresponding to those shown in Figure 1.4 for only the PBS arm at all time points may be requested by setting trt=2 and time.plot="all".
The function pic allows to assess the relative predictive accuracy of models fitted in missingHE by computing alternative Bayesian information criteria. As an example, consider again the model results stored in the object lmdm1_mar; we may use the command
> pic_lmdm1_mar <- pic(x = lmdm1_mar, criterion = "looic", cases = "cc")
to return a named list containing a series of components used in the calculation of the selected information criterion. In this case, leave-one-out information criterion (LOOIC) was selected as specified in criterion = "looic" based only on the complete cases in the data set (cases = "cc"), and its value may be inspected by typing:
> pic_lmdm1_mar$looic
#> [1] 13821.69
Note that all information criteria in missingHE can only be used as relative measures of fit to compare nested models, with lower values typically indicating better fit. In addition, predictive information criteria are ideally evaluated based on the observed data alone (or after integrating out missingness under MAR). As a result, computing these measures based on the complete cases provides a “safe” choice when comparing the fit of alternative models. pic allows alternative choices for the argument cases, whose appropriateness should be considered by the user based on the context at hand (e.g. amount and type of missing data). Current choices are: complete cases (cc), available cases for the effectiveness (ac_e), available cases for the costs ("ac_c") or all cases (all).
Finally, pic allows to compute information criteria for multiple missingHE objects by simply replacing the argument x with a list of models generated using the functions in the package. For example, suppose an additional longitudinal model was fitted using lmdm and its output stored in an object named lmdm2_mar, which additionally includes some covariates into the effectiveness and cost models. Then, the desired predictive information criteria may be computed for both models using pic by typing:
> pic_compare_mar <- pic(x = list(lmdm1_mar, lmdm2_mar),
+ criterion = "looic", cases = "cc")
A full description of the different types of output associated and input choices is provided in the package help file.
Summary statistics computed based on posterior samples of these parameters, directly generated from JAGS, may be accessed using the print function by typing
> print(lmdm1_mar, display = "fixed")
which returns the following output
#> mean sd 2.5% 50% 97.5% Rhat n.eff
#> alpha[1,1] 0.479036357 0.03161524308 0.4155053803 0.479692861 0.536952205 1.00 1000
#> alpha[2,1] 0.083242197 0.04915640448 -0.0134873589 0.082475213 0.176347853 1.00 1000
#> alpha[1,2] 0.500629511 0.03111389403 0.4398852479 0.501178741 0.561814690 1.01 1000
#> alpha[2,2] 0.138099021 0.04668742614 0.0466183599 0.136988256 0.231867041 1.00 570
#> alpha[1,3] 0.482983342 0.02894591180 0.4252470080 0.483393901 0.539066471 1.00 1000
#> alpha[2,3] 0.134749578 0.04310795848 0.0490197999 0.137105845 0.215386835 1.00 1000
#> beta[1,1] 1491.341914468 173.59726685557 1163.7765846051 1489.946471772 1807.478081275 1.00 960
#> beta[2,1] 1470.189926896 252.69151988144 978.5209579181 1483.345452561 1948.301332310 1.00 1000
#> beta[1,2] 1321.978307574 171.66897030793 956.8780752970 1328.878215939 1651.430537088 1.01 350
#> beta[2,2] 1661.922446956 250.72060063927 1174.9651096041 1660.854672764 2165.190997021 1.01 120
#> beta[1,3] 1344.559680743 278.50277174902 769.9021986938 1336.560144159 1871.984281163 1.00 1000
#> beta[2,3] 1613.590596604 422.30285172185 789.3383987688 1595.596423039 2435.172314586 1.00 630
#> beta_f[1] -1354.480011929 342.29560133008 -2025.9518506555 -1346.760218199 -676.364920990 1.00 1000
#> beta_f[2] -2033.630513655 368.74061603658 -2755.2620790186 -2039.534819059 -1320.127103613 1.01 320
#> beta_f[3] -1383.868158452 635.49234126843 -2614.8325464021 -1388.567855105 -220.351897161 1.00 1000
#> gamma_c[1,1] 3.079287535 2.01446960243 -0.2521444160 2.941012740 7.185389289 1.34 9
#> gamma_c[2,1] 0.340857206 2.55896374991 -3.5173122028 0.268158613 6.262052701 1.17 20
#> gamma_c[3,1] 1.667673038 5.86300607505 -6.4069753941 2.205728319 9.353367873 8.43 2
#> gamma_c[1,2] -0.862376060 2.03880258715 -4.2832256883 -0.997908764 3.453354459 1.33 9
#> gamma_c[2,2] -3.574293221 2.55551744787 -7.5738435505 -3.555746628 2.418401462 1.16 21
#> gamma_c[3,2] -2.261333485 5.90849166768 -10.5230793198 -1.436096937 5.612277051 8.20 2
#> gamma_c[1,3] -0.092420432 2.02257885385 -3.3976207704 -0.254379744 4.140757320 1.34 9
#> gamma_c[2,3] -2.840604044 2.57121626398 -6.7623634605 -2.900618593 3.121589891 1.17 20
#> gamma_c[3,3] -1.526915423 5.85130323019 -9.7285673239 -0.973897974 6.108545575 8.40 2
#> gamma_e[1,1] 3.377705932 4.35175750161 -1.5683559914 2.129554383 9.966619399 9.37 2
#> gamma_e[2,1] 2.976887163 4.77993454884 -3.8660436574 3.301096123 9.957135016 7.58 2
#> gamma_e[3,1] 3.594647329 2.40309400415 -1.5027481679 4.114953366 7.696570172 2.37 3
#> gamma_e[1,2] 1.218248476 4.35906865224 -3.7740114647 0.005335414 7.831208228 9.30 2
#> gamma_e[2,2] 0.825681439 4.78563541544 -6.0784375451 0.882083527 7.904359858 7.52 2
#> gamma_e[3,2] 1.420164201 2.39701048460 -3.6843406099 1.995625290 5.301510019 2.38 3
#> gamma_e[1,3] 0.788617628 4.35730381002 -4.2110023179 -0.584801267 7.439345525 9.06 2
#> gamma_e[2,3] 0.399514829 4.77507371040 -6.5116491010 0.848190901 7.345398365 7.58 2
#> gamma_e[3,3] 1.034584692 2.41480044206 -4.0429349989 1.651564132 4.609838442 2.34 3
#> p_c[1,1] 159.109265102 454.00672763922 0.7771361914 18.935015648 1320.022296892 1.34 9
#> p_c[2,1] 51.430196782 264.03115878183 0.0296791108 1.307576997 524.294063572 1.17 20
#> p_c[3,1] 1502.474418157 2980.49523720907 0.0016500095 32.283382259 11537.642179326 8.43 2
#> p_c[1,2] 3.331215802 10.11725488341 0.0137981926 0.368649569 31.606377804 1.33 9
#> p_c[2,2] 1.017131616 5.65537875873 0.0005137142 0.028560081 11.227912740 1.16 21
#> p_c[3,2] 32.835308133 70.75390853883 0.0000269084 0.455364248 273.766932832 8.20 2
#> p_c[1,3] 6.859611167 20.69163230557 0.0334527743 0.775399644 62.850581875 1.34 9
#> p_c[2,3] 2.012369379 8.73564704497 0.0011564999 0.054990267 22.682620141 1.17 20
#> p_c[3,3] 61.161849216 121.23665374628 0.0000595576 1.186747016 449.685398796 8.40 2
#> p_e[1,1] 2389.860353150 5847.89924666827 0.2083874925 22.955744663 21303.644408127 9.37 2
#> p_e[2,1] 2401.446860648 5941.15323240195 0.0209411242 53.153579048 21102.263563668 7.58 2
#> p_e[3,1] 363.943990759 2014.13203373909 0.2225178109 61.249386442 2202.038213917 2.37 3
#> p_e[1,2] 277.059108923 680.03526427078 0.0229597930 2.450135126 2517.969945501 9.30 2
#> p_e[2,2] 288.311869022 742.16619115266 0.0022917666 4.259877416 2709.068095878 7.52 2
#> p_e[3,2] 40.232759810 223.12397758203 0.0251137355 7.356804531 200.874206954 2.38 3
#> p_e[1,3] 188.896559160 480.21926941422 0.0148315444 1.174277268 1701.652726555 9.06 2
#> p_e[2,3] 177.949945921 431.13064512201 0.0014860271 4.735866756 1549.055334380 7.58 2
#> p_e[3,3] 27.138220584 143.85643872318 0.0175459011 5.215134350 100.722563713 2.34 3
#> s_c[1] 2005.549516309 99.86770973910 1817.9283414252 2005.840414661 2204.208377659 1.01 1000
#> s_c[2] 1885.453584180 89.41612387296 1740.7123477618 1886.681443893 2083.566551770 1.00 1000
#> s_c[3] 3128.965686024 129.41268049428 2879.4566428235 3126.486883197 3360.043730853 1.06 35
#> s_e[1] 0.378099494 0.01773177438 0.3426615996 0.377988791 0.415865803 1.01 510
#> s_e[2] 0.346951693 0.01610870169 0.3179493226 0.346128703 0.380630046 1.00 910
#> s_e[3] 0.327435345 0.01444183800 0.3001133073 0.327218782 0.358020862 1.00 1000
#> tau_c[1] 0.000000250 0.00000002504 0.0000002058 0.000000249 0.000000303 1.01 1000
#> tau_c[2] 0.000000283 0.00000002638 0.0000002303 0.000000281 0.000000330 1.00 1000
#> tau_c[3] 0.000000103 0.00000000857 0.0000000886 0.000000102 0.000000121 1.06 35
#> tau_e[1] 7.041389629 0.66493504691 5.7822064241 6.999099324 8.516656447 1.01 510
#> tau_e[2] 8.360675805 0.77021226921 6.9023005523 8.346891548 9.892001938 1.00 910
#> tau_e[3] 9.381486971 0.82589393622 7.8015941043 9.339498594 11.102722733 1.00 1000
#> tmu_c[1] 2142.081718176 135.17242095793 1870.7574535877 2144.785257161 2398.523536092 1.00 1000
#> tmu_c[2] 2057.583325079 129.12937294528 1815.8210378641 2055.645621733 2296.937114008 1.00 1000
#> tmu_c[3] 2058.771912026 203.30310975334 1675.9600946416 2060.591713755 2448.105164427 1.00 1000
#> tmu_e[1] 0.515881264 0.02384157073 0.4696352801 0.515419527 0.562815108 1.00 1000
#> tmu_e[2] 0.561755307 0.02315990295 0.5170152558 0.561861979 0.605781669 1.00 1000
#> tmu_e[3] 0.542626598 0.02184934858 0.4995796059 0.542034631 0.587003209 1.00 1000
missingHE standardises the format of the output to report summary quantities related to key model parameters, by default the mean effectiveness and costs at each time point across treatment arms on their natural scale. Estimates of posterior mean (mean), standard deviation (sd) and \(95\%\) credible interval boundaries (2.5% and 95.5%) can be used to summarise the posterior distribution of the mean parameters. In addition, the MCMC diagnostics Rhat and n.eff, corresponding to the potential scale reduction factor and effective sample size numeric diagnostics, are provided and can be used to assess convergence for these parameters.
Summary statistics for the regression coefficients on the scale defined by the linear predictor of the effectiveness and cost models \((\boldsymbol \alpha, \boldsymbol \beta)\), as defined in Section 1, may also be displayed using the coef function by typing
> coef(lmdm1_mar)
which returns the following output
#> $Effects
#> Mean SD QL QU
#> (Intercept).time.1 0.479 0.032 0.416 0.537
#> trt2.time.1 0.083 0.049 -0.013 0.176
#> (Intercept).time.2 0.501 0.031 0.440 0.562
#> trt2.time.2 0.138 0.047 0.047 0.232
#> (Intercept).time.3 0.483 0.029 0.425 0.539
#> trt2.time.3 0.135 0.043 0.049 0.215
#>
#> $Costs
#> Mean SD QL QU
#> (Intercept).time.1 1491.342 173.597 1163.777 1807.478
#> trt2.time.1 1470.190 252.692 978.521 1948.301
#> (Intercept).time.2 1321.978 171.669 956.878 1651.431
#> trt2.time.2 1661.922 250.721 1174.965 2165.191
#> (Intercept).time.3 1344.560 278.503 769.902 1871.984
#> trt2.time.3 1613.591 422.303 789.338 2435.172
#> e.time.1 -1354.480 342.296 -2025.952 -676.365
#> e.time.2 -2033.631 368.741 -2755.262 -1320.127
#> e.time.3 -1383.868 635.492 -2614.833 -220.352
The regression output is separately reported by outcome. For the above output, the following parameter estimates are reported at each time point in the model: the intercepts \(\alpha_{j0}=\)(Intercept) in the effectiveness model; the intercepts \(\beta_{j0}=\)(Intercept) and effectiveness coefficients \(\beta_{jf}=\)e in the cost model.
For each regression parameter, estimates are summarised using posterior means (mean), standard deviations (sd) and credible interval boundaries (lower and upper), the latter being calculated assuming by default a credible level of \(95\%\).
The function plot allows to visually inspect how missing outcome data have been imputed from a model stored in an object of class "missingHE", using ggplot2 as the graphical engine. For example, the command
> plot(lmdm1_mar, class = "boxplot", outcome = "costs", trt = "1", time.plot = 2)
displays boxplots of the observed (black dots and boxes) and imputed (red dots and boxes) effectiveness variables in the "TAU" (trt=1) arm at time \(2\) based on the model results stored in lmdm1_mar, as shown in Figure 1.5. Imputed values are represented in terms of posterior mean values, and their uncertainty by means of corresponding credible intervals. The arguments class="boxplot", outcome="costs", trt="1" and time.plot=2 instruct missingHE to display boxplots for the cost outcome in the TAU arm at time \(2\).
Figure 1.5: Boxplots of observed and imputed effectiveness data in the TAU arm at time 2 under normal distributions using a longitudinal model.
Note that, similarly to diagnostic, all graphs produced via plot can be manipulated using ggplot2 functions, e.g. to modify the graphics’ aesthetics, text and disposition.
Incremental cost-effectiveness results between the reference arm (trt\(=2\)) and each of the other study arms (trt\(=1\)) can be obtained by using the summary function and setting the argument incremental to TRUE.
> summary(lmdm1_mar, incremental = TRUE)
Note that the health economic results displayed by summary are based on aggregated mean effectiveness and cost parameters (\(\mu_{e},\mu_{c}\)), which are implicitly derived from the model estimates at each time (\(\mu_{je},\mu_{jc}\)) using Equation (1.4) and Equation (1.5) and the weight quantities (w_e and w_c) specified in lmdm.
The incremental results produced by summary include:
the mean incremental effectiveness and cost parameters (\(\Delta_{e}=\mu_{e2}-\mu_{e1},\Delta_{c}=\mu_{c2}-\mu_{c1}\))
the incremental net monetary benefit (INMB) (\(\Delta_{e}\times k - \Delta_{c}\)), calculated at the given \(k\) value (willingness to pay threshold).
the estimated mean of the Incremental Cost-Effectiveness Ratio (ICER), which quantifies the cost per incremental unit of effectiveness, computed based on posterior samples as \(\text{ICER}=\frac{\Delta_{c}}{\Delta_{e}}\).
The above output compares the cost-effectiveness of the two arms in the MenSS data set based on the results of the model stored in the object lmdm1_mar. For example, in this case, incremental results suggest that "PBS" is on average associated with higher QALYs and Total costs with respect to "TAU", leading to a positive estimated ICER.
A series of built-in functions are also available from the package BCEA to visually display Probabilistic Sensitivity Analysis (PSA) results derived from a model fitted in missingHE. For example, the following commands may be used
> BCEA::ceplane.plot(lmdm1_mar$cea, graph = "ggplot2")
> BCEA::ceac.plot(lmdm1_mar$cea, graph = "ggplot2")
to generate Figure 1.6 and Figure 1.7, which combines cost-effectiveness plane and acceptability curve plots obtained from the PSA results stored in lmdm1_mar$cea.
Figure 1.6: Inspecting probabilistic sensitivity analysis using BCEA built-in functions, for example in terms of cost-effectiveness plane.
Figure 1.7: Inspecting probabilistic sensitivity analysis using BCEA built-in functions, for example in terms of cost-effectiveness acceptability curve.
In the cost-effectiveness plane, shown in Figure 1.6, all incremental estimates fall in the top-right quadrant (where "PBS" is more expensive and more effective than "TAU") and only a few of them are associated with ICER estimates that are below \(k=25,000\) (shaded area). In the cost-effectiveness acceptability curve, shown in Figure 1.7, a positive cost-effectiveness assessment of "PBS" vs "TAU" is shown for only willingness to pay threshold values \(k\) above \(40,000\).