AdhereR: Adherence to Medications

Alexandra L. Dima, Dan Dediu & Samuel Allemann

2022-07-05

Estimating the adherence to medications from electronic healthcare data (EHD) has been central to research and clinical practice across clinical conditions. For example, large retrospective database studies may estimate the prevalence of non-adherence in specific patient groups and model its potential predictors and impact on health outcomes, while clinicians may have access to individual patient records that flag up possible non-adherence for further clinical investigation and intervention. Yet, adherence measurement is a matter of controversy. Many methodological studies show that the same data can generate different prevalence estimates under different parametrisations (Greevy et al., 2011; Gardarsdottir et al., 2010; Souverein et al., in press; Vollmer et al., 2012; Van Wijk et al., 2006).

These parametrisation choices are usually not transparently reported in published empirical studies, and adherence algorithms are either developed ad-hoc or for proprietary software. This lack of transparency and standardization has been one of the main methodological barriers against the development of a solid evidence base on adherence from EHD, and may lead to misinformed clinical decisions.

Here we describe AdhereR, an R package that aims to facilitate the computing of adherence from EHD, as well as the transparent reporting of the chosen calculations. It contains a set of R S3 classes and functions that compute, summarize and plot various estimates of adherence. A hypothetical dataset of medication events is included for demonstration and testing purposes. In this vignette, we start by defining the terms used in AdhereR. We then use medication records of two patients in the included dataset to illustrate the various decisions required and their impact on estimates, starting with the visualization of medication events, computation of persistence (treatment episode length), and computation of adherence (9 functions available in 3 versions: simple, per-treatment-episode, and sliding-window). Visualizations for each function are illustrated, and the interactive visualization function is presented.

While we tested the package relatively extensively, we cannot guarantee that bugs and errors do not exist, and we encourage the users to contact us with suggestions, bug reports, comments (or even just to share their experiences using the package) either by e-mail (to Dan ddediu@gmail.com or Alexandra alexadima@gmail.com) or using GitHub’s reporting mechanism at our repository https://github.com/ddediu/AdhereR, which contains the full source code of the package (including this vignette).

Definitions

Adherence to medications is described as a process consisting of 3 successive elements/stages: initiation, implementation, and discontinuation (Vrijens et al., 2012). After initiating treatment (first medication intake), a patient would continue with implementing the regimen for a time period, ideally equal to the recommended time necessary for achieving therapeutic benefits; the quality of implementation is commonly labelled adherence and broadly operationalized as a ratio of medication quantity used versus prescribed in a time period. If patients discontinue medication earlier than the recommended time period, the period before discontinuation is described as persistence, in contrast to the following period of non-persistence.

The ideal measurement of this process would record the prescription moment and every medication intake with an exact time-stamp. This would allow, for example, to describe adherence to a twice-daily medication prescribed for 1 year in maximum detail: how long was the delay between prescription and the moment of the first medication intake, whether each of the two prescribed administrations per day corresponded to an intake event and at what time, how much medication was taken versus prescribed on any time interval while the patient persisted with treatment, any specific implementation patterns (e.g. missing or delaying the first daily dose), and when exactly the last medication intake event took place during that year. While this level of detail can be obtained by careful use of electronic monitoring devices, electronic healthcare data usually include much less information.

Administrative claims or pharmacy databases record medication dispensation events, including patient identifier, date of event, type of medication, and quantity dispensed, and less frequently daily dosage recommended. The same information may be available for prescription events in electronic medical records used in health care organizations (e.g primary care practices, secondary care centers). In between two dispensing or prescribing events, we don’t know whether and how medication has been used. What we do know is that, if taken as prescribed, the medication supplied at the first event would have lasted a number of days. If the time interval between the two events is longer than this number it is likely that the patient ran out of medication before re-supplying or used less during that time. If the interval is substantially longer or there is no second event, then the patient has probably finished the supply at some point and then discontinued medication. Thus, EHD-based algorithms estimate medication adherence and persistence based on the availability of current supply, under four main assumptions:

(Several other assumptions apply to individual algorithms, and will be discussed later.)

AdhereR was developed to compute adherence and persistence estimates from EHD based on the principles described above. The current version is based on a single data source, therefore an algorithm for initiation (time interval between first prescription and first dispensing event) is not implemented (it is a time difference calculation easy to implement in with basic R functions). The following terms and definitions are used:

Data preparation and example dataset

AdhereR requires a dataset of medication events over a FUW of sufficient length in relation to the recommended treatment duration. To our knowledge, no research has been performed to date on the relationship between FUW length and recommended treatment duration. AdhereR offers the opportunity for answering such methodological questions, but we would hypothesize that the FUW duration also depends on the duration of medication events (shorter durations would allow shorter FUW windows to be informative).

The minimum necessary dataset includes 3 variables for each medication event: patient unique identifier, event date, and duration. Daily dosage and medication type are optional.AdhereR is thus designed to use datasets that have already been extracted from EHD and prepared for calculation. These preliminary steps depend to a large extent on the specific database used and the type of medication and research design. Several general guidelines can be consulted (Arnet et al., 2016; Peterson et al., 2007), as well as database-specific documentation. In essence, these steps should entail:

For demonstration purposes, we included in AdhereR a hypothetical dataset of 1080 medication events from 100 patients in a 2-year FUW. Five variables are included in this dataset:

Table 1 shows the medication events of two example patients: 19 medication events related to two medication types (medA and medB). They were selected to represent two different medication histories. Patient 37 had a stable daily dosage but event duration changes with medication change. Patient 76 had a more variable pattern, including medication, daily dosage and duration changes.

# Load the AdhereR library:
library(AdhereR);

# Select the two patients with IDs 37 and 76 from the built-in dataset "med.events":
ExamplePats <- med.events[med.events$PATIENT_ID %in% c(37, 76), ];
# Display them as pretty markdown table:
knitr::kable(ExamplePats, caption = "<a name=\"Table-1\"></a>**Table 1.** Medication events for two example patients");
Table 1. Medication events for two example patients
PATIENT_ID DATE PERDAY CATEGORY DURATION
14 37 04/10/2036 4 medA 50
15 37 07/30/2036 4 medA 50
16 37 09/15/2036 4 medA 50
17 37 01/02/2037 4 medB 30
18 37 01/31/2037 4 medB 30
19 37 05/09/2037 4 medB 30
20 37 08/13/2037 4 medB 30
21 37 11/09/2037 4 medB 30
813 76 12/13/2035 20 medA 30
814 76 01/18/2036 20 medA 30
815 76 01/23/2036 2 medA 60
816 76 04/25/2036 2 medA 60
817 76 08/08/2036 2 medA 60
818 76 10/03/2036 2 medA 60
819 76 11/29/2036 2 medA 60
820 76 12/21/2036 6 medB 30
821 76 01/05/2037 6 medB 30
822 76 07/13/2037 6 medB 30
823 76 10/11/2037 2 medA 30

Visualization of patient records

A first step towards deciding which algorithm is appropriate for these data is to explore medication histories visually. We do this by creating an object of type CMA0 for the two example patients, and plotting it. This type of plots can of course be created for a much bigger subsample of patients and saved as as a JPEG, PNG, TIFF, EPS or PDF file using R’s plotting system for data exploration.

# Create an object "cma0" of the most basic CMA type, "CMA0":
cma0 <- CMA0(data=ExamplePats, # use the two selected patients
             ID.colname="PATIENT_ID", # the name of the column containing the IDs
             event.date.colname="DATE", # the name of the column containing the event date
             event.duration.colname="DURATION", # the name of the column containing the duration
             event.daily.dose.colname="PERDAY", # the name of the column containing the dosage
             medication.class.colname="CATEGORY", # the name of the column containing the category
             followup.window.start=0,  # FUW start in days since earliest event
             observation.window.start=182, # OW start in days since earliest event
             observation.window.duration=365, # OW duration in days
             date.format="%m/%d/%Y"); # date format (mm/dd/yyyy)
# Plot the object (CMA0 shows the actual event data only):
plot(cma0, # the object to plot
     align.all.patients=TRUE); # align all patients for easier comparison
<a name="Figure-1"></a>**Figure 1.** Medication histories - two example patients

Figure 1. Medication histories - two example patients

We can see that patient 76 had an interruption of more than 100 days between the second and third medB supply and several situations of new supply acquired while the previous supply was not exhausted. Patient 37 had shorter gaps between consecutive events, but very little overlap in supplies. For patient 76, the switch to medB happened while the medA supply was still available, then a switch back to medA happened later, at the end of the second year. For patient 37, there was a single medication switch (to medB) without an overlap at that point.

Sometimes it is useful to also see the daily dose:

# Same plot as above but also showing the daily doses:
plot(cma0, # the object to plot
     print.dose=TRUE, plot.dose=TRUE,
     align.all.patients=TRUE); # align all patients for easier comparison
<a name="Figure-1a"></a>**Figure 1 (a).** Same plot as in **Figure 1**, but also showing the daily doses as numbers and as line thickness

Figure 1 (a). Same plot as in Figure 1, but also showing the daily doses as numbers and as line thickness

These observations highlight several decision points in calculating persistence and adherence, which need to be informed by the clinical context of the study:

These decisions therefore need to be taken based on a good understanding of the pharmacological properties of the medication studied, and the most plausible clinical decision-making in routine care. This information can be collected from an advisory committee with relevant expertise (e.g. based on consensus protocols), or (even better) qualitative or survey research on the routine practices in prescribing, dispensing and using that specific medication. Of course, this is not always possible – a second-best option (or even complementary option, if consensus is not reached) is to compare systematically the effects of different analysis choices on the hypotheses tested (e.g. as sensitivity analyses).

Persistence – treatment episodes

An essential first decision is to distinguish between persistence with treatment and quality of implementation (once the patient started treatment – which, as explained above, is assumed in situations when we have only one data source of prescribing or dispensing events). The function compute.treatment.episodes() was developed for this purpose. We provide below an example of how this function can be used.

Let’s imagine that medA and medB are two different types of medication, and clinicians in our advisory committee agree that whenever a health care professional changes the type of medication supplied this should be considered as a new treatment episode; we will specify this as setting the parameter medication.change.means.new.treatment.episode to TRUE.

They also agree that a minumum of 6 months (180 days) need to pass after the end of a medication supply (taken as prescribed) without receiving a new supply in order to be reasonably confident that the patient has discontinued/interrupted the treatment – they can conclude this for example based on an approximate calculation considering that specific medication is usually supplied for 1-2 months, daily dosage is usually 2 to 4 pills a day, and patients often use as low as 1/4 of the recommended dose in a given interval. We will specify this as maximum.permissible.gap = 180, and maximum.permissible.gap.unit = "days". (If in another scenario the clinical information we obtain suggests that the permissible gap should depend on the duration of the last supply, for example 6 times that interval should go by before a discontinuation becoming likely, we can specify this as maximum.permissible.gap = 600, and maximum.permissible.gap.unit = "percent".)

We might also have some clinical confirmation that usually people finish existing supply before starting the new one (carryover.within.obs.window = TRUE), but of course only for the same medication if medA and medB are supplied with a recommendation to start a new treatment immediately (carry.only.for.same.medication = TRUE), take the existing supply based on the new dosage recommendations if these change (consider.dosage.change = TRUE).

The rest of the parameters specify the name of the dataset (here ExamplePats), names of the variables in the dataset (here based on the demo dataset, described above), and the FUW (here the whole 2-year window).

# Compute the treatment episodes for the two patients:
TEs3<- compute.treatment.episodes(ExamplePats,
                                  ID.colname="PATIENT_ID",
                                  event.date.colname="DATE",
                                  event.duration.colname="DURATION",
                                  event.daily.dose.colname="PERDAY",
                                  medication.class.colname="CATEGORY",
                                  carryover.within.obs.window = TRUE, # carry-over into the OW
                                  carry.only.for.same.medication = TRUE, # & only for same type
                                  consider.dosage.change = TRUE, # dosage change starts new episode...
                                  medication.change.means.new.treatment.episode = TRUE, # & type change
                                  maximum.permissible.gap = 180, # & a gap longer than 180 days
                                  maximum.permissible.gap.unit = "days", # unit for the above (days)
                                  followup.window.start = 0, # 2-years FUW starts at earliest event
                                  followup.window.start.unit = "days",
                                  followup.window.duration = 365 * 2,
                                  followup.window.duration.unit = "days",
                                  date.format = "%m/%d/%Y");
knitr::kable(TEs3, 
             caption = "<a name=\"Table-2\"></a>**Table 2.** Example output `compute.treatment.episodes()` function");
Table 2. Example output compute.treatment.episodes() function
PATIENT_ID episode.ID episode.start end.episode.gap.days episode.duration episode.end
37 1 2036-04-10 56 211 2036-11-07
37 2 2037-01-02 122 463 2038-04-10
76 1 2035-12-13 0 374 2036-12-21
76 2 2036-12-21 60 234 2037-08-12
76 3 2037-10-11 32 62 2037-12-12

The function produces a dataset as the one shown in Table 2. It includes each treatment episode for each patient (here 2 episodes for patient 37 and 3 for patient 76) and records the patient ID, episode number, date of episode start, gap days at the end of or after the treatment episode, duration of episode, and episode end date:

Notes:

  1. just the number of gap days after the end of the episode can be computed by keeping all values larger than the permissible gap and by replacing the others by 0,
  2. when medication change represents a new treatment episode, the previous episode ends when the last supply is finished (irrespective of the length of gap compared to a maximum permissible gap); any days before the date of the new medication supply are considered a gap. This maintains consistence with the computation of gaps between episodes (whether they are constructed based on the maximum permissible gap rule or the medication change rule).

In some scenarios, it is important to imagine that the episodes also include the maximum permissible gap when the gap is larger than this maximum (i.e., not when a new episode is due to a change in treatment or dose before this maximum permissible gap was exceeded). To allow such scenarios, compute.treatment.episodes() has a logical parameter, maximum.permissible.gap.append.to.episode, which, when set to TRUE appends the maximum permissible gap at the end of the episodes (its default value FALSE means that the episodes end as described above).

This output can be used on its own to study causes and consequences of medication persistence (e.g. by using episode duration in time-to-event analyses). This function is also a basis for the CMA_per_episode class, which is described later in the vignette.

Adherence – continuous multiple interval measures of medication availability/gaps (CMA)

Let’s consider another scenario: medA and medB are alternative formulations of the same chemical molecule, and clinicians agree that they can be used by patients within the same treatment episode. In this case, both patients had a single treatment episode for the whole duration of the follow-up (Table 3). We can therefore compute adherence for any observation window (OW) within these 2 years without any concern that we might confuse quality of implementation with (non-)persistence.

# Compute the treatment episodes for the two patients
# but now a change in medication type does not start a new episode:
TEs4<- compute.treatment.episodes(ExamplePats,
                                  ID.colname="PATIENT_ID",
                                  event.date.colname="DATE",
                                  event.duration.colname="DURATION",
                                  event.daily.dose.colname="PERDAY",
                                  medication.class.colname="CATEGORY",
                                  carryover.within.obs.window = TRUE, 
                                  carry.only.for.same.medication = TRUE,
                                  consider.dosage.change = TRUE,
                                  medication.change.means.new.treatment.episode = FALSE, # here
                                  maximum.permissible.gap = 180,
                                  maximum.permissible.gap.unit = "days",
                                  followup.window.start = 0,
                                  followup.window.start.unit = "days",
                                  followup.window.duration = 365 * 2,
                                  followup.window.duration.unit = "days",
                                  date.format = "%m/%d/%Y");
# Pretty print the events:
knitr::kable(TEs4, 
             caption = "<a name=\"Table-3\"></a>**Table 3.** Alternative scenario output `compute.treatment.episodes()` function");
Table 3. Alternative scenario output compute.treatment.episodes() function
PATIENT_ID episode.ID episode.start end.episode.gap.days episode.duration episode.end
37 1 2036-04-10 122 730 2038-04-10
76 1 2035-12-13 32 730 2037-12-12

Once we clarified that we indeed measure quality of implementation and not (non)-persistence, several CMA classes can be used to compute this specific component of adherence. We will discuss first in turn the simple CMA classes, then present the more complex (or iterated) CMA_per_episode and CMA_sliding_window ones.

The simple CMAs

A first decision to consider when calculating the quality of implementation is what is the appropriate observation window – when it should start and how long it should last? We can see for example that patient 76 had some periods of regular (even overlapping) supplies, and periods when there were some large delays between consecutive medication events. Thus, estimating adherence for a whole 2-year period might be too coarse-grained to mean anything for how patients actually managed their treatment at any particular moment. As mentioned earlier in the Definitions section, EHD don’t have good granularity to start with, so we need to do the best with what we’ve got – and compressing all this information into a single estimate might not be the best solution, at least not the obvious first choice. On the other hand, due to the low granularity, we cannot target very short observation windows either because we simply don’t know what happened every day. This decision needs to be informed again by information collected from the advisory committee or qualitative/quantitative studies in the target population. It also needs to take into account the average duration of medication supply from one event, and the average time interval between two events – which can be examined in exploratory plots (Figure 1) – and the research question and design of the study. For example, if we expect that the quality of implementation reduces in time from the start of a treatment episode, medication is usually supplied for one month, and patients can take up to 4 times as much to use up their supplies, we might want to consider comparing successive 4-month OWs. If we want to examine quality of implementation 6 months before a clinical event (on the clinical assumption that how a patient takes medication in previous 6 months may impact on the probability of a health event occurring or not), we might want to consider an OW start 6 months before the event, and a 6-month duration. The posibilities here are endless, and research on the impact of different analysis choices on substantive results is still scarce. When the consensus is not reached based on the available information, one or more parametrisations can be compared – and formulated as research questions.

For demonstration purposes, let’s imagine a scenario when an adherence intervention takes place 6 months (182 days) after the start of the treatment episode, and we hypothesize that it will improve the quality of implementation in the next year (365 days) in the intervention group compared to the control group. We can specify this as followup.window.start=0, observation.window.start=182, and observation.window.duration=365 (we can of course divide this interval into shorter windows and compare the two groups in terms of longitudinal changes in adherence, as we shall see later, but for the moment let’s stick to a global 1-year estimate). We have 9 CMA classes that can produce very different estimates of the quality of implementation, the first eight have been described by Vollmer and colleagues (2012) as applied to randomized controlled trials. We implemented them in AdhereR based on the authors’ description, and in essence are defined by 4 parameters:

  1. how is the OW delimited (whether time intervals before the first event and after the last event are considered),
  2. whether CMA values are capped at 100%,
  3. whether medication oversupply is carried over to the next event interval, and
  4. whether medication available before a first event is considered in supply calculations or OW definition.

CMA1

CMA1 is the simplest method, often described in the literature as the medication possession ratio (MPR). It simply adds up the duration of all medication events within the OW, excluding the last event, and divides this by the number of days between the first and last event (multiplied by 100 to obtain a percentage). Thus, it can be higher than 1 (or 100% adherence) and, if the OW does not start and end with a medication event for all patients, it can actually refer to different lengths of time within the OW for different patients. For example, for patient 76 below CMA1 is computed for the period starting with the first event in the highlighted interval and ending at the date if the last event – thus, it considers only 4 events with considerable overlaps and results in a CMA1 of 140%, indicating overuse.

Creating an object of class CMA1 with various parameters automatically performs the estimation of CMA1 for all the patients in the dataset; moreover, the object is smart enough to allow the appropriate printing and plotting. The object includes all the parameter values with which it was created, as well as the CMA data.frame, which is the main result, with two columns: patient ID and the corresponding CMA estimate. The CMA estimates appear as ratios, but can be trivially transformed into percentages and rounded, as we did for patient 76 below (rounded to 2 decimals). The plots show the CMA as percentage rounded to 1 decimal.

# Create the CMA1 object with the given parameters:
cma1 <- CMA1(data=ExamplePats,
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
# Display the summary:
cma1
## CMA1:
##   "The ratio of days with medication available in the observation window excluding the last event; durations of all events added up and divided by number of days from first to last event, possibly resulting in a value >1.0"
##   [
##     ID.colname = PATIENT_ID
##     event.date.colname = DATE
##     event.duration.colname = DURATION
##     medication.groups = <NONE>
##     flatten.medication.groups = FALSE
##     medication.groups.colname = .MED_GROUP_ID
##     followup.window.start = 0
##     followup.window.start.unit = days
##     followup.window.start.per.medication.group = FALSE
##     followup.window.duration = 730
##     followup.window.duration.unit = days
##     observation.window.start = 182
##     observation.window.start.unit = days
##     observation.window.duration = 365
##     observation.window.duration.unit = days
##     date.format = %m/%d/%Y
##     CMA = CMA results for 2 patients
##   ]
##   DATA: 19 (rows) x 5 (columns) [2 patients].
# Display the estimated CMA table:
cma1$CMA
##   PATIENT_ID       CMA
## 1         37 0.4035874
## 2         76 1.4000000
# and equivalently using an accessor function:
getCMA(cma1);
##   PATIENT_ID       CMA
## 1         37 0.4035874
## 2         76 1.4000000
# Compute the CMA value for patient 76, as percentage rounded at 2 digits:
round(cma1$CMA[cma1$CMA$PATIENT_ID== 76, 2]*100, 2)
## [1] 140
# Plot the CMA:
# The legend shows the actual duration, the days covered and gap days, 
# the drug (medication) type, the FUW and OW, and the estimated CMA.
plot(cma1, 
     patients.to.plot=c("76"), # plot only patient 76 
     legend.x=520); # place the legend in a nice way
<a name="Figure-2"></a>**Figure 2.** Simple CMA 1

Figure 2. Simple CMA 1

CMA2

Thus, CMA1 assumes that there is a treatment episode within the OW (shorter or equal to the OW) when the patient used the medication, and every new medication event happened when the previous supply finished (possibly due to overuse). These assumptions rarely fit with real life use patterns. One limitation is not considering the last event – which represents almost a half of the OW in the case of patient 76.

To address this limitation, CMA2 includes the duration of the last event in the numerator and the period from the last event to the end of the OW in the denominator. Thus, the estimate Figure 3 is 77.9%, more in line with the medication history of this patient in the year after the intervention.

cma2 <- CMA2(data=ExamplePats, # we're estimating CMA2 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma2, 
     patients.to.plot=c("76"),  
     show.legend=FALSE); # don't show legend to avoid clutter (see above)
<a name="Figure-3"></a>**Figure 3.** Simple CMA 2

Figure 3. Simple CMA 2

CMA3 and CMA4

Both CMA1 and CMA2 can be higher that 1 (100% adherence) based on the assumption that medication supply is finished until the last event (CMA1) or the end of the OW (CMA2). But sometimes this is not plausible, because patients can refill their supply earlier (for example when going on holidays) and overuse is a less frequent behaviour for some medications (when side effects are considerable for overuse, or medications are expensive). Or it may be that it does not matter whether patients use 100% or more that 100% of their medication, the therapeutic effect is the same with no risks or side effects. Again, this is a matter of inquiry to the advisory committee or investigation in the target population.

If it is likely that implementation does not exceed 100% (or does not make a difference if it does), CMA3 and CMA4 below adjust for this by capping CMA1 and CMA2 respectively to 100%. As shown in Figures 4 and 5, CMA3 is now capped at 100%, and CMA4 remains the same as CMA2 (because it was already lower than 100%).

cma3 <- CMA3(data=ExamplePats, # we're estimating CMA3 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma3, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-4"></a>**Figure 4.** Simple CMA 3

Figure 4. Simple CMA 3

cma4 <- CMA4(data=ExamplePats, # we're estimating CMA4 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma4,patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-5"></a>**Figure 5.** Simple CMA 4

Figure 5. Simple CMA 4

CMA5 and CMA6

All CMAs from 1 to 4 have a major limitation: they don’t take into account the timing of the events. But if there is a large gap between two events it is more likely that the person had used the medication less than prescribed at least in part of that interval. Just capping the values as in CMA3 and CMA4 does not account for that likely reduction in adherence – two patients with the same quantity of supply will have the same percentage of adherence even if one has had substantial delays in supply at some points and the other supplied in time.

To adjust for this, CMA5 and CMA6 provide alternative calculations to CMA1 and CMA2 respectively. Thus, we instead calculate the number of gap days, extract it from the total time interval, and divide this value by the total time interval (first to last event in CMA5, and first event to end of OW in CMA6). By considering the gaps, we now need to decide whether to control for how any remaining supply is used when a new supply is obtained. Two additional parameters are included here: carry.only.for.same.medication and consider.dosage.change. Both are set here as FALSE, to specify the fact that carry over should always happen irrespective of what medication is supplied, and that the duration of the remaining supply should be modified if the dosage recommendations are changed with a new medication event. As shown in Figures 6 and 7, these alternative calculations do not make any difference for patient 76, because there are no gaps between the 5 events in the OW highighted. There could be, however, situations in which large gaps between some events in the OW result in lower CMA estimates when considering timing of events.

cma5 <- CMA5(data=ExamplePats, # we're estimating CMA5 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE, # carry-over across medication types
             consider.dosage.change=FALSE, # don't consider canges in dosage
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma5,patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-6"></a>**Figure 6.** Simple CMA 5

Figure 6. Simple CMA 5

cma6 <- CMA6(data=ExamplePats, # we're estimating CMA6 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE,
             consider.dosage.change=FALSE,
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma6,patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-7"></a>**Figure 7.** Simple CMA 6

Figure 7. Simple CMA 6

Sometimes it is useful to also see the daily dose:

# Same plot as above but also showing the daily doses:
plot(cma6, # the object to plot
     patients.to.plot=c("76"), # plot only patient 76 
     print.dose=TRUE, plot.dose=TRUE,
     legend.x=520); # place the legend in a nice way
<a name="Figure-1a"></a>**Figure 7 (a).** Same plot as in **Figure 7**, but also showing the daily doses as numbers and as line thickness

Figure 7 (a). Same plot as in Figure 7, but also showing the daily doses as numbers and as line thickness

CMA7

All CMAs so far have another limitation: they do not consider the interval between the start of the OW and the first event within the OW. For situations in which the OW start coincides with the treatment episode start, this limitation has no consequences. But in scenarios like ours (OW starts during the episode) this has two major drowbacks. First, the time interval for calculating CMA is not the same for all patients; this can result in biases, for example if the intervention group tends to refill sooner after the intervention moment than the control group, the control group might seem more adherent but it is because CMA is calculated on a shorter time interval within the following year. And second, if there is any medication supply left from before the OW start, this is not considered (so CMA may be underestimated).

CMA7 addresses this limitation by extending the nominator to the whole OW interval, and by considering carry over both from before and within the OW. The same paremeters are available to specify whether this depends on the type of medication and considers dosage changes (applied now to both types of carry over). Figure 8 shows how considering the period at the OW start and the prior supply reduces CMA7 to 69%, due to the gap visible in the medication history plot between the event before the OW and the first event within the OW.

cma7 <- CMA7(data=ExamplePats, # we're estimating CMA7 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE,
             consider.dosage.change=FALSE,
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma7, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-8"></a>**Figure 8.** Simple CMA 7

Figure 8. Simple CMA 7

CMA8

When entering a randomized controlled trial involving a new medication, a patient on ongoing treatment may be more likely to finish the current supply before starting the trial medication. In these situations, it may be more appropriate to consider a lagged start of the OW (even if this results in a different denominator for trial participants). Let’s consider this different scenario for patient 76: at day 374, a new treatment (medB) starts and we need to estimate CMA for the next 294 days (until the next medication change). But there is still some medA left, so it is likely that the patient finished this first. Figure 9 shows how the OW is shortened with the number of days it would have taken to finish the remaining medA (assuming use as prescribed); CMA8 is quite low 36.1%, given the long gaps between medB events. In a future version, it might be interesting to implement the possibility to also move the end of OW so that its length is preserved.

cma8 <- CMA8(data=ExamplePats, # we're estimating CMA8 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE,
             consider.dosage.change=FALSE,
             followup.window.start=0, observation.window.start=374, 
             observation.window.duration=294,
             date.format="%m/%d/%Y");
plot(cma8, patients.to.plot=c("76"), show.legend=FALSE);
# The value for patient 76, rounded at 2 digits
round(cma8$CMA[cma8$CMA$PATIENT_ID== 76, 2]*100, 2);
## [1] 36.14
<a name="Figure-9"></a>**Figure 9.** Simple CMA 8

Figure 9. Simple CMA 8

CMA9

The previous 8 CMAs were described by Vollmer and colleagues (2012) in relation to randomized controlled trials, and may apply to many observational designs as well. However, they all rely on an assumption that might not hold for longitudinal cohort studies with multiple repeated measures: the medication is used as prescribed until current supply ends. In CMA7, this may introduce additional variation in adherence estimates depending on where the start of the OW is located relative to the last event before the OW and the first event within the OW: an OW start closer to the first event in the OW generates lower estimates for the same number of gap days between the two events. To address this, CMA9 first computes a ratio of days’ supply for each event in the FUW (until the next event or FUW end), then weighs all days in the OW by their corresponding ratio to generate an average CMA value for the OW.

For the same scenario as in CMA1 to CMA7, Figure 10 shows the estimate for CMA9, which is higher than for CMA7 (70.6% versus 69%). This value would be the same no matter if the OW starts slightly earlier or later, because CMA9 considers the same intervals between events (the one starting before and the one ending after the OW). Thus, it depends less on the actual date when the OW starts.

cma9 <- CMA9(data=ExamplePats, # we're estimating CMA9 now!
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             carry.only.for.same.medication=FALSE,
             consider.dosage.change=FALSE,
             followup.window.start=0, observation.window.start=182, 
             observation.window.duration=365,
             date.format="%m/%d/%Y");
plot(cma9, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-10"></a>**Figure 10.** Simple CMA 9

Figure 10. Simple CMA 9

The iterated CMAs

We introduce here two complex (or iterated) CMAs that share the property that they apply a given single CMA iteratively to a set of sub-periods (or windows), defined in various ways.

CMA per episode

When we calculated the persistence and implementation above, we first defined the treatment episodes, and then computed the CMAs within the episode. The CMA_per_episode class allows us to do this in one single step. In our intervention scenario, both example patients had a 2-year treatment episode and we computed the various simple CMAs for a 1-year period within this longer episode. But if we consider that medication change triggers a new treatment episode, patient 76 would have 3 episodes. CMA_per_episode can compute any of the 9 simple CMAs for all treatment episodes for all patients.

As with the simple CMAs, the CMA_per_episode class contains a list that includes all the parameter values, as well as a CMA data.frame (with all columns of the compute.treatment.episodes() output table, plus a new column with the CMA values). The CMA_per_episode values can also be transformed into percentages and rounded, as we did for patient 76 below (rounded to 2 decimals). Plots now include an extra section at the top, where each episode is shown as a horizontal bar of length equal to the episode duration, and the corresponding CMA estimates are given both as percentage (rounded to 1 decimal) and as a grey area. An extra area on the right of the plot displays the distribution of all CMA values for the whole FUW as a histogram or as smoothed kernel density (see Figure 11).

cmaE <- CMA_per_episode(CMA="CMA9", # apply the simple CMA9 to each treatment episode
                        data=ExamplePats,
                        ID.colname="PATIENT_ID",
                        event.date.colname="DATE",
                        event.duration.colname="DURATION",
                        event.daily.dose.colname="PERDAY",
                        medication.class.colname="CATEGORY",
                        carryover.within.obs.window = TRUE,
                        carry.only.for.same.medication = FALSE,
                        consider.dosage.change = FALSE, # conditions on treatment episodes
                        medication.change.means.new.treatment.episode = TRUE,
                        maximum.permissible.gap = 180,
                        maximum.permissible.gap.unit = "days",
                        followup.window.start=0,
                        followup.window.start.unit = "days",
                        followup.window.duration = 365 * 2,
                        followup.window.duration.unit = "days",
                        observation.window.start=0,
                        observation.window.start.unit = "days",
                        observation.window.duration=365*2,
                        observation.window.duration.unit = "days",
                        date.format="%m/%d/%Y",
                        parallel.backend="none",
                        parallel.threads=1);
# Summary:
cmaE;
## CMA_per_episode:
##   "CMA per treatment episode"
##   [
##     ID.colname = PATIENT_ID
##     event.date.colname = DATE
##     event.duration.colname = DURATION
##     event.daily.dose.colname = PERDAY
##     medication.class.colname = CATEGORY
##     medication.groups = <NONE>
##     flatten.medication.groups = FALSE
##     medication.groups.colname = .MED_GROUP_ID
##     carryover.within.obs.window = TRUE
##     carryover.into.obs.window = TRUE
##     carry.only.for.same.medication = FALSE
##     consider.dosage.change = FALSE
##     followup.window.start = 0
##     followup.window.start.unit = days
##     followup.window.start.per.medication.group = FALSE
##     followup.window.duration = 730
##     followup.window.duration.unit = days
##     observation.window.start = 0
##     observation.window.start.unit = days
##     observation.window.duration = 730
##     observation.window.duration.unit = days
##     date.format = %m/%d/%Y
##     computed.CMA = CMA9
##     CMA = CMA results for 5 patients
##   ]
##   DATA: 19 (rows) x 5 (columns) [2 patients].
# The CMA estimates table:
cmaE$CMA
##   PATIENT_ID episode.ID episode.start end.episode.gap.days episode.duration
## 1         37          1    2036-04-10                   56              211
## 2         37          2    2037-01-02                  122              463
## 3         76          1    2035-12-13                    0              374
## 4         76          2    2036-12-21                   60              234
## 5         76          3    2037-10-11                   32               62
##   episode.end       CMA
## 1  2036-11-07 0.7109005
## 2  2038-04-10 0.3239741
## 3  2036-12-21 0.8422460
## 4  2037-08-12 0.3846154
## 5  2037-12-12 0.4838710
getCMA(cmaE); # as above but using accessor function
##   PATIENT_ID episode.ID episode.start end.episode.gap.days episode.duration
## 1         37          1    2036-04-10                   56              211
## 2         37          2    2037-01-02                  122              463
## 3         76          1    2035-12-13                    0              374
## 4         76          2    2036-12-21                   60              234
## 5         76          3    2037-10-11                   32               62
##   episode.end       CMA
## 1  2036-11-07 0.7109005
## 2  2038-04-10 0.3239741
## 3  2036-12-21 0.8422460
## 4  2037-08-12 0.3846154
## 5  2037-12-12 0.4838710
# The values for patient 76 only, rounded at 2 digits:
round(cmaE$CMA[cmaE$CMA$PATIENT_ID== 76, 7]*100, 2);
## [1] 84.22 38.46 48.39
# Plot:
plot(cmaE, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-11"></a>**Figure 11.** CMA 9 per episode

Figure 11. CMA 9 per episode

Sliding-window CMA

When discussing the issue of granularity earlier, we mentioned that estimating adherence for a whole 2-year period might be too coarse-grained to be clinically relevant, and that shorter intervals may be more appropriate, for example in studies that aim to investigate how the quality of implementation varies in time during a long-term treatment episode. In such cases, we might want to compare successive intervals, for example 4-month intervals. CMA_sliding_window allows us to compute any of the 9 simple CMAs for repeated time intervals (sliding windows) within an OW. A similar output is produced as for CMA_per_episode, including a CMA table (with patient ID, window ID, window start and end dates, and the CMA estimate). Figure 12 shows the results of CMA9 for patient 76: 6 sliding windows of 4 months, among which 2 have a CMA higher than 80%, two have values around 60% and two around 40%, suggesting a variable quality of implementation.

cmaW <- CMA_sliding_window(CMA.to.apply="CMA9", # apply the simple CMA9 to each sliding window
                           data=ExamplePats,
                           ID.colname="PATIENT_ID",
                           event.date.colname="DATE",
                           event.duration.colname="DURATION",
                           event.daily.dose.colname="PERDAY",
                           medication.class.colname="CATEGORY",
                           carry.only.for.same.medication=FALSE,
                           consider.dosage.change=FALSE,
                           followup.window.start=0,
                           observation.window.start=0,
                           observation.window.duration=365*2,
                           sliding.window.start=0, # sliding windows definition
                           sliding.window.start.unit="days",
                           sliding.window.duration=120,
                           sliding.window.duration.unit="days",
                           sliding.window.step.duration=120,
                           sliding.window.step.unit="days",
                           date.format="%m/%d/%Y",
                           parallel.backend="none",
                           parallel.threads=1);
# Summary:
cmaW;
## CMA_sliding_window:
##   "CMA per sliding window"
##   [
##     ID.colname = PATIENT_ID
##     event.date.colname = DATE
##     event.duration.colname = DURATION
##     event.daily.dose.colname = PERDAY
##     medication.class.colname = CATEGORY
##     medication.groups = <NONE>
##     flatten.medication.groups = FALSE
##     medication.groups.colname = .MED_GROUP_ID
##     carryover.within.obs.window = TRUE
##     carryover.into.obs.window = TRUE
##     carry.only.for.same.medication = FALSE
##     consider.dosage.change = FALSE
##     followup.window.start = 0
##     followup.window.start.unit = days
##     followup.window.start.per.medication.group = FALSE
##     followup.window.duration = 730
##     followup.window.duration.unit = days
##     observation.window.start = 0
##     observation.window.start.unit = days
##     observation.window.duration = 730
##     observation.window.duration.unit = days
##     date.format = %m/%d/%Y
##     computed.CMA = CMA9
##     sliding.window.start = 0
##     sliding.window.start.unit = days
##     sliding.window.duration = 120
##     sliding.window.duration.unit = days
##     sliding.window.step.duration = 120
##     sliding.window.step.unit = days
##     CMA = CMA results for 12 patients
##   ]
##   DATA: 19 (rows) x 5 (columns) [2 patients].
# The CMA estimates table:
cmaW$CMA
##    PATIENT_ID window.ID window.start window.end       CMA
## 1          37         1   2036-04-10 2036-08-08 0.4916667
## 2          37         2   2036-08-08 2036-12-06 0.6489297
## 3          37         3   2036-12-06 2037-04-05 0.5197778
## 4          37         4   2037-04-05 2037-08-03 0.3135842
## 5          37         5   2037-08-03 2037-12-01 0.3122259
## 6          37         6   2037-12-01 2038-03-31 0.1973684
## 7          76         1   2035-12-13 2036-04-11 0.8933692
## 8          76         2   2036-04-11 2036-08-09 0.6149642
## 9          76         3   2036-08-09 2036-12-07 1.0000000
## 10         76         4   2036-12-07 2037-04-06 0.6027778
## 11         76         5   2037-04-06 2037-08-04 0.4500000
## 12         76         6   2037-08-04 2037-12-02 0.3985663
getCMA(cmaW); # as above but using accessor function
##    PATIENT_ID window.ID window.start window.end       CMA
## 1          37         1   2036-04-10 2036-08-08 0.4916667
## 2          37         2   2036-08-08 2036-12-06 0.6489297
## 3          37         3   2036-12-06 2037-04-05 0.5197778
## 4          37         4   2037-04-05 2037-08-03 0.3135842
## 5          37         5   2037-08-03 2037-12-01 0.3122259
## 6          37         6   2037-12-01 2038-03-31 0.1973684
## 7          76         1   2035-12-13 2036-04-11 0.8933692
## 8          76         2   2036-04-11 2036-08-09 0.6149642
## 9          76         3   2036-08-09 2036-12-07 1.0000000
## 10         76         4   2036-12-07 2037-04-06 0.6027778
## 11         76         5   2037-04-06 2037-08-04 0.4500000
## 12         76         6   2037-08-04 2037-12-02 0.3985663
# The values for patient 76 only, rounded at 2 digits
round(cmaW$CMA[cmaW$CMA$PATIENT_ID== 76, 5]*100, 2);
## [1]  89.34  61.50 100.00  60.28  45.00  39.86
# Plot:
plot(cmaW, patients.to.plot=c("76"), show.legend=FALSE);
<a name="Figure-12"></a>**Figure 12.** Sliding window CMA 9

Figure 12. Sliding window CMA 9

The sliding windows can also overlap, as illustrated below. This can for example be used to estimate the variation of adherence (implementation) during an episode. Figure 13 shows 21 sliding windows of 4 month for patient 76, in steps of 1 month. The patient’s quality of implementation oscillated between 37% and 100% during the 2 years of follow-up. This output can be further analyzed in relation to patterns of health status if such data are available for the same time period.

<a name="Figure-13"></a>**Figure 13.** Sliding window CMA 9

Figure 13. Sliding window CMA 9

Mapping events to episodes and sliding windows

The functions compute.treatment.episodes, CMA_per_episode and CMA_sliding_window can return the mapping between events and episodes or sliding windows, respectively, in the sense that they can specify, for each episode or sliding which, which events participate in it. For the latter two, this correspondence can also be shown visually in plots (see Figure 14). Please note that, as the details of which events belong to which episode may depend on the particular simple CMA used, it is recommended to take the mapping produced by compute.treatment.episodes as orientative, and use instead the mapping produced by CMA_per_episode for a particular simple CMA.

cmaE <- CMA_per_episode(CMA="CMA1",
                        data=med.events[med.events$PATIENT_ID %in% 1:2,],
                        ID.colname="PATIENT_ID",
                        event.date.colname="DATE",
                        event.duration.colname="DURATION",
                        event.daily.dose.colname="PERDAY",
                        medication.class.colname="CATEGORY",
                        followup.window.start=-90,
                        observation.window.start=0,
                        observation.window.duration=365,
                        maximum.permissible.gap=10,
                        return.mapping.events.episodes=TRUE, # ask for the mapping
                        medication.change.means.new.treatment.episode=TRUE,
                        date.format="%m/%d/%Y");
getEventsToEpisodesMapping(cmaE); # get the mapping (here, print it)
##   PATIENT_ID episode.ID event.index.in.data
## 1          1          1                <NA>
## 2          1          2                   2
## 3          1          2                   3
## 4          1          3                   5
## 5          1          3                   6
## 6          1          4                <NA>
## 7          2          1                  25
## 8          2          3                  28
plot(cmaE, align.all.patients=TRUE, print.dose.centered=TRUE, 
     print.episode.or.sliding.window=TRUE); # show the mapping visually
<a name="Figure-14"></a>**Figure 14.** Per episodes with CMA 1, showing which events belong to which episode: for events, the number(s) between '[]' represent the event they belong to, while for an event, the number between '[]' is what the events use as its identifier. (e.g., the 3rd even from the left for patient 1 has a '[2]', meaning that it belongs to event '2', which is identified as such with a '[2]' immediately after is estimated CMA of '136%'). Please note that the same plot for CMA9 would be quite different, as the rules for which events are considered differ (e.g., the last events of the episodes would be included).

Figure 14. Per episodes with CMA 1, showing which events belong to which episode: for events, the number(s) between ‘[]’ represent the event they belong to, while for an event, the number between ‘[]’ is what the events use as its identifier. (e.g., the 3rd even from the left for patient 1 has a ‘[2]’, meaning that it belongs to event ‘2’, which is identified as such with a ‘[2]’ immediately after is estimated CMA of ‘136%’). Please note that the same plot for CMA9 would be quite different, as the rules for which events are considered differ (e.g., the last events of the episodes would be included).

Interactive plotting

During the exploratory phases of data analysis, it is sometimes extremely useful to be able to plot interactively various views of the data using different parameter settings. We have implemented such interactive plotting of medication histories and (simple and iterative) CMA estimates within RStudio and outside it (using Shiny; this is the default as it very flexible and independent of running AdhereR within RStudio) through the plot_interactive_cma() function. This function is generic and interactive, and can be given a large set of parameters (including the dataset on which to work) or it can interactively acquire these parameters directly from the user.

Please note that this interactive plotting is actually implemented in a different package, AdhereRViz (which extends AdhereR); for more info, please see the vignette AdhereR: Interactive plotting (and more) with Shiny from package AdhereRViz.

Computing event duration from prescription, dispensing, and hospitalization data

Computation of CMAs requires a supply duration for medications dispensed to patients. If medications are not supplied for fixed durations but as a quantity that may last for various durations based on the prescribed dose, the supply duration has to be calculated based on dispensed and prescribed doses. Treatments may be interrupted and resumed at later times, for which existing supplies may or may not be taken into account. Patients may be hospitalized or incarcerated, and may not use their own supplies during these periods. The function compute_event_durations calculates the supply durations, taking into account the aforementioned situations and offering parameters for flexible adjustments.

Computing time to initiation

The period between the first prescription event and the first dose administration may impact health outcomes differently than omitting doses once on treatment or interrupting medication for longer periods of time. Primary non-adherence (not acquiring the first prescription) or delayed initiation may have a negative impact on health outcomes. The function time_to_initiation calculates the time between the first prescription and the first dispensing event, taking into account multiple variables to differentiate between treatments.

Defining medication groups

The main idea behind medication groups is that there might be meaningful ways of grouping medications both for the computation of CMAs and their plotting. For example, a patient might receive medication for treating a hart conditions and medication for a dermatological condition, and we might want to investigate the patient’s patterns of adherence to each of these two broad types (or groups) of medication separately. However, the mechanism for defining such groups much be flexible enough to allow, for example, the same medication to belong to two groups based on dose, duration or other characteristics. Therefore, we implemented this using a very flexible yet powerful and intuitive mechanism that can use any type of information associated to the actual events.

To illustrate, we will use the data that comes with the package: med.events.ATC and med.groups. med.events.ATC is a data.frame with 1564 events (one per row) for 16 patients, containing the patient’s unique identifier (column PATIENT_ID), the event date (DATE) and duration (DURATION), and the medication’s ATC code (column CATEGORY), further detailing its first two levels: level 1 (one letter giving the anatomical main group, e.g., “A” for “Alimentary tract and metabolism”; column CATEGORY_L1) and level 2 (the therapeutic subgroup, e.g., “A02” for “Drugs for acid related disorders”):

First 5 lines of the med.events.ATC dataset.
PATIENT_ID DATE DURATION PERDAY CATEGORY CATEGORY_L1 CATEGORY_L2
1 2057-09-04 28.00000 20 A02BC02 ALIMENTARY TRACT AND METABOLISM DRUGS FOR ACID RELATED DISORDERS
1 2058-06-03 28.00000 20 A02BC02 ALIMENTARY TRACT AND METABOLISM DRUGS FOR ACID RELATED DISORDERS
1 2058-07-09 28.00000 20 A02BC02 ALIMENTARY TRACT AND METABOLISM DRUGS FOR ACID RELATED DISORDERS
1 2056-10-09 41.66667 36000 A09AA02 ALIMENTARY TRACT AND METABOLISM DIGESTIVES, INCL. ENZYMES
1 2056-12-10 40.00000 36000 A09AA02 ALIMENTARY TRACT AND METABOLISM DIGESTIVES, INCL. ENZYMES

In fact, med.events.ATC is derived from the durcomp data as follows:

event_durations <- compute_event_durations(disp.data = durcomp.dispensing,
                                           presc.data = durcomp.prescribing,
                                           special.periods.data = durcomp.hospitalisation,
                                           ID.colname = "ID",
                                           presc.date.colname = "DATE.PRESC",
                                           disp.date.colname = "DATE.DISP",
                                           medication.class.colnames = c("ATC.CODE", "UNIT", "FORM"),
                                           total.dose.colname = "TOTAL.DOSE",
                                           presc.daily.dose.colname = "DAILY.DOSE",
                                           presc.duration.colname = "PRESC.DURATION",
                                           visit.colname = "VISIT",
                                           split.on.dosage.change = TRUE,
                                           force.init.presc = TRUE,
                                           force.presc.renew = TRUE,
                                           trt.interruption = "continue",
                                           special.periods.method = "continue",
                                           date.format = "%Y-%m-%d",
                                           suppress.warnings = FALSE,
                                           return.data.table = FALSE);
med.events.ATC <- event_durations$event_durations[ !is.na(event_durations$event_durations$DURATION) & 
                                                     event_durations$event_durations$DURATION > 0,
                                         c("ID", "DISP.START", "DURATION", "DAILY.DOSE", "ATC.CODE")];
names(med.events.ATC) <- c("PATIENT_ID", "DATE", "DURATION", "PERDAY", "CATEGORY");
# Groups from the ATC codes:
sort(unique(med.events.ATC$CATEGORY)); # all the ATC codes in the data
# Level 1:
med.events.ATC$CATEGORY_L1 <- vapply(substr(med.events.ATC$CATEGORY,1,1), switch, character(1),
                                     "A"="ALIMENTARY TRACT AND METABOLISM",
                                     "B"="BLOOD AND BLOOD FORMING ORGANS",
                                     "J"="ANTIINFECTIVES FOR SYSTEMIC USE",
                                     "R"="RESPIRATORY SYSTEM",
                                     "OTHER");
# Level 2:
med.events.ATC$CATEGORY_L2 <- vapply(substr(med.events.ATC$CATEGORY,1,3), switch, character(1),
                                     "A02"="DRUGS FOR ACID RELATED DISORDERS",
                                     "A05"="BILE AND LIVER THERAPY",
                                     "A09"="DIGESTIVES, INCL. ENZYMES",
                                     "A10"="DRUGS USED IN DIABETES",
                                     "A11"="VITAMINS",
                                     "A12"="MINERAL SUPPLEMENTS",
                                     "B02"="ANTIHEMORRHAGICS",
                                     "J01"="ANTIBACTERIALS FOR SYSTEMIC USE",
                                     "J02"="ANTIMYCOTICS FOR SYSTEMIC USE",
                                     "R03"="DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES",
                                     "R05"="COUGH AND COLD PREPARATIONS",
                                     "OTHER");

For this dataset, we could define the following medication groups:

These are defined in med.groups, which is a named vector of characters:

med.groups <- c("Vitamins"  = "(CATEGORY_L2 == 'VITAMINS')",
                "VitaResp"  = "({Vitamins} | CATEGORY_L1 == 'RESPIRATORY SYSTEM')",
                "VitaShort" = "({Vitamins} & DURATION <= 30)",
                "VitELow"   = "(CATEGORY == 'A11HA03' & PERDAY <= 500)",
                "VitaComb"  = "({VitaShort} | {VitELow})",
                "NotVita"   = "(!{Vitamins})");

The names in the vector are the names of the various medication groups (e.g., “Vitamins” is the name of the first group, corresponding to all the vitamins), while the values of the vector are the actual definitions and use the same syntax, operators and conventions as any R expression. The name of a medication group can be pretty much any string of characters not containing “}”, but it is recommended to keep them short, expressive and (as much as possible) free of non-alphanumeric symbols (in fact, the best recommendation is to stick to the rules for defining legal R variable and function names). Focussing on the first one, (CATEGORY_L2 == 'VITAMINS'), CATEGORY_L2 refers to the CATEGORY_L2 column in the med.events.ATC dataset, 'VITAMINS' is a possible value that may appear in that column, and (, ) and == have the usual interpretation in R (grouping and test of equality, respectively). In fact, this is translated internally into an expression that is evaluated on the med.events.ATC dataset, resulting in a vector of logical values, one per row, saying which events in med.events.ATC correspond to this medication group (TRUE, there are 294 such events) and which not (FALSE, the remaining 1270 events):

First 5 events in med.events.ATC corresponding to the Vitamins medication group.
PATIENT_ID DATE DURATION PERDAY CATEGORY CATEGORY_L1 CATEGORY_L2
2 2056-10-21 90.00000 1111.111 A11CC05 ALIMENTARY TRACT AND METABOLISM VITAMINS
2 2057-04-14 90.00000 1111.111 A11CC05 ALIMENTARY TRACT AND METABOLISM VITAMINS
2 2058-04-07 90.00000 1111.111 A11CC05 ALIMENTARY TRACT AND METABOLISM VITAMINS
3 2057-09-02 105.00000 14285.714 A11CA01 ALIMENTARY TRACT AND METABOLISM VITAMINS
3 2056-07-01 41.14286 4861.111 A11CC05 ALIMENTARY TRACT AND METABOLISM VITAMINS

A special notation is used to “make reference” to another named medication group, by surrounding the medication group’s name by { and }: the second medication group “VitaResp” si defined as ({Vitamins} | CATEGORY_L1 == 'RESPIRATORY SYSTEM'), where {Vitamins} simply refers to the events corresponding to the “Vitamins” group discussed above; internally, the “Vitamins” group is evaluated and its vector of logicals is combined using | (OR) with the vector of logicals corresponding to CATEGORY_L1 == 'RESPIRATORY SYSTEM':

First 5 events in med.events.ATC corresponding to the VitaResp medication group.
PATIENT_ID DATE DURATION PERDAY CATEGORY CATEGORY_L1 CATEGORY_L2
1 2056-10-09 30 100.000 R03AC12 RESPIRATORY SYSTEM DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES
2 2056-10-21 90 1111.111 A11CC05 ALIMENTARY TRACT AND METABOLISM VITAMINS
2 2057-04-14 90 1111.111 A11CC05 ALIMENTARY TRACT AND METABOLISM VITAMINS
2 2058-04-07 90 1111.111 A11CC05 ALIMENTARY TRACT AND METABOLISM VITAMINS
2 2056-10-21 60 400.000 R03AK07 RESPIRATORY SYSTEM DRUGS FOR OBSTRUCTIVE AIRWAY DISEASES

This mechanism is very flexible and extensible, allowing, for example, tests involving the duration (“VitaShort”) and the dosage (“VitELow”), and the combination of several defined groups (“VitaComb”). However, it is important to remember that:

  1. these must be well-formed and meaningful expressions,
  2. using column names and values in the dataset containing the events,
  3. and/or other groups between { and },
  4. and must evaluate to valid logical vectors of the same length as the number of events (rows) in the dataset.

By default, an extra special medication group __ALL_OTHERS__ is automatically computed, including all events that are not covered by any of the explicitly defined medication groups (if any).

Please note that, for security reasons, this evaluation is performed in a separate environment where only certain functions and operators are allowed (the functions in the Math, Arith, Logic and Compare groups, and the operators (, [ and !).

With these, definitions of medication groups can be passed to all CMA functions using the medication.groups parameter:

cma1_mg <- CMA1(data=med.events.ATC,
                ID.colname="PATIENT_ID",
                event.date.colname="DATE",
                event.duration.colname="DURATION",
                event.daily.dose.colname="PERDAY",
                medication.class.colname="CATEGORY",
                medication.groups=med.groups,
                followup.window.start=0,
                observation.window.start=0,
                observation.window.duration=365,
                date.format="%m/%d/%Y");
cma1_mg; # print it
## CMA1:
##   "The ratio of days with medication available in the observation window excluding the last event; durations of all events added up and divided by number of days from first to last event, possibly resulting in a value >1.0"
##   [
##     ID.colname = PATIENT_ID
##     event.date.colname = DATE
##     event.duration.colname = DURATION
##     medication.groups = 7 ['Vitamins', 'VitaResp', 'VitaShort', 'VitELow' ...]
##     flatten.medication.groups = FALSE
##     medication.groups.colname = .MED_GROUP_ID
##     followup.window.start = 0
##     followup.window.start.unit = days
##     followup.window.start.per.medication.group = FALSE
##     followup.window.duration = 730
##     followup.window.duration.unit = days
##     observation.window.start = 0
##     observation.window.start.unit = days
##     observation.window.duration = 365
##     observation.window.duration.unit = days
##     date.format = %m/%d/%Y
##     CMA = CMA results for  patients
##   ]
##   DATA: 1564 (rows) x 7 (columns) [16 patients].

There is a new function, getMGs(), which returns the medication groups for a CMA object (if none, NULL); this is a list with two members (here, for getMGs(cma1_mg)):

##             name                                                   def
## 1       Vitamins                           (CATEGORY_L2 == 'VITAMINS')
## 2       VitaResp    ({Vitamins} | CATEGORY_L1 == 'RESPIRATORY SYSTEM')
## 3      VitaShort                         ({Vitamins} & DURATION <= 30)
## 4        VitELow               (CATEGORY == 'A11HA03' & PERDAY <= 500)
## 5       VitaComb                             ({VitaShort} | {VitELow})
## 6        NotVita                                         (!{Vitamins})
## 7 __ALL_OTHERS__ *all observations not included in the defined groups*
##       Vitamins VitaResp VitaShort VitELow VitaComb NotVita __ALL_OTHERS__
##  [1,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE
##  [2,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE
##  [3,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE
##  [4,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE
##  [5,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE
##  [6,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE
##  [7,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE
##  [8,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE
##  [9,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE
## [10,]    FALSE    FALSE     FALSE   FALSE    FALSE    TRUE          FALSE

The getCMA() function works as before, except that now, by default, it returns a list with as many entries as there are medication groups (+ __ALL_OTHERS__), each containing its own CMA estimates:

getCMA(cma1_mg);
## $Vitamins
##    PATIENT_ID       CMA
## 1           1        NA
## 2           2 0.5142857
## 3           3 0.7568643
## 4           4 2.1052632
## 5           5        NA
## 6           6 0.5471125
## 7           7 3.5398230
## 8           8 0.9401709
## 9           9 0.7993730
## 10         10 1.4860681
## 11         11 2.9640719
## 12         12        NA
## 13         13 0.2419355
## 14         14 1.2053571
## 15         15 2.3823529
## 16         16 4.6537396
## 
## $VitaResp
##    PATIENT_ID       CMA
## 1           1        NA
## 2           2 1.7582418
## 3           3 1.4671707
## 4           4 3.1286550
## 5           5 0.4731861
## 6           6 1.4589666
## 7           7 5.0796460
## 8           8 2.5631868
## 9           9 1.0689655
## 10         10 2.2123894
## 11         11 3.9520958
## 12         12        NA
## 13         13 5.4243574
## 14         14 2.8161290
## 15         15 2.8235294
## 16         16 5.9002770
## 
## $VitaShort
##    PATIENT_ID       CMA
## 1           1        NA
## 2           2        NA
## 3           3 0.6422602
## 4           4 0.8771930
## 5           5        NA
## 6           6 0.5471125
## 7           7 1.6814159
## 8           8 0.9401709
## 9           9 0.5172414
## 10         10 0.5020921
## 11         11 0.9880240
## 12         12        NA
## 13         13 0.2521008
## 14         14        NA
## 15         15 0.5294118
## 16         16 0.9972299
## 
## $VitELow
##    PATIENT_ID       CMA
## 1           1        NA
## 2           2        NA
## 3           3        NA
## 4           4 0.6000000
## 5           5        NA
## 6           6        NA
## 7           7 0.7920792
## 8           8        NA
## 9           9        NA
## 10         10        NA
## 11         11        NA
## 12         12        NA
## 13         13        NA
## 14         14        NA
## 15         15 0.7058824
## 16         16 1.2465374
## 
## $VitaComb
##    PATIENT_ID       CMA
## 1           1        NA
## 2           2        NA
## 3           3 0.6422602
## 4           4 0.8771930
## 5           5        NA
## 6           6 0.5471125
## 7           7 1.6814159
## 8           8 0.9401709
## 9           9 0.5172414
## 10         10 0.5020921
## 11         11 0.9880240
## 12         12        NA
## 13         13 0.2521008
## 14         14        NA
## 15         15 0.7058824
## 16         16 2.3268698
## 
## $NotVita
##    PATIENT_ID       CMA
## 1           1 0.6751703
## 2           2 1.0989011
## 3           3 5.3997214
## 4           4 5.2923977
## 5           5 1.2978525
## 6           6 3.8254804
## 7           7 6.8138592
## 8           8 5.9799451
## 9           9 3.0228392
## 10         10 3.4283579
## 11         11 6.0445147
## 12         12        NA
## 13         13 7.5670229
## 14         14 6.5303201
## 15         15 3.8470588
## 16         16 5.8915710
## 
## $`__ALL_OTHERS__`
## NULL

Thus, patient 2 has a CMA1 of ~1.77 for “Vitamins”, of ~1.76 for “VitaResp”, none (NA) for “VitaShort”, “VitELow” and “VitaComb”, of ~2.76 for “NotVita”. Here, as can also be seen in the obs component of getMGs(cma1_mg), the default __ALL_OTHERS__ did not select any events, results in an empty CMA estimate for it.

Sometimes, however, this structuring of the CMAs as a list may not be desirable, in which case the parameter flatten.medication.groups = TRUE comes in handy:

getCMA(cma1_mg, flatten.medication.groups=TRUE);
##    PATIENT_ID       CMA .MED_GROUP_ID
## 1           1        NA      Vitamins
## 2           2 0.5142857      Vitamins
## 3           3 0.7568643      Vitamins
## 4           4 2.1052632      Vitamins
## 5           5        NA      Vitamins
## 6           6 0.5471125      Vitamins
## 7           7 3.5398230      Vitamins
## 8           8 0.9401709      Vitamins
## 9           9 0.7993730      Vitamins
## 10         10 1.4860681      Vitamins
## 11         11 2.9640719      Vitamins
## 12         12        NA      Vitamins
## 13         13 0.2419355      Vitamins
## 14         14 1.2053571      Vitamins
## 15         15 2.3823529      Vitamins
## 16         16 4.6537396      Vitamins
## 17          1        NA      VitaResp
## 18          2 1.7582418      VitaResp
## 19          3 1.4671707      VitaResp
## 20          4 3.1286550      VitaResp
## 21          5 0.4731861      VitaResp
## 22          6 1.4589666      VitaResp
## 23          7 5.0796460      VitaResp
## 24          8 2.5631868      VitaResp
## 25          9 1.0689655      VitaResp
## 26         10 2.2123894      VitaResp
## 27         11 3.9520958      VitaResp
## 28         12        NA      VitaResp
## 29         13 5.4243574      VitaResp
## 30         14 2.8161290      VitaResp
## 31         15 2.8235294      VitaResp
## 32         16 5.9002770      VitaResp
## 33          1        NA     VitaShort
## 34          2        NA     VitaShort
## 35          3 0.6422602     VitaShort
## 36          4 0.8771930     VitaShort
## 37          5        NA     VitaShort
## 38          6 0.5471125     VitaShort
## 39          7 1.6814159     VitaShort
## 40          8 0.9401709     VitaShort
## 41          9 0.5172414     VitaShort
## 42         10 0.5020921     VitaShort
## 43         11 0.9880240     VitaShort
## 44         12        NA     VitaShort
## 45         13 0.2521008     VitaShort
## 46         14        NA     VitaShort
## 47         15 0.5294118     VitaShort
## 48         16 0.9972299     VitaShort
## 49          1        NA       VitELow
## 50          2        NA       VitELow
## 51          3        NA       VitELow
## 52          4 0.6000000       VitELow
## 53          5        NA       VitELow
## 54          6        NA       VitELow
## 55          7 0.7920792       VitELow
## 56          8        NA       VitELow
## 57          9        NA       VitELow
## 58         10        NA       VitELow
## 59         11        NA       VitELow
## 60         12        NA       VitELow
## 61         13        NA       VitELow
## 62         14        NA       VitELow
## 63         15 0.7058824       VitELow
## 64         16 1.2465374       VitELow
## 65          1        NA      VitaComb
## 66          2        NA      VitaComb
## 67          3 0.6422602      VitaComb
## 68          4 0.8771930      VitaComb
## 69          5        NA      VitaComb
## 70          6 0.5471125      VitaComb
## 71          7 1.6814159      VitaComb
## 72          8 0.9401709      VitaComb
## 73          9 0.5172414      VitaComb
## 74         10 0.5020921      VitaComb
## 75         11 0.9880240      VitaComb
## 76         12        NA      VitaComb
## 77         13 0.2521008      VitaComb
## 78         14        NA      VitaComb
## 79         15 0.7058824      VitaComb
## 80         16 2.3268698      VitaComb
## 81          1 0.6751703       NotVita
## 82          2 1.0989011       NotVita
## 83          3 5.3997214       NotVita
## 84          4 5.2923977       NotVita
## 85          5 1.2978525       NotVita
## 86          6 3.8254804       NotVita
## 87          7 6.8138592       NotVita
## 88          8 5.9799451       NotVita
## 89          9 3.0228392       NotVita
## 90         10 3.4283579       NotVita
## 91         11 6.0445147       NotVita
## 92         12        NA       NotVita
## 93         13 7.5670229       NotVita
## 94         14 6.5303201       NotVita
## 95         15 3.8470588       NotVita
## 96         16 5.8915710       NotVita

(It can also be directly passed to the CMA function.)

Please note that, by default, all medication groups belonging to one patient share the same follow-up and observation windows, but this can be changed by setting the parameter followup.window.start.per.medication.group = TRUE so that each medication group has its own follow-up and observation windows.

The medication groups also affect the plotting:

plot(cma1_mg,
     #medication.groups.to.plot=c("VitaResp", "VitaShort", "VitaComb"),
     patients.to.plot=1:3,
     show.legend=FALSE);
<a name="Figure-15"></a>**Figure 15.** CMA 1 with medication groups.

Figure 15. CMA 1 with medication groups.

The medication groups are ordered within patients, and visually grouped using alternating thick blue lines (which can be controlled through the parameters medication.groups.separator.show, medication.groups.separator.lty, medication.groups.separator.lwd and medication.groups.separator.color); likewise, the name of the special __ALL_OTHERS__ groups can be controlled with the medication.groups.allother.label parameter (defaults to “*“). Only the medication groups that contain at least one event are shown for each patient. If desired, the parameter medication.groups.to.plot can be used to explicitly list the names of the medication groups to include in the plot (by default, all).

Medication groups work in the same way across all simple and complex CMAs, an are included in the interactive Shiny interface as well.

As a special case, the medication.groups parameter can simply be the name of a column in the data that define the medication groups through its values. For example,

cma0_types <- CMA0(data=med.events.ATC,
                   ID.colname="PATIENT_ID",
                   event.date.colname="DATE",
                   event.duration.colname="DURATION",
                   event.daily.dose.colname="PERDAY",
                   medication.class.colname="CATEGORY",
                   medication.groups="CATEGORY_L1",
                   #flatten.medication.groups=TRUE,
                   #followup.window.start.per.medication.group=TRUE,
                   followup.window.start=0,
                   observation.window.start=0,
                   observation.window.duration=365,
                   date.format="%m/%d/%Y");

defines 4 medication groups “ALIMENTARY TRACT AND METABOLISM”, “ANTIINFECTIVES FOR SYSTEMIC USE”, “BLOOD AND BLOOD FORMING ORGANS”, “RESPIRATORY SYSTEM” using the CATEGORY_L1 column in the example med.events.ATC dataset. Behind the scenes, this is simply expanded into the medication group definitions:

## c("ALIMENTARY TRACT AND METABOLISM" = "(CATEGORY_L1 == 'ALIMENTARY TRACT AND METABOLISM')",
##   "ANTIINFECTIVES FOR SYSTEMIC USE" = "(CATEGORY_L1 == 'ANTIINFECTIVES FOR SYSTEMIC USE')",
##   "BLOOD AND BLOOD FORMING ORGANS" = "(CATEGORY_L1 == 'BLOOD AND BLOOD FORMING ORGANS')",
##   "RESPIRATORY SYSTEM" = "(CATEGORY_L1 == 'RESPIRATORY SYSTEM')");

Please note that if no medication groups are defined (the default), then nothing changes in the behaviour of the package (so it is fully backwards compatible).

Working with very large databases

AdhereR can use data stored in a variety of “classical” RDBMS’s (Relational Database Management Systems), such as MySQL or SQLite, either through explicit SQL queries or transparently through dbplyr, or in other systems, such as the Apache Hadoop. Thus, AdhereR can access (very) large quantities of data stored in various formats and using different backends, and can process it ranging from single-threaded set-ups on a client machine to large heterogeneous distributed clusters (using, for example, various explicit parallel processing frameworks or Hadoop’s MapReduce framework). All these are detailed in a dedicated vignette “Using AdhereR with various database technologies for processing very large datasets”.

Using AdhereR from other programming languages and platforms

AdhereR can be transparently used from other programming languages than R (such as Python) or platforms (such as Stata) by implementing a startdized interface defined for these purposes. A working implementation for Python 3 is included in the package (and inteded also as a hands-on guide to further implementations) and is described in detailed in a dedicated vignette “Calling AdhereR from Python3”.

AdhereR is pipe (%>% or |>)-friendly

The pipe operator (%>%) has been originally introduced in R by the magrittr package and is a central feature of the Tidyverse, and is since R 4.1 also available natively as |> (but there are some subtle differences from %>%). AdhereR is compatible with pipe (because for all functions the first parameter encapsulates the “important” data, with the others providing various “tweaks”), so, code such as the following is fully functional:

library(dplyr);

# Compute, then get the CMA, change it and print it:
x <- med.events %>%                      # use med.events
  filter(PATIENT_ID %in% c(1,2,3)) %>%   # first 3 patients
  CMA9(ID.colname="PATIENT_ID",          # compute CMA9
       event.date.colname="DATE",
       event.duration.colname="DURATION",
       event.daily.dose.colname="PERDAY",
       medication.class.colname="CATEGORY",
       followup.window.start=230,
       followup.window.duration=705,
       observation.window.start=41,
       observation.window.duration=100,
       date.format="%m/%d/%Y") %>%
  getCMA() %>% # get the CMA estimates
  mutate(CMA=sprintf("%.1f%%",100*CMA)); # make them percents
print(x); # print it

# Plot some CMAs:
med.events %>% # use med.events
  filter(PATIENT_ID %in% c(1,2,3)) %>%             # first 3 patients
  CMA_sliding_window(CMA.to.apply="CMA7",          # sliding windows CMA7
                     ID.colname="PATIENT_ID",
                     event.date.colname="DATE",
                     event.duration.colname="DURATION",
                     event.daily.dose.colname="PERDAY",
                     medication.class.colname="CATEGORY",
                     followup.window.start=230,
                     followup.window.duration=705,
                     observation.window.start=41,
                     observation.window.duration=100,
                     date.format="%m/%d/%Y") %>%
  plot(align.all.patients=TRUE, show.legend=TRUE); # plot it

Modifying AdhereR plots a posteriori

# Plot CMA7 for patients 5 and 8:
cma7 <- CMA7(data=med.events,
             ID.colname="PATIENT_ID",
             event.date.colname="DATE",
             event.duration.colname="DURATION",
             event.daily.dose.colname="PERDAY",
             medication.class.colname="CATEGORY",
             followup.window.start=30,
             observation.window.start=30,
             observation.window.duration=365,
             date.format="%m/%d/%Y"
);
plot(cma7, patients.to.plot=c(5,8), show.legend=TRUE); # good plot after an error plot

# Access the plotting info:
pevs <- get.plotted.events(); # get the plotted events with their plotting info

# Let's add a vertical line for patient 8 between the medication change:
# Find the event where the medication changes:
i <- which(pevs$PATIENT_ID == "8" & 
             pevs$CATEGORY != c(pevs$CATEGORY[-1], pevs$CATEGORY[nrow(pevs)]));
# Half-way between the events where medication changes:
x <- (pevs$.X.END[i] + pevs$.X.START[i+1])/2;
# Draw the line:
segments(x, pevs$.Y.OW.START[i], x, pevs$.Y.OW.END[i],
         col="blue", lty="solid", lwd=3);

# Put a star * next to the 4th event of patient 5:
# Find the event:
i <- which(pevs$PATIENT_ID == "5")[4];
# Plot the star:
text(pevs$.X.START[i]-strwidth("*   "), pevs$.Y.START[i],
     "*", cex=3.0, col="darkgreen");

# Add some random text over the figure:
text((pevs$.X.FUW.START[1] + pevs$.X.FUW.END[1])/2, # X center of patient 5's FUW
     (pevs$.Y.FUW.START[nrow(pevs)] + pevs$.Y.FUW.END[nrow(pevs)])/2, # Y center of 8's FUW
     "Change with care!!!", srt=45, cex=1.5, col="darkred")
<a name="Figure-16"></a>**Figure 16.** Modifying an `AdhereR` plot is easy using the `get.plotted.events()` function.

Figure 16. Modifying an AdhereR plot is easy using the get.plotted.events() function.

Technical details

Here we overview some technical details, including the main S3 classes and functions (probably useful for scripting and extension), our treatment of dates and durations, and the issue of performance and parallelism (useful for large datasets).

Main S3 classes and functions

The S3 class CMA0 is the most basic object, basically encapsulating the dataset and desired parameter values; it should not be normally used directly (except for plotting the event data as such), but it is the foundation for the other classes. A CMA0 (and derived) object can print itself (the output is optimized either for text, LaTeX or Markdown), can plot itself (with various parameters controlling exactly how), and offers the accessor function getCMA() for easy access to the CMA estimate. Please note that these CMAs all work for datasets that contain more than one patient, and the estimates are computed for each patient independently, and the plotting can display more than one patient (in this case the patients are plotted on top of each other vertically), as shown in Figure 1.

The simple CMAs are implemented by the S3 classes CMA1CMA9, that are derived from CMA0 and reload its methods. Thus, one can easily implement a new simple CMA by extending the base CMA0 class.

The iterative CMAs, in contrast, are not derived from CMA0 but use internally such a simple CMA to perform their computations. For the moment, they can not be extended to new simple CMAs derived from CMA0, but, if needed, such a mechanism could be implemented.

The most important functions are:

Calendar dates and durations

A potentially confusing (but very powerful and flexible) aspect of our implementation concerns our treatment of dates and durations.

First, the duration of an event is given in a column in the dataset containing, for each event, its duration (as a positive integer) in days. However, other durations (such as for FUW or the sliding windows) are given as positive integers representing the number of units; these units can be “days” (the default), “weeks”, “months”, or “years”.

The date of an event is given in a column in the dataset containing, for each event, its start date as a string (character) in the format given by the date.format parameter (by default, mm/dd/yyyy). The start of the FUW, OW and sliding windows can be given either as the number (integer) of units (“days”, “weeks”, “months”, or “years”) since the first recorded event for the patient, or as an object of class Date representing the actual calendar start date, or a string (character) giving a column name in the dataset containing, per patient, either the calendar start date as Date object (i.e., this column must be of type Date) or as the number of units if the column has type numeric. While this might be confusing, it allows greater flexibility in specifying the start dates; the most important pitfall is in passing a date as a string (type character) which will result in an error as there is no such column in the dataset – make sure it is converted to a Date object by using, for example, as.Date()! However, for most scenarios, the default of giving the number of units since the earliest event is more than enough and is the recommended (and most carefully tested) way.

Performance, parallelism and implementation

While currently implemented in pure R, we have extensively profiled and optimized our code to allow the processing of large databases even on consumer-grade hardware. For example, Table 4 below gives the running times (single-threaded and two parallel multicore threads – see below for details) for a database of 13,922 unique patients and 112,984 prescriptions of all CMAs described here, on an Apple MacBook Air 11” (7,1; early 2015) with 8Go RAM (DDR3 @ 1600MHz) and a Core i7-5650U CPU (2 cores, 4 threads with hyperthreading @ 2.20GHz, Turbo Boost to 3.10GHz), using MacOS X “El Capitan” (10.11.6), R 3.3.1 (64 bits) and RStudio 1.0.44. Table 5 below shows the running times (single-threaded and four parallel multicore threads) for a very large database of 500,000 unique patients and 4,058,110 prescriptions (generated by repeatedly concatenating the database described above and uniquely renaming the participants) of all CMAs described here, on a desktop computer with 16Go RAM and a Core i7-3770 CPU (4 cores, 8 threads with hyperthreading @ 3.40GHz, Turbo Boost to 3.90GHz), using OpenSuse 13.2 (Linux kernel 3.16.7) and R 3.3.2 (64 bits). Table 6 shows the same information as Table 5, but on a high-end desktop computer with 32Go RAM and a Core i7-4790K CPU (4 cores, 8 threads with hyperthreading @ 4.00GHz, Turbo Boost to 4.40GHz), running Windows 10 Professional 64 bits (version 1607) and R 3.2.4 (64 bits); as dicusssed below, the “multicore” backend is currently not available on Windows.

As these benchmarking results show, a database close to the median sample sizes in the literature (median 10,265 patients versus our 13,922 patients; Sattler et al., 2011) can be processed almost in real-time on a consumer laptop, while very large databases (half a million patients) require tens of minutes to a few hours on a mid-to-high end desktop computers, especially when making use of parallel processing. Interestingly, Linux seems to have a small but measurable performance advantage over Windows (despite the slightly lower-end hardware) and the “multicore” backend becomes preferable to the “snow” backend for very large databases (probably due to the data transmission and collection overheads), but not by a very large margin. Therefore, for very large databases, we recommend Linux on a multi-core/multi-CPU mechine with enough RAM and the “multicore” backend.

Table 4. Performance as running times (single- and two-threaded, multicore and snow respectively) when computing CMAs for a large database with 13,922 patients with 112,983 events on a consumer-grade MacBook Air laptop running MacOSX El Capitan. The times shown are “real” (i.e., clock) running times in seconds (as reported by R’s system.time() function) and minutes. In all cases, the FUW and OW are identical at 2 years long. CMAs per episode (with gap=180 days) and sliding window (length=180 days, step=90 days) used CMA1 for each episode/window. Please note that the multicore and snow times are slightly longer than half the single-core times due to various data transmission and collection overheads.
CMA Single-threaded Two threads (multicore) Two threads (snow)
CMA 1 40.8 (0.7) 20.8 (0.4) 22.0 (0.4)
CMA 2 41.2 (0.7) 21.7 (0.4) 24.4 (0.4)
CMA 3 39.3 (0.7) 20.4 (0.3) 22.9 (0.4)
CMA 4 40.2 (0.7) 21.3 (0.4) 23.0 (0.4)
CMA 5 56.6 (0.9) 29.7 (0.5) 31.5 (0.5)
CMA 6 58.0 (1.0) 30.9 (0.5) 32.5 (0.5)
CMA 7 55.5 (0.9) 28.9 (0.5) 30.6 (0.5)
CMA 8 131.8 (2.2) 72.5 (1.2) 71.6 (1.2)
CMA 9 159.4 (2.7) 85.2 (1.4) 86.5 (1.4)
per episode 263.9 (4.4) 139.0 (2.3) 139.7 (2.3)
sliding window 643.6 (10.7) 347.9 (5.8) 339.5 (5.7)
Table 5. Performance as running times (single- and two-threaded, multicore and snow respectively) when computing CMAs for a very large large database with 500,000 patients with 4,058,110 events on a mid/high-range consumer desktop running OpenSuse 13.2 Linux. The times shown are “real” (i.e., clock) running times in seconds (as reported by R’s system.time() function), minutes and, if large enough, hours. In all cases, the FUW and OW are identical at 2 years long. CMAs per episode (with gap=180 days) and sliding window (length=180 days, step=90 days) used CMA1 for each episode/window. Please note that the multicore and especially the snow times are slightly longer than a quarter the single-core times due to various data transmission and collection overheads.
CMA Single-threaded Four threads (multicore) Four threads (snow)
CMA 1 1839.7 (30.6) 577.0 (9.6) 755.5 (12.6)
CMA 2 1779.0 (29.7) 490.1 (8.2) 915.7 (15.3)
CMA 3 1680.6 (28.0) 458.5 (7.6) 608.3 (10.1)
CMA 4 1778.9 (30.0) 489.0 (8.2) 644.5 (10.7)
CMA 5 2500.7 (41.7) 683.3 (11.4) 866.2 (14.4)
CMA 6 2599.8 (43.3) 714.5 (11.9) 1123.8 (18.7)
CMA 7 2481.2 (41.4) 679.4 (11.3) 988.1 (16.5)
CMA 8 5998.0 (100.0 = 1.7 hours) 1558.1 (26.0) 2019.6 (33.7)
CMA 9 7039.7 (117.3 = 1.9 hours) 1894.7 (31.6) 3002.7 (50.0)
per episode 11548.5 (192.5 = 3.2 hours) 3030.5 (50.5) 3994.2 (66.6)
sliding window 27651.3 (460.8 = 7.7 hours) 7198.3 (120.0 = 2.0 hours) 12288.8 (204.8 = 3.4 hours)
Table 6. Performance as running times (single- and two-threaded, multicore and snow respectively) when computing CMAs for a very large large database with 500,000 patients with 4,058,110 events on a high-end desktop computer running Windows 10. The times shown are “real” (i.e., clock) running times in seconds (as reported by R’s system.time() function), minutes and, if large enough, hours. In all cases, the FUW and OW are identical at 2 years long. CMAs per episode (with gap=180 days) and sliding window (length=180 days, step=90 days) used CMA1 for each episode/window. Please note that the snow times are longer than a quarter the single-core times due to various data transmission and collection overheads.
CMA Single-threaded Four threads (snow)
CMA 1 2070.9 (34.5) 653.1 (10.9)
CMA 2 2098.9 (35.0) 667.5 (13.4)
CMA 3 2013.8 (33.6) 661.5 (22.0)
CMA 4 2094.4 (34.9) 685.2 (11.4)
CMA 5 2823.4 (47.1) 881.0 (14.7)
CMA 6 2909.0 (48.5) 910.3 (15.2)
CMA 7 2489.1 (41.5) 772.6 (12.9)
CMA 8 5982.5 (99.7 = 1.7 hours) 1810.1 (30.2)
CMA 9 6030.2 (100.5 = 1.7 hours) 2142.1 (35.7)
per episode 10717.1 (178.6 = 3.0 hours) 3877.2 (64.6)
sliding window 25769.5 (429.5 = 7.2 hours) 9353.6 (155.9 = 2.6 hours)

Concerning parallelism, if run on a multi-core/multi-processor machine or cluster, AdhereR gives the user the possibility to use (completely transparently) two parallel backends: multicore (available on Linux, *BSD and MacOS, but currently not on Microsoft Windows) and snow (Simple Network of Workstations, available on all platforms; in fact, this can use various types of backends, see the documentation in package snow for details). Parallelism is available through the parallel.backend and parallel.threads parameters, where the first controlls the actual backend to use (“none” – the default, uses a single thread –, “multicore”, and several versions of snow: “snow”, “snow(SOCK)”, “snow(MPI)”, “snow(NWS)”) and the second the number of desired parallel threads (“auto” defaults to the reported number of cores for “multicore” or 2 otherwise, and to 2 for “snow”) or a more complex specification of the nodes for “snow” (see the snow package documentation for details and Appendix I: Distributing computations on remote Linux hosts). The implementation uses mclapply (in package parallel) and parLapply (package snow), is completely hidden from the user, and tries to pre-allocate whole chunks of patients to the CPUs/cores in order to reduce the various overheads (such as data transfer). In general, for “multicore” and “snow” with nodes on the local machine, do not use more than the number of physical cores in the system, and be mindful of the various overheads involved, meaning that the gains, while substantial especially for large databases, will be very slightly lower than the expected 1/#threads (as a corrolary, it might not be a good idea to paralellize very small datasets). Also, memory might be of concern when parallelizing, as at least parts of R’s environment will be replicated across threads/processes; this, in turn, for large environments and systems low on RAM, might result in massive performance loss due to swapping (or even result in crashes). For more information on parallelism in R please see, for example, CRAN Task View: High-Performance and Parallel Computing with R and the various blogposts and books on the matter.

Conceptually, we exploited various optimization techniques (see, for example, Hadley Wickham’s Advanced R and other blogposts on profiling and optimizing R code), but the two most important architectural decisions are to (a) extensively use data.table and (b) to pre-allocate chunks of participants for parallel processing. The general framework is to define a “workhorse” function that can process a set of participants and returns one data.frame or data.table (or several, in which case they must be encapsulated in a list()), workhorse function that is transparently called for the whole dataset (if parallel.backend is “none”), or in parallel for subsets of the whole dataset of roughly 1/parallel.threads size (for “multicore” and “snow”), in the latter case the results being transparently recombined (even if multiple results are returned in a list()). Internally, the workhorse functions tend to make extensive use of the data.table “reference semantics” (the := operator) to perform in-place changes and avoid unnecessary copying of objects, keys for fast indexing, search and selection, and the by grouping mechanism, allowing the application of a specialized function to each individual patient (or episode or sliding window, as needed). We decided to keep everything “pure R” (so there is so far no C++ code) and the code is extensively commented and hopefully clear to understand, change and extend.

Conclusions

‘AdhereR’ was developed to facilitate flexible and comprehensive analyses of adherence to medication from electronic healthcare data. All objects included in this package (‘compute.treatment.episodes’, ‘CMA1’ to ‘CMA9’, and their ‘CMA_per_episode’ and CMA_sliding_window versions) can be adapted to various research questions and designs, and we provided here only a few examples of the vast range of possibilities for use. Depending on the type of medication, study population, length of follow-up, etc., the various alternative parametrizations may lead to substantial differences or negligible variation. Very little evidence is available on the impact of these choices in specific scenarios. This package makes it easy to integrate such methodological investigations into data analysis plans, and to communicate these to the scientific community.

We have also aimed to facilitate replicability. Thus, summaries of functions include all parameter values and are easily printed for transparent reporting (for example in an appendix or a supplementary online material). The calculation of adherence values via ‘AdhereR’ can also be integrated in larger data analysis scripts and made available in a data repository for future use in similar studies, freely-available or with specific access rights. This allows other research teams to use the same parametrizations (for example if studying the same type of medication in different populations), and thus increase homogeneity of studies for the benefit of later meta-analytic efforts. If these parametrizations are complemented by justifications of each decision based on clinical and/or research evidence in specific clinical areas, they can be subject to discussion and clinical consensus building and thus represent transparent and easily-implementable guidelines for EHD-based adherence research in those areas. In this situation, comparisons across medications can also take into account any differences in analysis choices, and general rules derived for adherence calculation across domains.

References

Arnet I., Kooij M.J., Messerli M., Hersberger K.E., Heerdink E.R., Bouvy M. (2016) Proposal of Standardization to Assess Adherence With Medication Records Methodology Matters. The Annals of Pharmacotherapy 50(5):360–8. doi:10.1177/1060028016634106.

Gardarsdottir H., Souverein P.C., Egberts T.C.G., Heerdink E.R. (2010) Construction of drug treatment episodes from drug-dispensing histories is influenced by the gap length. J Clin Epidemiol. 63(4):422–7. doi:10.1016/j.jclinepi.2009.07.001.

Greevy R.A., Huizinga M.M., Roumie C.L., Grijalva C.G., Murff H., Liu X., Griffin, M.R. (2011). Comparisons of Persistence and Durability Among Three Oral Antidiabetic Therapies Using Electronic Prescription-Fill Data: The Impact of Adherence Requirements and Stockpiling. Clinical Pharmacology & Therapeutics 90(6):813–819. 10.1038/clpt.2011.228.

Peterson A.M., Nau D.P., Cramer J.A., Benner J., Gwadry-Sridhar F., Nichol M. (2007) A checklist for medication compliance and persistence studies using retrospective databases. Value in Health: Journal of the International Society for Pharmacoeconomics and Outcomes Research 10(1):3–12. doi:10.1111/j.1524-4733.2006.00139.x.

Souverein PC, Koster ES, Colice G, van Ganse E, Chisholm A, Price D, et al. (in press) Inhaled Corticosteroid Adherence Patterns in a Longitudinal Asthma Cohort. J Allergy Clin Immunol Pract. doi:10.1016/j.jaip.2016.09.022.

Vollmer W.M., Xu M., Feldstein A., Smith D., Waterbury A., Rand C. (2012) Comparison of pharmacy-based measures of medication adherence. BMC Health Services Research 12(1):155. doi:10.1186/1472-6963-12-155.

Vrijens B., De Geest S., Hughes D.A., Przemyslaw K., Demonceau J., Ruppar T., Dobbels F., Fargher E., Morrison V., Lewek P., Matyjaszczyk M., Mshelia C., Clyne W., Aronson J.K., Urquhart J.; ABC Project Team (2012) A new taxonomy for describing and defining adherence to medications. British Journal of Clinical Pharmacology 73(5):691–705. doi:10.1111/j.1365-2125.2012.04167.x.

Van Wijk B.L.G., Klungel O.H., Heerdink E.R., de Boer A. (2006). Refill persistence with chronic medication assessed from a pharmacy database was influenced by method of calculation. Journal of Clinical Epidemiology 59(1), 11–17. doi:10.1016/j.jclinepi.2005.05.005.

Sattler E., Lee J., Perri M. (2011). Medication (Re)fill Adherence Measures Derived from Pharmacy Claims Data in Older Americans: A Review of the Literature. Drugs & Aging 30(6), 383–99. doi:10.1007/s40266-013-0074-z.

Appendix I: Distributing computations on remote Linux hosts

For example, we show here how to compute CMA1 on a remote Linux machine from macOS and Windows 10 hosts.

The Linux machine (hostname workhorse) runs Ubuntu 16.04 (64 bits) with R 3.4.1 manually compiled and installed in /usr/local/lib64/R, and an OpenSSH server allowing access to the user user.

The macOS laptop is running macOS High Sierra 10.13.4, R 3.4.3, openssh (installed through homebrew) and we set up passwordless ssh access to workhorse (see, for example, the tutorial here). The Windows 10 desktop is running Microsoft Windows 10 Pro 1709 64 bits with openssh installed vias Cygwin (also with passwordless ssh access to workhorse) and Microsoft R Open 3.4.3.

All machines have the latest version of AdhereR and snow installed.

With these, we can, for example do:

cmaW3 <- CMA_sliding_window(CMA="CMA1",
                            data=med.events,
                            ID.colname="PATIENT_ID",
                            event.date.colname="DATE",
                            event.duration.colname="DURATION",
                            event.daily.dose.colname="PERDAY",
                            medication.class.colname="CATEGORY",
                            carry.only.for.same.medication=FALSE,
                            consider.dosage.change=FALSE,
                            sliding.window.duration=30,
                            sliding.window.step.duration=30,
                            parallel.backend="snow", # make clear we want to use snow
                            parallel.threads=c(rep(
                              list(list(host="user@workhorse", # host (and user)
                                        rscript="/usr/local/bin/Rscript", # Rscript location
                                        snowlib="/usr/local/lib64/R/library/") # snow package location
                                   ),
                              2))) # two remote threads

where we use parallel.backend="snow" and we specify the workhorse node(s) we want to use. Here, parallel.threads is a list of host specifications (one per thread), each a list contaning the host (possibly with the username for ssh access), rscript (the location on the remote host of Rscript) and snowlib (the location on the remote host of the snow package, usually the location where the R packages are installed). In this exmple, we create 2 such identical hosts (using rep(..., 2)) which measn that we will have two parallel threads running on workhorse. If everything is fine, the results should be returned to the user as usual.

NB1. This procedure was tested only on Linux hosts, but it should in principle also work with Windows hosts as well (but the setup is currently much more complex and apparently less reliable; moreover, in most high-performance production environments we expect Linux rather that Windows compute nodes).