PK(PD) dataset assembly with the apmx library

Prepare workspace and load data

This package contains randomly-generated source data for instructional purposes.


EX <-
PC <-
DM <-
LB <-


Clinical trial data is not collected in a way that automatically suits population pharmacometric work. Trial data is organized in a collection of datasets, one dataset per data type. These datasets are often called “domains”.

The FDA and other regulatory agencies require domains be formatted per CDISC standards for submission. There are two main types of CDISC datasets:

Here are some examples of common CDISC SDTM domains (as they relate to pharmacometrics):

There are many other types of SDTM domains. In principle the number is unbounded, since you can create your own custom domains.

For every SDTM domain, there is usually an ADaM equivalent. All ADaM domains start with ad__, followed by the domain name:

Some domains exist only in ADaM and have no SDTM equivalent:

Even though this data is well organized, there is no CDISC format for use in NONMEM or other population pharmacometric software. That is why we have built an R package, apmx, to provide tools to help build population PK(PD) datasets.

This training will walk you through the R package and help you learn about pharmacometric data. The data loaded above are randomly-generated SDTM-like datasets to support training. They are based on a simple study design:

Currently, the package is limited to PK and PKPD datasets for analysis in NONMEM only. Additional tools for PK(PD) datasets, plus tools for other analysis types (TTE, logistic regression, QTc analysis), are under development and not available at this time. Datasets for analysis with other software, such as Monolix, are also unavailable at this time.

Dose event preparation

PK dataset assembly starts with preparing dose events. Dose events require several columns for assembly. Below are the apmx standard names, along with the typical SDTM name equivalent when applicable. Other variables, like DUR (infusion duration), may be required based on the analysis.

The analyst must confirm the ex domain contains all of this information for the package to work. This dataset contains all of the information we need except the compartment. CMT must always be programmed by the user based on the model design. In this case, CMT = 1 for the dose depot. We will also select only the columns that we need for the analysis, dropping the others.

ex <- EX %>%
  dplyr::mutate(CMT = 1) %>%

That’s all we have to do to prepare the dose events for assembly.

PK observation event preparation

Now, we are going to prepare the PK observations. Observation events require several columns for assembly:

The PC domain may have multiple DVIDs and CMTs, perhaps for multiple analytes. Once again, we need to confirm our dataset has all of this information. Are any variables missing?

pc <- PC %>%
  dplyr::filter(PCSTAT=="Y") %>%
  dplyr::mutate(CMT = 2,
                TPT = dplyr::case_when(PCTPT=="<1 hour Pre-dose" ~ 0,
                                       PCTPT=="30 minutes post-dose" ~ 0.5/24,
                                       PCTPT=="1 hour post-dose" ~ 1/24,
                                       PCTPT=="2 hours post-dose" ~ 2/24,
                                       PCTPT=="4 hours post-dose" ~ 4/24,
                                       PCTPT=="6 hours post-dose" ~ 6/24,
                                       PCTPT=="8 hours post-dose" ~ 8/24,
                                       PCTPT=="12 hours post-dose" ~ 12/24,
                                       PCTPT=="24 hours post-dose" ~ 24/24,
                                       PCTPT=="48 hours post-dose" ~ 48/24)) %>%

That’s all we have to do to prepare the observation events for assembly.
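The case_when() call above maps each time point label by hand. As an alternative, here is a minimal sketch that parses labels of the form "<number> <unit> post-dose" with a regular expression. The helper tpt_to_days() is hypothetical and not part of apmx. It assumes pre-dose labels map to 0, as in the mapping above, and that any unrecognized label should stay NA so it can be reviewed.

```r
# Hypothetical helper (not part of apmx): convert PCTPT labels to TPT in days.
tpt_to_days <- function(label) {
  # Pre-dose samples are assigned time 0, as in the case_when() mapping above
  out <- rep(NA_real_, length(label))
  out[grepl("pre-dose", label, ignore.case = TRUE)] <- 0

  # Capture the number and the unit from labels like "30 minutes post-dose"
  pattern <- "^([0-9.]+) (minute|hour)s? post-dose$"
  post <- grepl(pattern, label)
  value <- as.numeric(sub(pattern, "\\1", label[post]))
  unit <- sub(pattern, "\\2", label[post])

  # Convert to days: minutes / 1440, hours / 24
  out[post] <- ifelse(unit == "minute", value / 1440, value / 24)
  out
}

labels <- c("<1 hour Pre-dose", "30 minutes post-dose",
            "1 hour post-dose", "48 hours post-dose")
tpt <- tpt_to_days(labels)
```

The explicit case_when() is easier to audit for a short, fixed list of labels. A parser like this is mainly useful when the list of time points is long.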

Simple dataset assembly

We have all of the information we need to build a simple PK dataset. Building a dataset is easy to do with apmx. Just feed the ex and pc domains into apmx::pk_build()!

df_simple <- apmx::pk_build(ex = ex, pc = pc)

This function does a lot! Let’s break down the new variables:

pk_build() has optional parameters that can customize the output dataset. Here are all of the options that affect a simple dataset, shown with their default values:

df_simple <- apmx::pk_build(ex = ex, #dataframe of prepared dose events
                            pc = pc, #dataframe of prepared pc observation events
                            time.units = "days", #can be set to days or hours.
                            #NOTE: units of TPT in ex and pc should match this unit
                            cycle.length = NA, #must be in units of days, will reset NTLC to 0
                            na = -999, #replaces missing nominal times and covariates with a numeric value
                            time.rnd = NULL, #rounds all time values to x decimal places
                            amt.rnd = NULL, #rounds calculated dose values to x decimal places
                            dv.rnd = NULL, #rounds observation columns to x decimal places
                            impute = NA, #imputation method for missing times
                            sparse = 3) #threshold for calculating sparse/serial distinctions

I recommend setting time.rnd = 3 to make the dataset easier to read.

df_simple <- apmx::pk_build(ex, pc, time.rnd = 3)

Sometimes, you will want a more complicated dataset. Let’s explore additional functionalities of pk_build().

Covariate preparation

For the most part, all covariates can be divided into four categories:

apmx has a few requirements to help keep track of different kinds of covariates. When you program covariates, you have to follow these rules:

Let’s start by preparing some subject-level covariates from dm and lb. All subject-level covariate data frames require a USUBJID column. There must be only one row per subject. Covariate names should be clear and easy to interpret.

dm <- DM %>%
  dplyr::select(USUBJID, AGE, SEX, RACE, ETHNIC) %>%
  dplyr::mutate(AGEU = "years") #AGE is continuous and requires a unit
lb <- LB %>% #select the desired labs
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBVST %in% c("Baseline (D1)", "Screening")) %>%
  dplyr::filter(LBPARAMCD %in% c("ALB", "AST", "ALT", "BILI", "CREAT")) %>%
  dplyr::mutate(LBORRES = as.numeric(LBORRES))

lb <- lb %>% #select the lab collected immediately prior to first dose
  dplyr::arrange(USUBJID, LBPARAMCD, LBDT) %>%
  dplyr::group_by(USUBJID, LBPARAMCD) %>%
  dplyr::filter(row_number()==max(row_number())) %>%

lb <- lb %>% #finish formatting and add units since all labs are continuous
  dplyr::select(USUBJID, LBPARAMCD, LBORRES) %>%
  tidyr::pivot_wider(names_from = "LBPARAMCD", values_from = "LBORRES") %>%
  dplyr::mutate(ALBU = "g/dL",
                ASTU = "IU/L",
                ALTU = "IU/L",
                BILIU = "mg/dL",
                CREATU = "mg/dL")
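The three lb steps above amount to "keep the last pre-dose record per subject and parameter, then reshape to one row per subject". Here is a minimal base R sketch of the same idea on toy data. The values are invented for illustration, and only the column names follow the LB domain used above.

```r
# Toy long-format labs; the values are invented for illustration only
labs <- data.frame(
  USUBJID   = c("S1", "S1", "S1", "S2", "S2"),
  LBPARAMCD = c("ALT", "ALT", "AST", "ALT", "AST"),
  LBDT      = c("2022-01-01", "2022-01-10", "2022-01-10",
                "2022-01-05", "2022-01-05"),
  LBORRES   = c(28, 31, 40, 25, 35),
  stringsAsFactors = FALSE
)

# Sort so the latest record per subject/parameter comes last
labs <- labs[order(labs$USUBJID, labs$LBPARAMCD, labs$LBDT), ]

# Keep the last row within each USUBJID/LBPARAMCD group
is_last <- !duplicated(labs[, c("USUBJID", "LBPARAMCD")], fromLast = TRUE)
baseline <- labs[is_last, c("USUBJID", "LBPARAMCD", "LBORRES")]

# Reshape to one row per subject, one column per lab parameter
wide <- reshape(baseline, idvar = "USUBJID", timevar = "LBPARAMCD",
                direction = "wide")
names(wide) <- sub("^LBORRES\\.", "", names(wide))
```

The result has one row per subject, which is the shape sl.cov expects according to the rule stated above.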

Next, let’s prepare some time-varying covariates from lb. All time-varying covariate data frames require a USUBJID and DTIM column.

tast <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAMCD=="AST") %>%
  dplyr::mutate(LBORRES = as.numeric(LBORRES)) %>%
  dplyr::select(USUBJID, DTIM = LBDT, AST = LBORRES) %>%
  dplyr::mutate(ASTU = "IU/L")
talt <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAMCD=="ALT") %>%
  dplyr::mutate(LBORRES = as.numeric(LBORRES)) %>%
  dplyr::select(USUBJID, DTIM = LBDT, ALT = LBORRES) %>%
  dplyr::mutate(ALTU = "IU/L")

PD observation preparation

You may want to add PD observations to your dataset. PD observations have the same requirements as PC observations. Unfortunately, apmx does not recognize SDTM/ADaM language for PD observations. That is because there are many types of PD events, with many possible formats. You must convert all column names to apmx column names.

For this analysis, we will pretend glucose observations from lb are a meaningful biomarker. Let’s set CMT = 3 for the PD compartment.

pd <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAM=="glucose") %>%
  dplyr::mutate(DTIM = paste(LBDT, "00:00"),
                VISIT = LBVST,
                NDAY = case_when(VISIT=="Screening" ~ -15,
                                 VISIT=="Baseline (D1)" ~ 1,
                                 VISIT=="Visit 2 (D8)" ~ 8,
                                 VISIT=="Visit 3 (D15)" ~ 15,
                                 VISIT=="Visit 4 (D29)" ~ 29,
                                 VISIT=="End of Treatment" ~ 45),
                TPT = 0,
                TPTC = LBTPT,
                ODV = as.numeric(LBORRES),
                DVIDU = LBORRESU,
                LLOQ = NA,
                CMT = 3,
                DVID = LBPARAM) %>%
  dplyr::select(USUBJID, DTIM, NDAY, VISIT, TPT,
                ODV, LLOQ, CMT, DVID, TPTC, DVIDU)
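Because PD column names must be converted by hand, it can help to confirm nothing was left out before building. The sketch below checks a data frame against the columns selected in the chunk above. The helper missing_cols() and the toy data frame are hypothetical and not part of apmx.

```r
# Column names taken from the select() call above
required <- c("USUBJID", "DTIM", "NDAY", "VISIT", "TPT",
              "ODV", "LLOQ", "CMT", "DVID", "TPTC", "DVIDU")

# Hypothetical helper (not part of apmx): report which required columns are absent
missing_cols <- function(df, required) {
  setdiff(required, names(df))
}

# Toy PD frame that is deliberately missing LLOQ and DVIDU
pd_toy <- data.frame(USUBJID = "S1", DTIM = "2022-01-01 00:00", NDAY = 1,
                     VISIT = "Baseline (D1)", TPT = 0, ODV = 5.1,
                     CMT = 3, DVID = "glucose", TPTC = "Pre-dose",
                     stringsAsFactors = FALSE)
absent <- missing_cols(pd_toy, required)
```

An empty result means every listed column is present. pk_build() runs its own checks as well, as shown in the errors section.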

Full dataset assembly

Let’s add all of the new events and covariates to the dataset.

df_full <- apmx::pk_build(ex = ex, pc = pc, pd = pd,
                          sl.cov = list(dm, lb),
                          tv.cov = list(tast, talt),
                          time.rnd = 3)
#> Warning in apmx::pk_build(ex = ex, pc = pc, pd = pd, sl.cov = list(dm, lb), :
#> The following USUBJID(s) have at least one event with missing ATFD:
#> ABC102-01-005

First, you’ll notice a warning was issued in the console. We will revisit the warnings later in this document. For now, let’s focus on the dataset itself.

There is a new type of row where EVID = 2.

#> [1] 0 1
#> [1] 2 0 1

These rows capture the date-time and values of time-varying covariates. Sometimes, we want to retain the exact date-time of each time-varying covariate.

The DVID column has also changed compared with the simple dataset.

#> [1]  1 NA
#> [1] NA  2  1

#> [1] "ABC999"
#> [1] NA        "glucose" "ABC999"

There are now two observation events, ABC999 and glucose. The NA rows are for dose and other events.

You’ll notice that all of the covariate names changed a bit. They all received a prefix, and some received a suffix. Why do we do this? Prefixes and suffixes can identify the type of covariate:

If you can’t remember the prefixes and suffixes, that’s OK! We have an additional function to help with that. apmx::cov_find() will return all covariates of particular types in a PK dataset.

apmx::cov_find(df_full, cov = "categorical", type = "numeric")
#> [1] "NSTUDY"  "NROUTE"  "NFRQ"    "NSEX"    "NRACE"   "NETHNIC"
apmx::cov_find(df_full, cov = "categorical", type = "character")
apmx::cov_find(df_full, cov = "continuous", type = "numeric")
#> [1] "BAGE"   "BALB"   "BALT"   "BAST"   "BBILI"  "BCREAT" "TAST"   "TALT"
apmx::cov_find(df_full, cov = "units", type = "character")
#> [1] "BAGEU"   "BALBU"   "BALTU"   "BASTU"   "BBILIU"  "BCREATU" "TASTU"  
#> [8] "TALTU"

Let’s explore the rest of the optional parameters in pk_build().

df_full <- apmx::pk_build(ex = ex, pc = pc, pd = pd,
                          sl.cov = list(dm, lb),
                          tv.cov = list(tast, talt),
                          time.rnd = 3,
                          cov.rnd = NULL, #rounds covariate columns to x decimal places
                          BDV = FALSE, #calculates baseline dependent variable for PD events
                          DDV = FALSE, #calculates change (delta) from baseline for PD events
                          PDV = FALSE, #calculates percent change from baseline for PD events
                 = TRUE, #adds specific numeric mapping for SEX, RACE, and ETHNIC variables
                          tv.cov.fill = "downup", #fill pattern for time-varying covariates
                          keep.other = TRUE) #keep or drop all EVID = 2 rows
#> Warning in apmx::pk_build(ex = ex, pc = pc, pd = pd, sl.cov = list(dm, lb), :
#> The following USUBJID(s) have at least one event with missing ATFD:
#> ABC102-01-005
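To make the tv.cov.fill = "downup" option concrete, here is a minimal sketch of a down-then-up fill on a single covariate vector. It assumes "downup" has the same meaning as the same-named direction in tidyr::fill(): carry the last observed value forward, then back-fill any leading gaps. The helper fill_downup() is hypothetical and is not how apmx implements the fill.

```r
# Hypothetical helper (not part of apmx): fill NAs downward, then upward
fill_downup <- function(x) {
  # Downward pass: carry the last observed value forward
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  # Upward pass: back-fill any leading NAs with the next observed value
  for (i in rev(seq_along(x))[-1]) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

# Toy AST values along one subject's event rows; NA = no lab drawn at that event
ast <- c(NA, 33, NA, NA, 41, NA)
filled <- fill_downup(ast)
```

In this toy vector, the leading NA takes the first observed value (33) and every later gap takes the most recent earlier value.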

The dataset is a bit easier to read if we drop the other events. We will do that moving forward for the rest of the tutorial.

df_full <- apmx::pk_build(ex = ex, pc = pc, pd = pd,
                          sl.cov = list(dm, lb),
                          tv.cov = list(tast, talt),
                          time.rnd = 3, dv.rnd = 3,
                          BDV = TRUE, DDV = TRUE, PDV = TRUE,
                          keep.other = FALSE)
#> Warning in apmx::pk_build(ex = ex, pc = pc, pd = pd, sl.cov = list(dm, lb), :
#> The following USUBJID(s) have at least one event with missing ATFD:
#> ABC102-01-005
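The BDV, DDV, and PDV options add baseline, change-from-baseline, and percent-change-from-baseline columns for PD events. The arithmetic can be sketched on toy data as follows. The values are invented, and the baseline rule used here (the observation at NTFD == 0) is a simplifying assumption; pk_build() may select the baseline differently.

```r
# Toy glucose observations for one subject; values invented for illustration
pd_toy <- data.frame(NTFD = c(0, 7, 14, 28),
                     DV   = c(100, 90, 80, 110))

# Assumed definition: baseline is the observation at NTFD == 0
bdv <- pd_toy$DV[pd_toy$NTFD == 0]
pd_toy$BDV <- bdv
pd_toy$DDV <- pd_toy$DV - bdv                 # change from baseline
pd_toy$PDV <- 100 * (pd_toy$DV - bdv) / bdv   # percent change from baseline
```

Both DDV and PDV depend on BDV, so a subject without a baseline observation cannot have either calculated.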

Other covariate methods

Time-varying covariates can be challenging to work with. The pk_build() function can only fill them by date-time. What if date-time is not available in the source data?

The apmx::cov_apply() function will add covariates to a dataset built by pk_build(). It will add time-varying covariates by any time variable, including:

Let’s add TAST (time-varying AST) by nominal time instead of actual time.

tast <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAMCD=="AST") %>%
  dplyr::mutate(NTFD = case_when(LBVST=="Screening" ~ -15, #calculate NTFD from visit code
                                 LBVST=="Baseline (D1)" ~ 1,
                                 LBVST=="Visit 2 (D8)" ~ 8,
                                 LBVST=="Visit 3 (D15)" ~ 15,
                                 LBVST=="Visit 4 (D29)" ~ 29,
                                 LBVST=="End of Treatment" ~ 45)) %>%
  dplyr::mutate(AST = as.numeric(LBORRES)) %>%
  dplyr::select(USUBJID, NTFD, AST, ASTU = LBORRESU)

df_cov_apply <- apmx::pk_build(ex = ex, pc = pc,
                               sl.cov = list(dm, lb),
                               time.rnd = 3, dv.rnd = 3,
                               BDV = TRUE, DDV = TRUE, PDV = TRUE,
                               keep.other = FALSE) %>%
  apmx::cov_apply(tast, = "NTFD")

cov_apply() can also add subject-level covariates by any subject identifier.

df_cov_apply <- apmx::pk_build(ex = ex, pc = pc,
                               time.rnd = 3, dv.rnd = 3,
                               BDV = TRUE, DDV = TRUE, PDV = TRUE,
                               keep.other = FALSE) %>%
  apmx::cov_apply(dm) %>%
  apmx::cov_apply(lb) %>%
  apmx::cov_apply(talt, = "DTIM") %>%
  apmx::cov_apply(tast, = "NTFD")

cov_apply() can also add empirical bayes estimates or exposure metrics. Notice these also get their own prefixes.

cov_apply() cannot handle units for these parameters at this time.

Let’s try adding exposure metrics and parameter estimates to the dataset. First, we will generate dummy exposures and parameter estimates.

exposure <- data.frame(ID = 1:22, #exposure metrics
                       MAX = 1001:1022,
                       MIN = 101:122,
                       AVG = 501:522)

parameters <- data.frame(ID = 1:22, #individual clearance and central volume estimates
                         CL = seq(0.1, 2.2, 0.1),
                         VC = seq(1, 11.5, 0.5))
df_cov_apply <- apmx::pk_build(ex = ex, pc = pc,
                               time.rnd = 3, dv.rnd = 3,
                               BDV = TRUE, DDV = TRUE, PDV = TRUE,
                               keep.other = FALSE) %>%
  apmx::cov_apply(dm) %>%
  apmx::cov_apply(lb) %>%
  apmx::cov_apply(talt, = "DTIM", keep.other = FALSE) %>%
  apmx::cov_apply(tast, = "NTFD", keep.other = FALSE) %>%
  apmx::cov_apply(exposure, = "ID", exp = TRUE) %>%
  apmx::cov_apply(parameters, = "ID", ebe = TRUE)

It is recommended you always use pk_build() or cov_apply() to add covariates instead of adding them in yourself. That ensures cov_find() always finds the covariates correctly.

apmx::cov_find(df_cov_apply, cov = "categorical", type = "numeric")
#> [1] "NSTUDY"  "NROUTE"  "NFRQ"    "NSEX"    "NRACE"   "NETHNIC"
apmx::cov_find(df_cov_apply, cov = "categorical", type = "character")
apmx::cov_find(df_cov_apply, cov = "continuous", type = "numeric")
#> [1] "BAGE"   "BALB"   "BALT"   "BAST"   "BBILI"  "BCREAT" "TALT"   "TAST"
apmx::cov_find(df_cov_apply, cov = "units", type = "character")
#> [1] "BAGEU"   "BALBU"   "BALTU"   "BASTU"   "BBILIU"  "BCREATU" "TALTU"  
#> [8] "TASTU"
apmx::cov_find(df_cov_apply, cov = "exposure", type = "numeric")
#> [1] "CMAX" "CMIN" "CAVG"
apmx::cov_find(df_cov_apply, cov = "empirical bayes estimate", type = "numeric")
#> [1] "ICL" "IVC"

Errors and warnings

pk_build() and other apmx functions issue errors/warnings for problematic data. What is the warning we have been receiving this whole time? First, let’s filter our dataset to the one subject triggering the warning:

warning <- df_full %>%

#> [1] 1
#> [1] "glucose"

This subject has 1 PD observation and no dose or PK observations. Because there is no dose, you cannot calculate ATFD (actual time since first dose). The warning informs you which subjects have this particular problem. This helps you diagnose potential problems with your data. Notice in this instance, the record is flagged by C and TIMEF.

#> [1] "C"
#> [1] 1

There are other errors and warnings to help you diagnose your data as well. There is a key difference between the two:


What if you are missing a required column in your input domain?

ex_error <- ex[, -5]

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): Column NDAY is missing from the ex dataset.

What if the variable types are incorrect?

ex_error <- ex
ex_error$USUBJID <- 1:42

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): Column USUBJID in ex is not character type.

What if a required value is missing?

ex_error <- ex
ex_error$USUBJID[5] <- NA

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): USUBJID missing in ex for at least 1 row.
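The three errors above (missing column, wrong type, missing value) can also be caught before calling pk_build(). The sketch below is a hypothetical pre-check, not part of apmx, written only to show the order in which such checks are usually applied to an identifier column.

```r
# Hypothetical pre-check (not part of apmx) mirroring the three errors above
check_ids <- function(df, col = "USUBJID") {
  if (!col %in% names(df)) return(paste("Column", col, "is missing"))
  if (!is.character(df[[col]])) return(paste("Column", col, "is not character type"))
  if (anyNA(df[[col]])) return(paste(col, "is missing for at least 1 row"))
  "ok"
}

good     <- data.frame(USUBJID = c("S1", "S2"), stringsAsFactors = FALSE)
bad_type <- data.frame(USUBJID = 1:2)
bad_na   <- data.frame(USUBJID = c("S1", NA), stringsAsFactors = FALSE)
no_col   <- data.frame(ID = 1)

results <- c(check_ids(good), check_ids(bad_type),
             check_ids(bad_na), check_ids(no_col))
```

Checking for the column first matters, because the type and missing-value checks cannot run on a column that does not exist.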

What if we program ADDL but not II for dose events?

ex_error <- ex
ex_error$ADDL <- 1

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): If ex contains ADDL, it must contain II

What if date-time is not formatted correctly?

ex_error <- ex
ex_error$EXSTDTC <- substr(ex_error$EXSTDTC, 1, 10)

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): DTIM in ex is not ISO 8601 format.

What if the baseline nominal day NDAY == 0 instead of 1?

ex_error <- ex
ex_error$EXSTDY <- 0

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): NDAY in ex has a 0 measurement. Please confirm day of first dose is nominal day 1 and the day prior to first dose is nominal day -1.

Nominal days can be tricky. The day a patient takes their first dose is day 1. The day before their first dose is day -1. Therefore, there is no study day 0.
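The no-day-0 convention can be written as a small function. This is a minimal sketch; study_day() is a hypothetical helper, not part of apmx, and it assumes dates are given as ISO 8601 strings.

```r
# Hypothetical helper (not part of apmx): nominal study day with no day 0
study_day <- function(date, first_dose) {
  diff <- as.integer(as.Date(date) - as.Date(first_dose))
  # On or after first dose: add 1 so the day of first dose is day 1.
  # Before first dose: keep the negative difference, so the day before is -1.
  ifelse(diff >= 0, diff + 1L, diff)
}

days <- study_day(c("2022-01-09", "2022-01-10", "2022-01-11"), "2022-01-10")
```

Here the day before first dose is -1, the day of first dose is 1, and the following day is 2.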

What if ADDL and II are both present, but one of them is NA?

ex_error <- ex
ex_error$ADDL <- 1
ex_error$II <- c(rep(1, 41), NA)

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): At least one row in ex has a documented ADDL when II is NA.

What if you only enter a dose domain?

#> Error in apmx::pk_build(ex): Please enter a pc or pd domain.

What if a pc observation is 0 or negative?

pc_error <- pc
pc_error$PCSTRESN[10] <- 0

apmx::pk_build(ex, pc_error)
#> Error in apmx::pk_build(ex, pc_error): At least one dependent variable in PC is less than or equal to 0.

What if the study code is not included in ex or sl.cov? Note that you can pass the study code variable through sl.cov or ex.

ex_error <- ex %>%

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): STUDY column must be included in ex or sl.cov.

What if you have multiple values for a subject-level covariate within one subject?

dm_error <- dm
dm_error$USUBJID[2] <- "ABC102-01-001"

apmx::pk_build(ex, pc, sl.cov=dm_error)
#> Error in apmx::pk_build(ex, pc, sl.cov = dm_error): sl.cov has duplicate USUBJID rows.

What if you select a time unit not supported by pk_build?

apmx::pk_build(ex, pc, time.units="minutes")
#> Error in apmx::pk_build(ex, pc, time.units = "minutes"): time.units parameter must be in days or hours.

What if you program DDV and/or PDV without calculating BDV?

apmx::pk_build(ex, pc, pd, DDV=TRUE, PDV=TRUE)

What if you pass the same covariate through multiple dataframes?

ex_error <- ex
ex_error$NSEX <- 0

apmx::pk_build(ex_error, pc, sl.cov = dm)
#> Error in apmx::pk_build(ex_error, pc, sl.cov = dm): NSEX column is duplicated in sl.cov and another dataset. Please include this column in one dataset only.

Note you are allowed to pass other columns through the ex, pc, and pd domains. For example, try adding the column SEX instead of NSEX. If you pass an extra column through ex, pc, or pd, it will not be impacted by the function.

What if you provide a continuous covariate but forget to provide units?

dm_error <- dm %>%

apmx::pk_build(ex, pc, sl.cov = dm_error)
#> Error in apmx::pk_build(ex, pc, sl.cov = dm_error): All numerical covariates in sl.cov need units.


These datasets will build, but pk_build() will inform you of potential problems. What if a subject has no covariates, but others do?

dm_warning <- dm
dm_warning <- dm_warning[1:4,]

df_warning <- apmx::pk_build(ex, pc, sl.cov=dm_warning)
#> Warning in apmx::pk_build(ex, pc, sl.cov = dm_warning): The following
#> USUBJID(s) have PKPD events but are not in sl.cov: ABC102-01-006,
#> ABC102-02-001, ABC102-02-002, ABC102-02-003, ABC102-02-004, ABC102-03-001,
#> ABC102-03-002, ABC102-03-003, ABC102-03-004, ABC102-04-001, ABC102-04-002,
#> ABC102-04-003, ABC102-04-004, ABC102-04-005, ABC102-04-006, ABC102-04-007,
#> ABC102-04-008
df_warning <- apmx::pk_build(ex, pc, sl.cov = list(dm_warning, lb))

Notice the warning is only triggered if a subject has NO covariates. In the second case, all subjects are included in lb, while only some are in dm. The warning is not issued if the subject has at least 1 covariate. All missing covariates are filled with the value of the na parameter (default -999).

What if a subject does not have any baseline PD events and BDV|DDV|PDV == TRUE? Notice the warning is only issued if BDV, DDV, or PDV are calculated.

pd_warning <- pd
pd_warning <- pd[3:nrow(pd_warning), ]

df_warning <- apmx::pk_build(ex, pc, pd_warning, BDV=TRUE)
#> Warning in apmx::pk_build(ex, pc, pd_warning, BDV = TRUE): The following
#> USUBJID(s) do not have a baseline glucose observation at or prior to first dose
#> (BDV, DDV, PDV not calculated): ABC102-01-001
#> Warning in apmx::pk_build(ex, pc, pd_warning, BDV = TRUE): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-005
df_warning <- apmx::pk_build(ex, pc, pd_warning)
#> Warning in apmx::pk_build(ex, pc, pd_warning): The following USUBJID(s) have at
#> least one event with missing ATFD: ABC102-01-005

What if the source data events occurred out of order? You’ll notice the NTFD of the first observation falls after the next event.

pc_warning <- pc
pc_warning$TPT[1] <- 0.07

df_warning <- apmx::pk_build(ex, pc_warning,
                             time.rnd = 3)
#> Warning in apmx::pk_build(ex, pc_warning, time.rnd = 3): The following
#> USUBJID(s) have at least one event that occurred out of protocol order (NTFD is
#> not strictly increasing): ABC102-01-001
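To see what "NTFD is not strictly increasing" means in practice, here is a sketch of a comparable check on toy data. It assumes the idea is to sort each subject's events by actual time and then confirm nominal time increases. The exact logic inside pk_build() may differ.

```r
# Toy events; the second subject has a nominal time that goes backwards
events <- data.frame(
  USUBJID = c("S1", "S1", "S1", "S2", "S2", "S2"),
  ATFD    = c(0, 0.04, 0.08, 0, 0.04, 0.08),
  NTFD    = c(0, 0.042, 0.083, 0, 0.083, 0.042),
  stringsAsFactors = FALSE
)

# Sort by actual time within subject, then test whether NTFD strictly increases
events <- events[order(events$USUBJID, events$ATFD), ]
in_order <- tapply(events$NTFD, events$USUBJID, function(x) all(diff(x) > 0))
flagged <- names(in_order)[!in_order]
```

Only the subject whose nominal times run 0, 0.083, 0.042 is flagged.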

What if a dose event is missing AMT? The record is automatically C-flagged and a warning is issued. Note that the PK records for this subject are not C-flagged.

ex_warning <- ex
ex_warning$EXDOSE[1] <- NA

df_warning <- apmx::pk_build(ex_warning, pc,
                             time.rnd = 3)
#> Warning in apmx::pk_build(ex_warning, pc, time.rnd = 3): The following
#> USUBJID(s) have at least one dose event with missing AMT: ABC102-01-001

What if there are two events that occur at the same time? Notice how the duplicated events are C-flagged and a warning is issued.

pc_warning <- pc
pc_warning[2, ] <- pc_warning[1, ]
pc_warning$PCSTRESN[2] <- 1400

df_warning <- apmx::pk_build(ex, pc_warning,
                             time.rnd = 3)
#> Warning in apmx::pk_build(ex, pc_warning, time.rnd = 3): The following
#> USUBJID(s) have at least one duplicate event: ABC102-01-001

What if you have long column names? This warning informs you some column names are longer than 8 characters. This will prevent you from converting the dataset to a .xpt file if desired.

dm_warning <- dm %>%

df_warning <- apmx::pk_build(ex, pc, sl.cov = dm_warning)
#> Warning in apmx::pk_build(ex, pc, sl.cov = dm_warning): The following column
#> name(s) are longer than 8 characters: NETHNICITY, NETHNICITYC
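You can run the same kind of length check on any data frame yourself. This is a minimal base R sketch on a toy data frame; the column names are chosen to match the warning above.

```r
# Toy data frame with two names that exceed 8 characters
df_toy <- data.frame(ID = 1, NSEX = 0, NETHNICITY = 1, NETHNICITYC = "x",
                     stringsAsFactors = FALSE)

# Flag names longer than 8 characters
too_long <- names(df_toy)[nchar(names(df_toy)) > 8]
```

Renaming the source covariate (for example, ETHNIC instead of ETHNICITY) keeps the prefixed and suffixed names within the limit.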

What if your baseline covariates and time-varying covariates are not equivalent at baseline? In theory, all baseline covariates and time-varying covariates should agree at NTFD == 0.

lb_warning <- lb
lb_warning$ALT[1] <- 31

df_warning <- apmx::pk_build(ex, pc, sl.cov = lb_warning, tv.cov = talt)
#> Warning in apmx::pk_build(ex, pc, sl.cov = lb_warning, tv.cov = talt): The
#> following USUBJID(s) have at least one event with missing ATFD: ABC102-01-005
#> Warning in apmx::pk_build(ex, pc, sl.cov = lb_warning, tv.cov = talt): BALT and
#> TALT are not equivalent at first dose (baseline).

Time imputations

Some of our errors and warnings discuss problems with date/time elements of ex and pc. What do you do when you have an event, but the date/time information is missing? pk_build provides two methods for imputing missing times:

Let’s experiment with these two methods. First, we will drop some date-times from pc and replace them with NA.

pc_impute <- pc
pc_impute$PCDTC[c(4, 39, 73, 128)] <- NA

df_impute <- apmx::pk_build(ex, pc_impute,
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex, pc_impute, time.rnd = 3): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-001,
#> ABC102-01-002, ABC102-01-004, ABC102-02-002

This triggers the warning for missing ATFD as expected. Now, let’s try impute method 1.

df_impute_1 <- apmx::pk_build(ex, pc_impute,
                              time.rnd = 3, impute = 1)
#> Warning in apmx::pk_build(ex, pc_impute, time.rnd = 3, impute = 1): The
#> following USUBJID(s) have at least one event that occurred out of protocol
#> order (NTFD is not strictly increasing): ABC102-01-004

First, notice we have a new warning. We’ll come back to that later. You should also notice that all events have times and the time warning disappeared. The imputation is notated with the IMPEX and IMPDV columns.

nrow(df_impute_1[is.na(df_impute_1$ATFD), ]) #number of rows with missing ATFD
#> [1] 0

imputed_events_1 <- df_impute_1 %>%
  dplyr::filter(IMPDV==1 | IMPEX==1)

IMPDV will flag observation records with an imputed time. IMPEX will flag all records impacted by an imputed dose. You’ll notice we still have a warning for one subject. Let’s find out why.

times_check_1 <- df_impute_1 %>%

Notice row 12 has an imputed time ATFD = 14.042. That is because NTFD = 14.042 for that record. However, the dose for this visit was administered a few days late, at time ATFD = 16.053. This imputation puts the post-dose sample two days ahead of the dose. Impute method 1 is a poor assumption for this missing date.

Let’s try method 2 to see if that assumption is better. Method 2 takes the late dose into account by estimating the time of the sample relative to the other events that day.

df_impute_2 <- apmx::pk_build(ex, pc_impute,
                              time.rnd = 3, impute = 2)

imputed_events_2 <- df_impute_2 %>%
  dplyr::filter(IMPDV==1 | IMPEX==1)

You’ll notice the warning disappears. Let’s check that subject again.

times_check_2 <- df_impute_2 %>%

You’ll notice that under this method, when NTFD = 14.042, ATFD = 16.094. Why?
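One way to read the two results is sketched below. It is offered as an assumption, not as a description of the package internals: method 1 takes the missing actual time directly from the nominal time, while method 2 keeps the sample's nominal offset from the dose and anchors it at the actual dose time. The sketch uses the rounded values quoted above and assumes the nominal time of that dose is 14.

```r
# Values quoted in the text above (rounded to 3 decimals)
ntfd_sample <- 14.042   # nominal time of the sample with the missing date-time
atfd_dose   <- 16.053   # actual time of the late dose at that visit
ntfd_dose   <- 14       # assumed nominal time of that dose

# Method 1 (assumed): actual time is taken directly from nominal time
atfd_method1 <- ntfd_sample

# Method 2 (assumed): keep the nominal offset from the dose,
# anchored at the actual dose time
atfd_method2 <- atfd_dose + (ntfd_sample - ntfd_dose)
```

Under these assumptions, method 2 places the sample at about 16.095, after the late dose and close to the 16.094 reported above. The small gap would be consistent with the inputs here being rounded, but the package may use a different calculation.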

What if we are missing a date/time for a dose event? Let’s repeat the experiment.

ex_impute <- ex
ex_impute$EXSTDTC[2] <- NA

df_impute <- apmx::pk_build(ex_impute, pc, #no imputation method
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex_impute, pc, time.rnd = 3): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-001
df_impute_1 <- apmx::pk_build(ex_impute, pc, #imputation method 1
                              time.rnd = 3, impute = 1)
#> Warning in apmx::pk_build(ex_impute, pc, time.rnd = 3, impute = 1): The
#> following USUBJID(s) have at least one event that occurred out of protocol
#> order (NTFD is not strictly increasing): ABC102-01-001

imputed_events_1 <- df_impute_1 %>% #imputed records
  dplyr::filter(IMPDV==1 | IMPEX==1)

Now, a lot of records for subject 1 have IMPEX == 1. This is because all of these observations are associated with a dose with an imputed time. Is method 1 a good assumption?

Let’s try method 2 to see the difference. You’ll notice the events are in the correct order and times are imputed successfully.

df_impute_2 <- apmx::pk_build(ex_impute, pc,
                              time.rnd = 3, impute = 2)

imputed_events_2 <- df_impute_2 %>%
  dplyr::filter(IMPDV==1 | IMPEX==1)

What if the first dose is missing instead of the second dose? Let’s repeat the experiment, this time with method 2 only since we can assume method 1 won’t work well in this scenario.

ex_impute <- ex
ex_impute$EXSTDTC[1] <- NA

df_impute <- apmx::pk_build(ex_impute, pc, # No imputation method, expect a warning
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex_impute, pc, time.rnd = 3): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-001
df_impute_2 <- apmx::pk_build(ex_impute, pc, #imputation method 2
                              time.rnd = 3, impute = 2)

imputed_events_2 <- df_impute_2 %>% #imputed events
  dplyr::filter(IMPDV==1 | IMPEX==1 | IMPFEX==1)

Notice an extra column was created, IMPFEX.

One final experiment - what if we are missing date-times from ex and pc? Note all times are imputed successfully and all warnings disappear.

ex_impute <- ex
ex_impute$EXSTDTC[1:2] <- NA

df_impute <- apmx::pk_build(ex = ex_impute, pc = pc_impute, #no imputation method
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex = ex_impute, pc = pc_impute, time.rnd = 3): The
#> following USUBJID(s) have at least one event with missing ATFD: ABC102-01-001,
#> ABC102-01-002, ABC102-01-004, ABC102-02-002
df_impute_2 <- apmx::pk_build(ex = ex_impute, pc = pc_impute, #imputation method 2
                              time.rnd = 3, impute = 2)

Dataset combination

What if we have multiple studies we want to analyze at once? We could create one large ex, pc, etc. input with each study, or we could use apmx::pk_combine() to combine two datasets built by pk_build().

Let’s create a copy of df_full and change it slightly. We’ll pretend it’s built from a second study, ABC103.

df_full2 <- df_full %>%
  dplyr::filter(DOMAIN!="PD") %>% #remove glucose observations
  dplyr::filter(ID<19) %>% #remove subject 19
  dplyr::group_by(ID) %>%
  dplyr::mutate(NSTUDYC = "ABC103", #update study ID
                USUBJID = gsub("ABC102", "ABC103", USUBJID),
                BAGE = round(rnorm(1, 45, 10)), #re-create all continuous covariates
                BALB = round(rnorm(1, 4, 0.5), 1),
                BALT = round(rnorm(1, 30, 5)),
                BAST = round(rnorm(1, 33, 5)),
                BBILI = round(rnorm(1, 0.7, 0.2), 3),
                BCREAT = round(rnorm(1, 0.85, 0.2), 3),
                TAST = ifelse(NTFD==0, BAST, round(rnorm(1, 33, 5))),
                TALT = ifelse(NTFD==0, BALT, round(rnorm(1, 30, 5)))) %>%

Now, we can combine these two studies together.

df_combine <- apmx::pk_combine(df_full, df_full2)
#> Warning in apmx::pk_combine(df_full, df_full2): Datasets have different number
#> of DVIDs.
#> Warning in apmx::pk_combine(df_full, df_full2): CMT = 3 not included in df2

You’ll notice we have a few more warnings issued with this function. That is because our DVID assignments are different.

#> [1]  2  1 NA
#> [1]  1 NA
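Before combining, you can compare the compartments and DVIDs of the two datasets yourself. The sketch below uses toy stand-ins for two built datasets; it is a plain base R comparison and not what pk_combine() does internally.

```r
# Toy stand-ins for two built datasets; only the columns needed for the check
df1 <- data.frame(CMT = c(1, 2, 3), DVID = c(NA, 1, 2))
df2 <- data.frame(CMT = c(1, 2),    DVID = c(NA, 1))

# Compartments and DVIDs present in the first dataset but not the second
cmt_only_in_df1  <- setdiff(unique(df1$CMT), unique(df2$CMT))
dvid_only_in_df1 <- setdiff(unique(df1$DVID), unique(df2$DVID))
```

In this toy case, CMT 3 and DVID 2 exist only in the first dataset, which mirrors the situation the warnings above describe.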

If you forgot to add PD events for study 2, this warning will remind you. For this tutorial, we will continue to exclude them.

Once we are done creating our dataset, we can write it out with the function apmx::pk_write(). This ensures the dataset is written in a NONMEM-usable format.

name <- "PK_ABC101_V01.csv"
apmx::pk_write(df_combine, file.path(tempdir(), name))

Dataset documentation

Documenting a dataset is important when working with a team and when sharing work with outside organizations or regulatory agencies. For example, the FDA requires all population pharmacometric analysis datasets be accompanied with a definition file. apmx provides tools to help you document your dataset.

We will start by exploring the definition file feature. The definition file sources variable names from a dataframe of definitions created with apmx::variable_list_create(). It comes pre-filled with definitions for standard apmx variables, and gives you the ability to add your own for covariates and other custom variables. NOTE you do not have to add prefixes and suffixes to this list, just the root term of each covariate (SEX instead of NSEX and NSEXC).

vl <- apmx::variable_list_create(variable = c("SEX", "RACE", "ETHNIC", "AGE",
                                              "ALB", "ALT", "AST", "BILI", "CREAT"),
                           categorization = rep("Covariate", 9),
                           description = c("sex", "race", "ethnicity", "age",
                                           "albumin", "alanine aminotransferase",
                                           "aspartate aminotransferase",
                                           "total bilirubin", "serum creatinine"))

Now, let’s create the definition file.

define <- apmx::pk_define(df = df_combine,

You can export the definition file to a Word document using the file argument. The project and data parameters can be used to add a custom project name and dataset name to the header of the document. To use this feature, you must use a Word document template with the words “Project” and “Dataset” in the header. You can provide the template of the Word document with the template parameter.

define <- apmx::pk_define(df = df_combine,
                          file = file.path(tempdir(), "definition_file.docx"),
                          project = "Sponsor Name",
                          data = "Dataset Name")

Next, let’s create a version log. Version logs are important when we have multiple datasets over a project duration. Datasets can be updated for all sorts of reasons:

Similar to the definition function, we can provide a template for formatting. You can also provide a comment to describe the source data. The version log is easiest to use when you write it out as a Word document using the file parameter.

vrlg <- apmx::version_log(df = df_combine,
                          name = name,
                          file = file.path(tempdir(), "version_log.docx"),
                          src_data = "original test data")

Open the version log document and take a look around. Notice that there is a column called “Comments”. You can add a comment there in the Word document, and the function will not overwrite it. When you produce a new dataset, call apmx::version_log() again with the new dataset, the most recent dataset, the new dataset name, and the same filepath as the previous log. You will need to use comp_var to group the rows for comparison. For PKPD datasets, we recommend grouping by USUBJID, ATFD, EVID, and DVID. This function will update the version log by adding a new row to the Word document.
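To see why the comparison needs key columns, here is a sketch that compares two toy versions of a dataset on USUBJID, ATFD, EVID, and DVID. It uses base R merge() and is only an illustration of row matching; it is not how version_log() is implemented, and the values are invented.

```r
# Toy "previous" and "new" versions of a dataset; values invented for illustration
key <- c("USUBJID", "ATFD", "EVID", "DVID")
old <- data.frame(USUBJID = c("S1", "S1"), ATFD = c(0, 1), EVID = c(1, 0),
                  DVID = c(0, 1), DV = c(NA, 10), stringsAsFactors = FALSE)
new <- data.frame(USUBJID = c("S1", "S1", "S1"), ATFD = c(0, 1, 2),
                  EVID = c(1, 0, 0), DVID = c(0, 1, 1), DV = c(NA, 12, 8),
                  stringsAsFactors = FALSE)

# Match rows on the key columns, keeping rows that exist in only one version
both <- merge(old, new, by = key, all = TRUE, suffixes = c(".old", ".new"))

# Rows that are new: present in `new` with no matching key in `old`
old_keys <- do.call(paste, c(as.list(old[key]), sep = "|"))
new_keys <- do.call(paste, c(as.list(new[key]), sep = "|"))
added_rows <- new[!new_keys %in% old_keys, ]

# Rows whose DV changed between versions
changed <- both[!is.na(both$DV.old) & !is.na(both$DV.new) &
                  both$DV.old != both$DV.new, ]
```

Without a key that uniquely identifies each record, an added row and a changed row cannot be told apart.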

Lastly, apmx can help you produce summary tables of your datasets. apmx::pk_summarize() produces three types of summary tables:

Tables can be stratified by any other categorical covariate in the dataset.

sum1 <- apmx::pk_summarize(df = df_combine)

The summary function has other parameters to help you document the dataset:

sum2 <- apmx::pk_summarize(df = df_combine,
                  = c("NSTUDYC", "NSEXC"),
                           ignore.request = "NRACE == 2")