This package contains randomly-generated source data for instructional purposes.
library(apmx)
library(dplyr)
library(tidyr)
EX <- as.data.frame(EX)
PC <- as.data.frame(PC)
DM <- as.data.frame(DM)
LB <- as.data.frame(LB)
Clinical trial data is not collected in a way that automatically suits population pharmacometric work. Trial data is organized in a collection of datasets, one dataset per data type. These datasets are often called “domains”.
The FDA and other regulatory agencies require that domains be formatted per CDISC standards for submission. There are two main types of CDISC datasets: SDTM (Study Data Tabulation Model) and ADaM (Analysis Data Model).
Here are some examples of common CDISC SDTM domains (as they relate to pharmacometrics):
ex: exposure (data about administered and planned doses)
pc: pharmacokinetics (data about pharmacokinetic samples)
dm: demographics (general metadata about the subject)
lb: laboratory (chemistry, hematology, lipid, and other lab panel results)
vs: vital signs (height, weight, BMI, and other clinical tests)
cm: concomitant medications (additional medications taken prior to, during, and/or after treatment)
ae: adverse events (any untoward medical event that occurs after signing informed consent while on trial)
eg: EKG (ECG) readings
tr: tumor response (RECIST 1.1 or other tumor measurements)
rs: response (other response measurements, such as OS, PFS, etc.)
There are many other types of SDTM domains. Technically, there are an infinite number of domains since you can create your own custom domains.
For every SDTM domain, there is usually an ADaM equivalent. All ADaM domains start with ad, followed by the domain name:
adex: ADaM version of ex
There are some ADaM domains that are specific to ADaM:
adsl: subject-level (a compilation of many important variables, one row per subject)
Even though this data is well organized, there is no CDISC format for use in NONMEM or other population pharmacometric software. That is why we have built an R package, apmx, to provide tools to help build population PK(PD) datasets.
This training will walk you through the R package and help you learn about pharmacometric data. The data loaded above are randomly-generated SDTM-like datasets to support training. They are based on a simple study design:
Currently, the package is limited to PK and PKPD datasets for analysis in NONMEM only. Additional tools for PK(PD) datasets, plus tools for other analysis types (TTE, logistic regression, QTc analysis), are under development and not available at this time. Datasets for analysis with other software, such as Monolix, are also unavailable at this time.
PK dataset assembly starts with preparing dose events. Dose events require several columns for assembly. Below are the apmx standard names, along with the typical SDTM name equivalent when applicable. Other variables, like DUR (infusion duration), may be required based on the analysis.
USUBJID: subject ID [character]
DTIM (EXSTDTC): date-time of dose administration [character]
VISIT: character visit label [character]
NDAY (EXSTDY): study day [numeric]
TPTC (EXTPT): dose timepoint label [character]
TPT (EXTPTNUM): dose timepoint [numeric]
CMT: assigned compartment for dose events [numeric]
AMT (EXDOSE): amount of drug administered [numeric]
DVID (EXTRT): dose event label [character]
ROUTE (EXROUTE): route of administration [character]
FRQ (EXDOSFRQ): dose frequency [character]
DVIDU (EXDOSU): dose units [character]
The analyst must confirm the ex domain contains all of this information for the package to work. This dataset contains all of the information we need except the compartment. CMT must always be programmed by the user based on the model design. In this case, CMT = 1 for the dose depot. We will also select only the columns that we need for the analysis, dropping the others.
ex <- EX %>%
  dplyr::mutate(CMT = 1) %>%
  dplyr::select(USUBJID, STUDYID, EXSTDTC, VISIT, EXSTDY, EXTPTNUM, EXDOSE,
                CMT, EXTRT, EXTPT, EXROUTE, EXDOSFRQ, EXDOSU)
That’s all we have to do to prepare the dose events for assembly.
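Before handing ex to pk_build(), it can help to confirm that every expected column is present. The sketch below shows one way to do that in base R. The helper check_required_cols() and the toy data frame are hypothetical and not part of apmx; the column list simply mirrors the SDTM names selected above.

```r
# Hypothetical helper (not an apmx function): report which required
# columns are absent from a prepared dose data frame.
check_required_cols <- function(df, required) {
  setdiff(required, names(df))
}

# SDTM-style names selected for the dose events above
required_ex <- c("USUBJID", "STUDYID", "EXSTDTC", "VISIT", "EXSTDY",
                 "EXTPTNUM", "EXDOSE", "CMT", "EXTRT", "EXTPT",
                 "EXROUTE", "EXDOSFRQ", "EXDOSU")

# Toy data frame (made-up values) that deliberately lacks CMT and EXDOSU
toy_ex <- data.frame(USUBJID = "SUBJ-001", STUDYID = "STUDY-1",
                     EXSTDTC = "2022-01-01T08:00", VISIT = "Baseline (D1)",
                     EXSTDY = 1, EXTPTNUM = 0, EXDOSE = 100,
                     EXTRT = "DRUG", EXTPT = "Dose", EXROUTE = "oral",
                     EXDOSFRQ = "QD", stringsAsFactors = FALSE)

missing_cols <- check_required_cols(toy_ex, required_ex)
print(missing_cols)
```

An empty result means all listed columns are present; pk_build() performs its own checks as well, so this is only a convenience for catching problems early.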
Now, we are going to prepare the PK observations. Observation events require several columns for assembly:
USUBJID: subject ID [character]
DTIM (PCDTC): date-time of observation [character]
VISIT: character visit label [character]
NDAY: study day [numeric]
TPTC (PCTPT): observation timepoint label [character]
TPT: observation timepoint [numeric]
CMT: assigned compartment for observation events [numeric]
ODV (PCSTRESN): observation value in original units [numeric]
LLOQ (PCLLOQ): observation lower limit of quantification [numeric]
DVID (PCTEST): observation label [character]
DVIDU (PCTESTU): observation units [character]
The PC domain may have multiple DVIDs and CMTs, perhaps for multiple analytes. Once again, we need to confirm our dataset has all of this information. Are any variables missing?
CMT = 2 for central compartment
pc <- PC %>%
  dplyr::filter(PCSTAT=="Y") %>%
  dplyr::mutate(CMT = 2,
                TPT = dplyr::case_when(PCTPT=="<1 hour Pre-dose" ~ 0,
                                       PCTPT=="30 minutes post-dose" ~ 0.5/24,
                                       PCTPT=="1 hour post-dose" ~ 1/24,
                                       PCTPT=="2 hours post-dose" ~ 2/24,
                                       PCTPT=="4 hours post-dose" ~ 4/24,
                                       PCTPT=="6 hours post-dose" ~ 6/24,
                                       PCTPT=="8 hours post-dose" ~ 8/24,
                                       PCTPT=="12 hours post-dose" ~ 12/24,
                                       PCTPT=="24 hours post-dose" ~ 24/24,
                                       PCTPT=="48 hours post-dose" ~ 48/24)) %>%
  dplyr::select(USUBJID, PCDTC, PCDY, VISIT, TPT, PCSTRESN,
                PCLLOQ, CMT, PCTEST, PCTPT, PCSTRESU)
That’s all we have to do to prepare the observation events for assembly.
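The case_when() call above divides each timepoint by 24 because TPT is expressed in days here. As an alternative sketch, the same mapping can be kept in a small lookup table so that the unit conversion happens in one place. The table tpt_hours and the helper label_to_tpt() are hypothetical names for illustration, not apmx objects, and only a few labels are included.

```r
# Hypothetical lookup table: hours after dose for each timepoint label
tpt_hours <- c("<1 hour Pre-dose"     = 0,
               "30 minutes post-dose" = 0.5,
               "1 hour post-dose"     = 1,
               "24 hours post-dose"   = 24)

# Convert labels to numeric timepoints in the requested unit.
# Labels missing from the table come back as NA.
label_to_tpt <- function(labels, units = c("days", "hours")) {
  units <- match.arg(units)
  hours <- unname(tpt_hours[labels])
  if (units == "days") hours / 24 else hours
}

labels   <- c("<1 hour Pre-dose", "1 hour post-dose", "24 hours post-dose")
tpt_days <- label_to_tpt(labels, "days")
tpt_hrs  <- label_to_tpt(labels, "hours")
print(data.frame(labels, tpt_days, tpt_hrs))
```

Keeping hours in the table and converting at the end makes it harder to mix units between the dose and observation inputs.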
We have all of the information we need to build a simple PK dataset.
Building a dataset is easy to do with apmx. Just feed the ex and pc domains into apmx::pk_build()!
df_simple <- apmx::pk_build(ex = ex, pc = pc)
This function does a lot! Let’s break down the new variables:
C: this flag comments out problematic records flagged by PDOSEF, TIMEF, AMTF, or DUPF
NSTUDY: numeric version of STUDYID
SUBJID: numeric version of USUBJID
ID: numeric version of USUBJID (counting from 1)
ATFD: actual time since first dose
ATLD: actual time since last dose
NTFD: nominal time since first dose
NTLC: nominal time since last cycle
NTLD: nominal time since last dose
EVID: event ID (NONMEM-required)
MDV: missing dependent variable (NONMEM-required)
DVID: numeric version of DVID
LDV: log-transformed ODV
BLQ: below limit of quantification flag
DOSENUM: dose number (counting from 1)
DOSEA: most recent administered dose amount
NROUTE: numeric version of ROUTE
NFRQ: numeric version of FRQ
PDOSEF: flag for records that occur prior to the first dose
TIMEF: flag for records where ATFD = NA
AMTF: flag for dose events where AMT = NA
DUPF: flag for duplicated records (same USUBJID, ATFD, EVID, and CMT)
NOEXF: flag for subjects with no dose events
NODV1F: flag for subjects with no observations where DVID = 1
SDF: flag for single-dose subjects
PLBOF: flag for placebo records
SPARSEF: flag for records associated with sparse sampling
TREXF: flag for dose records occurring after the last observation
IMPEX: flag for records impacted by a dose event with imputed time
IMPDV: flag for an observation record with an imputed time
LINE: dataset row number
NSTUDYC: character version of STUDYID
DOMAIN: original domain of event
DVIDC: character version of DVID
TIMEU: time units of time variables
NROUTEC: character version of ROUTE
NFRQC: character version of FRQ
FDOSE: date-time of first dose
VERSN: apmx package version
BUILD: date of dataset creation
pk_build() has optional parameters that can customize the output dataset. Here are all of the options that will affect a simple dataset. Here they are presented in their default state:
df_simple <- apmx::pk_build(ex = ex, #dataframe of prepared dose events
                            pc = pc, #dataframe of prepared pc observation events
                            time.units = "days", #can be set to days or hours.
                            #NOTE: units of TPT in ex and pc should match this unit
                            cycle.length = NA, #must be in units of days, will reset NTLC to 0
                            na = -999, #replaces missing nominal times and covariates with a numeric value
                            time.rnd = NULL, #rounds all time values to x decimal places
                            amt.rnd = NULL, #rounds calculated dose values to x decimal places
                            dv.rnd = NULL, #rounds observation columns to x decimal places
                            impute = NA, #imputation method for missing times
                            sparse = 3) #threshold for calculating sparse/serial distinctions
I recommend setting time.rnd = 3 to make the dataset easier to read.
df_simple <- apmx::pk_build(ex, pc, time.rnd = 3)
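To make the ATFD and ATLD definitions listed earlier concrete, here is a minimal base R sketch that derives both from date-times for a single made-up subject and rounds them to three decimals, as time.rnd = 3 would. It illustrates the definitions only and is not necessarily how pk_build() computes them internally; the toy events table is an assumption.

```r
# Toy event table for one hypothetical subject (not apmx output)
events <- data.frame(
  DTIM = c("2022-01-01 08:00", "2022-01-01 12:00",
           "2022-01-08 08:00", "2022-01-08 09:00"),
  EVID = c(1, 0, 1, 0),   # 1 = dose, 0 = observation
  stringsAsFactors = FALSE
)

# Work in seconds since the epoch to keep the arithmetic simple
secs      <- as.numeric(as.POSIXct(events$DTIM,
                                   format = "%Y-%m-%d %H:%M", tz = "UTC"))
dose_secs <- secs[events$EVID == 1]

# Actual time since first dose, in days
events$ATFD <- (secs - min(dose_secs)) / 86400

# Actual time since the most recent dose at or before each event, in days
# (every toy event falls at or after the first dose)
last_dose   <- vapply(secs, function(s) max(dose_secs[dose_secs <= s]),
                      numeric(1))
events$ATLD <- (secs - last_dose) / 86400

# Round to three decimals, mirroring time.rnd = 3
events$ATFD <- round(events$ATFD, 3)
events$ATLD <- round(events$ATLD, 3)
print(events)
```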
Sometimes, you will want a more complicated dataset. Let’s explore additional functionalities of pk_build().
For the most part, all covariates can be divided into four categories:
apmx has a few requirements to help keep track of different kinds of covariates. When you program covariates, you have to follow these rules:
Let’s start by preparing some subject-level covariates from dm and lb. All subject-level covariate data frames require a USUBJID column. There must only be one row per subject. Covariate names should be clear and easy to interpret.
dm <- DM %>%
  dplyr::select(USUBJID, AGE, SEX, RACE, ETHNIC) %>%
  dplyr::mutate(AGEU = "years") #AGE is continuous and requires a unit

lb <- LB %>% #select the desired labs
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBVST %in% c("Baseline (D1)", "Screening")) %>%
  dplyr::filter(LBPARAMCD %in% c("ALB", "AST", "ALT", "BILI", "CREAT")) %>%
  dplyr::mutate(LBORRES = as.numeric(LBORRES))

lb <- lb %>% #select the lab collected immediately prior to first dose
  dplyr::arrange(USUBJID, LBPARAMCD, LBDT) %>%
  dplyr::group_by(USUBJID, LBPARAMCD) %>%
  dplyr::filter(row_number()==max(row_number())) %>%
  dplyr::ungroup()

lb <- lb %>% #finish formatting and add units since all labs are continuous
  dplyr::select(USUBJID, LBPARAMCD, LBORRES) %>%
  tidyr::pivot_wider(names_from = "LBPARAMCD", values_from = "LBORRES") %>%
  dplyr::mutate(ALBU = "g/dL",
                ASTU = "IU/L",
                ALTU = "IU/L",
                BILIU = "mg/dL",
                CREATU = "mg/dL")
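Because subject-level covariate data frames must have exactly one row per subject, a quick uniqueness check before the build can save a failed run. The following is a minimal sketch in base R using a made-up table; toy_cov and its values are hypothetical.

```r
# Hypothetical subject-level covariate table with one repeated subject
toy_cov <- data.frame(USUBJID = c("SUBJ-001", "SUBJ-002", "SUBJ-002"),
                      AGE = c(34, 51, 52),
                      AGEU = "years",
                      stringsAsFactors = FALSE)

# Subjects that appear on more than one row
dup_ids <- unique(toy_cov$USUBJID[duplicated(toy_cov$USUBJID)])
print(dup_ids)

# TRUE only when each subject has exactly one row
one_row_per_subject <- length(dup_ids) == 0
print(one_row_per_subject)
```

If dup_ids is not empty, the repeated subjects need to be resolved (for example, by choosing a single baseline record) before the data frame is passed as a subject-level covariate.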
Next, let’s prepare some time-varying covariates from lb. All time-varying covariate data frames require a USUBJID and DTIM column.
tast <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAMCD=="AST") %>%
  dplyr::mutate(LBORRES = as.numeric(LBORRES)) %>%
  dplyr::select(USUBJID, DTIM = LBDT, AST = LBORRES) %>%
  dplyr::mutate(ASTU = "IU/L")
talt <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAMCD=="ALT") %>%
  dplyr::mutate(LBORRES = as.numeric(LBORRES)) %>%
  dplyr::select(USUBJID, DTIM = LBDT, ALT = LBORRES) %>%
  dplyr::mutate(ALTU = "IU/L")
You may want to add PD observations to your dataset. PD observations have the same requirements as PC observations. Unfortunately, apmx does not recognize SDTM/ADaM language for PD observations. That is because there are many types of PD events, with many types of possible formats. You must convert all column names to apmx column names.
For this analysis, we will pretend glucose observations from lb are a meaningful biomarker. Let’s set CMT = 3 for the PD compartment.
pd <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAM=="glucose") %>%
  dplyr::mutate(DTIM = paste(LBDT, "00:00"),
                VISIT = LBVST,
                NDAY = case_when(VISIT=="Screening" ~ -15,
                                 VISIT=="Baseline (D1)" ~ 1,
                                 VISIT=="Visit 2 (D8)" ~ 8,
                                 VISIT=="Visit 3 (D15)" ~ 15,
                                 VISIT=="Visit 4 (D29)" ~ 29,
                                 VISIT=="End of Treatment" ~ 45),
                TPT = 0,
                TPTC = LBTPT,
                ODV = as.numeric(LBORRES),
                DVIDU = LBORRESU,
                LLOQ = NA,
                CMT = 3,
                DVID = LBPARAM) %>%
  dplyr::select(USUBJID, DTIM, NDAY, VISIT, TPT,
                ODV, LLOQ, CMT, DVID, TPTC, DVIDU)
Let’s add all of the new events and covariates to the dataset.
df_full <- apmx::pk_build(ex = ex, pc = pc, pd = pd,
                          sl.cov = list(dm, lb),
                          tv.cov = list(tast, talt),
                          time.rnd = 3)
#> Warning in apmx::pk_build(ex = ex, pc = pc, pd = pd, sl.cov = list(dm, lb), :
#> The following USUBJID(s) have at least one event with missing ATFD:
#> ABC102-01-005
First, you’ll notice a warning was issued in the console. We will re-visit the warnings later in this document. Instead, let’s focus on the dataset itself.
There is a new type of row where EVID = 2.
unique(df_simple$EVID)
#> [1] 0 1
unique(df_full$EVID)
#> [1] 2 0 1
These rows capture the date-time and values of time-varying covariates. Sometimes, we want to retain the exact date-time of each time-varying covariate.
The DVID column has also changed compared with the simple dataset.
unique(df_simple$DVID)
#> [1] 1 NA
unique(df_full$DVID)
#> [1] NA 2 1
unique(df_simple$DVIDC)
#> [1] "ABC999"
unique(df_full$DVIDC)
#> [1] NA "glucose" "ABC999"
There are now two observation events, ABC999 and glucose. The NA rows are for dose and other events.
You’ll notice that all of the covariate names changed a bit. They all received a prefix, and some received a suffix. Why do we do this? Prefixes and suffixes can identify the type of covariate:
If you can’t remember the prefixes and suffixes, that’s OK! We have an additional function to help with that. apmx::cov_find() will return all covariates of particular types in a PK dataset.
apmx::cov_find(df_full, cov = "categorical", type = "numeric")
#> [1] "NSTUDY" "NROUTE" "NFRQ" "NSEX" "NRACE" "NETHNIC"
apmx::cov_find(df_full, cov = "categorical", type = "character")
#> [1] "NSTUDYC" "NROUTEC" "NFRQC" "NSEXC" "NRACEC" "NETHNICC"
apmx::cov_find(df_full, cov = "continuous", type = "numeric")
#> [1] "BAGE" "BALB" "BALT" "BAST" "BBILI" "BCREAT" "TAST" "TALT"
apmx::cov_find(df_full, cov = "units", type = "character")
#> [1] "BAGEU" "BALBU" "BALTU" "BASTU" "BBILIU" "BCREATU" "TASTU"
#> [8] "TALTU"
Let’s explore the rest of the optional parameters in pk_build().
df_full <- apmx::pk_build(ex = ex, pc = pc, pd = pd,
                          sl.cov = list(dm, lb),
                          tv.cov = list(tast, talt),
                          time.rnd = 3,
                          cov.rnd = NULL, #rounds covariate columns to x decimal places
                          BDV = FALSE, #calculates baseline dependent variable for PD events
                          DDV = FALSE, #calculates change (delta) from baseline for PD events
                          PDV = FALSE, #calculates percent change from baseline for PD events
                          demo.map = TRUE, #adds specific numeric mapping for SEX, RACE, and ETHNIC variables
                          tv.cov.fill = "downup", #fill pattern for time-varying covariates
                          keep.other = TRUE) #keep or drop all EVID = 2 rows
#> Warning in apmx::pk_build(ex = ex, pc = pc, pd = pd, sl.cov = list(dm, lb), :
#> The following USUBJID(s) have at least one event with missing ATFD:
#> ABC102-01-005
The dataset is a bit easier to read if we drop the other events. We will do that moving forward for the rest of the tutorial.
df_full <- apmx::pk_build(ex = ex, pc = pc, pd = pd,
                          sl.cov = list(dm, lb),
                          tv.cov = list(tast, talt),
                          time.rnd = 3, dv.rnd = 3,
                          BDV = TRUE, DDV = TRUE, PDV = TRUE,
                          keep.other = FALSE)
#> Warning in apmx::pk_build(ex = ex, pc = pc, pd = pd, sl.cov = list(dm, lb), :
#> The following USUBJID(s) have at least one event with missing ATFD:
#> ABC102-01-005
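The BDV, DDV, and PDV options add a baseline value, a change from baseline, and a percent change from baseline for PD events. The sketch below works through that arithmetic on made-up values for one subject. It assumes the baseline is the last observation at or before the first dose and that PDV is expressed in percent; both are assumptions for illustration, and the exact rules inside pk_build() may differ.

```r
# Toy PD observations for one hypothetical subject (not apmx output)
pd_obs <- data.frame(ATFD = c(-14, 0, 7, 14),   # days since first dose
                     ODV  = c(95, 100, 90, 110))

# Assumed baseline rule: last observation at or before the first dose
baseline_rows <- pd_obs[pd_obs$ATFD <= 0, ]
bdv <- baseline_rows$ODV[which.max(baseline_rows$ATFD)]

pd_obs$BDV <- bdv                            # baseline value
pd_obs$DDV <- pd_obs$ODV - pd_obs$BDV        # change (delta) from baseline
pd_obs$PDV <- 100 * pd_obs$DDV / pd_obs$BDV  # percent change from baseline
print(pd_obs)
```

This also shows why DDV and PDV depend on BDV: without a baseline value there is nothing to subtract from or divide by.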
Time-varying covariates can be challenging to work with. The pk_build() function can only fill them by date-time. What if date-time is not available in the source data?
The apmx::cov_apply() function will add covariates to a dataset built by pk_build(). It will add time-varying covariates by any time variable, including:
DTIM
ATFD
ATLD
NTFD
NTLC
NTLD
NDAY
Let’s add TAST (time-varying AST) by nominal time instead of actual time.
tast <- LB %>%
  dplyr::filter(LBCOMPFL=="Y") %>%
  dplyr::filter(LBPARAMCD=="AST") %>%
  dplyr::mutate(NTFD = case_when(LBVST=="Screening" ~ -15, #calculate NTFD from visit code
                                 LBVST=="Baseline (D1)" ~ 1,
                                 LBVST=="Visit 2 (D8)" ~ 8,
                                 LBVST=="Visit 3 (D15)" ~ 15,
                                 LBVST=="Visit 4 (D29)" ~ 29,
                                 LBVST=="End of Treatment" ~ 45)) %>%
  dplyr::mutate(AST = as.numeric(LBORRES)) %>%
  dplyr::select(USUBJID, NTFD, AST, ASTU = LBORRESU)

df_cov_apply <- apmx::pk_build(ex = ex, pc = pc,
                               sl.cov = list(dm, lb),
                               time.rnd = 3, dv.rnd = 3,
                               BDV = TRUE, DDV = TRUE, PDV = TRUE,
                               keep.other = FALSE) %>%
  apmx::cov_apply(tast, time.by = "NTFD")
cov_apply() can also add subject-level covariates by any subject identifier.
df_cov_apply <- apmx::pk_build(ex = ex, pc = pc,
                               time.rnd = 3, dv.rnd = 3,
                               BDV = TRUE, DDV = TRUE, PDV = TRUE,
                               keep.other = FALSE) %>%
  apmx::cov_apply(dm) %>%
  apmx::cov_apply(lb) %>%
  apmx::cov_apply(talt, time.by = "DTIM") %>%
  apmx::cov_apply(tast, time.by = "NTFD")
cov_apply() can also add empirical Bayes estimates or exposure metrics. Notice these also get their own prefixes. cov_apply() cannot handle units for these parameters at this time.
Let’s try adding exposure metrics and parameter estimates to the dataset. First, we will generate dummy exposures and parameter estimates.
exposure <- data.frame(ID = 1:22, #exposure metrics
                       MAX = 1001:1022,
                       MIN = 101:122,
                       AVG = 501:522)
parameters <- data.frame(ID = 1:22, #individual clearance and central volume estimates
                         CL = seq(0.1, 2.2, 0.1),
                         VC = seq(1, 11.5, 0.5))
df_cov_apply <- apmx::pk_build(ex = ex, pc = pc,
                               time.rnd = 3, dv.rnd = 3,
                               BDV = TRUE, DDV = TRUE, PDV = TRUE,
                               keep.other = FALSE) %>%
  apmx::cov_apply(dm) %>%
  apmx::cov_apply(lb) %>%
  apmx::cov_apply(talt, time.by = "DTIM", keep.other = FALSE) %>%
  apmx::cov_apply(tast, time.by = "NTFD", keep.other = FALSE) %>%
  apmx::cov_apply(exposure, id.by = "ID", exp = TRUE) %>%
  apmx::cov_apply(parameters, id.by = "ID", ebe = TRUE)
It is recommended you always use pk_build() or cov_apply() to add covariates instead of adding them in yourself. That ensures cov_find() always finds the covariates correctly.
apmx::cov_find(df_cov_apply, cov = "categorical", type = "numeric")
#> [1] "NSTUDY" "NROUTE" "NFRQ" "NSEX" "NRACE" "NETHNIC"
apmx::cov_find(df_cov_apply, cov = "categorical", type = "character")
#> [1] "NSTUDYC" "NROUTEC" "NFRQC" "NSEXC" "NRACEC" "NETHNICC"
apmx::cov_find(df_cov_apply, cov = "continuous", type = "numeric")
#> [1] "BAGE" "BALB" "BALT" "BAST" "BBILI" "BCREAT" "TALT" "TAST"
apmx::cov_find(df_cov_apply, cov = "units", type = "character")
#> [1] "BAGEU" "BALBU" "BALTU" "BASTU" "BBILIU" "BCREATU" "TALTU"
#> [8] "TASTU"
apmx::cov_find(df_cov_apply, cov = "exposure", type = "numeric")
#> [1] "CMAX" "CMIN" "CAVG"
apmx::cov_find(df_cov_apply, cov = "empirical bayes estimate", type = "numeric")
#> [1] "ICL" "IVC"
pk_build() and other apmx functions issue errors/warnings for problematic data. What is the warning we have been receiving this whole time? First, let’s filter our dataset to the one subject triggering the warning:
warning <- df_full %>%
  dplyr::filter(USUBJID=="ABC102-01-005")

nrow(warning)
#> [1] 1
warning$DVIDC
#> [1] "glucose"
This subject has 1 PD observation, no dose or PK observations. Because there is no dose, you cannot calculate ATFD (actual time since first dose). The warning informs you which subjects have this particular problem. This helps you diagnose potential problems with your data. Notice in this instance, the record is flagged by C and TIMEF.
warning$C
#> [1] "C"
warning$TIMEF
#> [1] 1
There are other errors and warnings to help you diagnose your data as well. There is a key difference between the two:
What if you are missing a required column in your input domain?
ex_error <- ex[, -5]

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): Column NDAY is missing from the ex dataset.
What if the variable types are incorrect?
ex_error <- ex
ex_error$USUBJID <- 1:42

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): Column USUBJID in ex is not character type.
What if a required value is missing?
ex_error <- ex
ex_error$USUBJID[5] <- NA

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): USUBJID missing in ex for at least 1 row.
What if we program ADDL but not II for dose events?
ex_error <- ex
ex_error$ADDL <- 1

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): If ex contains ADDL, it must contain II
What if date-time is not formatted correctly?
ex_error <- ex
ex_error$EXSTDTC <- substr(ex_error$EXSTDTC, 1, 10)

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): DTIM in ex is not ISO 8601 format.
What if the baseline nominal day NDAY == 0 instead of 1?
ex_error <- ex
ex_error$EXSTDY <- 0

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): NDAY in ex has a 0 measurement. Please confirm day of first dose is nominal day 1 and the day prior to first dose is nominal day -1.
Nominal days can be tricky. The day a patient takes their first dose is day 1. The day before their first dose is day -1. Therefore, there is no study day 0.
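The "no day 0" convention is easy to get wrong when study days are derived from calendar dates. As a small illustration, the hypothetical helper below (not an apmx function) converts a calendar offset from the first-dose date into a nominal study day by skipping zero.

```r
# Hypothetical helper: convert a calendar offset from the first-dose date
# (0 = day of first dose, -1 = the day before) into a nominal study day.
offset_to_nday <- function(offset_days) {
  ifelse(offset_days >= 0, offset_days + 1, offset_days)
}

offsets <- c(-2, -1, 0, 1, 7)
nday <- offset_to_nday(offsets)
print(nday)
```

The day of first dose maps to study day 1 and the day before maps to -1, so the result never contains 0.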
What if ADDL and II are both present, but one of them is NA?
ex_error <- ex
ex_error$ADDL <- 1
ex_error$II <- c(rep(1, 41), NA)

apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): At least one row in ex has a documented ADDL when II is NA.
What if you only enter a dose domain?
apmx::pk_build(ex)
#> Error in apmx::pk_build(ex): Please enter a pc or pd domain.
What if a pc observation is 0 or negative?
pc_error <- pc
pc_error$PCSTRESN[10] <- 0

apmx::pk_build(ex, pc_error)
#> Error in apmx::pk_build(ex, pc_error): At least one dependent variable in PC is less than or equal to 0.
What if the study code is not included in ex or sl.cov? Note that you can pass the study code variable through sl.cov or ex.
ex_error <- ex %>%
  select(-STUDYID)
apmx::pk_build(ex_error, pc)
#> Error in apmx::pk_build(ex_error, pc): STUDY column must be included in ex or sl.cov.
What if you have multiple values for a subject-level covariate within one subject?
dm_error <- dm
dm_error$USUBJID[2] <- "ABC102-01-001"

apmx::pk_build(ex, pc, sl.cov=dm_error)
#> Error in apmx::pk_build(ex, pc, sl.cov = dm_error): sl.cov has duplicate USUBJID rows.
What if you select a time unit not supported by pk_build?
apmx::pk_build(ex, pc, time.units="minutes")
#> Error in apmx::pk_build(ex, pc, time.units = "minutes"): time.units parameter must be in days or hours.
What if you program DDV and/or PDV without calculating BDV?
#note: PDV==TRUE below is a typo for PDV=TRUE, so the error shown is raised by R itself
apmx::pk_build(ex, pc, pd, DDV=TRUE, PDV==TRUE)
#> Error in apmx::pk_build(ex, pc, pd, DDV = TRUE, PDV == TRUE): object 'PDV' not found
What if you pass the same covariate through multiple dataframes?
ex_error <- ex
ex_error$NSEX <- 0

apmx::pk_build(ex_error, pc, sl.cov = dm)
#> Error in apmx::pk_build(ex_error, pc, sl.cov = dm): NSEX column is duplicated in sl.cov and another dataset. Please include this column in one dataset only.
Note you are allowed to pass other columns through the ex, pc, and pd domains. For example, try adding the column SEX instead of NSEX. If you pass an extra column through ex, pc, or pd, it will not be impacted by the function.
What if you provide a continuous covariate but forget to provide units?
dm_error <- dm %>%
  select(-AGEU)
apmx::pk_build(ex, pc, sl.cov = dm_error)
#> Error in apmx::pk_build(ex, pc, sl.cov = dm_error): All numerical covariates in sl.cov need units.
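The units requirement follows the naming pattern used throughout this tutorial: a continuous covariate such as AGE is paired with a unit column named AGEU. The sketch below checks for that pairing before the build. The helper covs_missing_units() and the toy data are hypothetical and only approximate the rule that pk_build() enforces.

```r
# Hypothetical helper (not part of apmx): list numeric covariates that
# have no companion "<NAME>U" unit column.
covs_missing_units <- function(df, id_col = "USUBJID") {
  covs <- setdiff(names(df), id_col)
  numeric_covs <- covs[vapply(df[covs], is.numeric, logical(1))]
  numeric_covs[!paste0(numeric_covs, "U") %in% names(df)]
}

# Toy covariate table (made-up values): WT has no WTU column
toy_dm <- data.frame(USUBJID = c("SUBJ-001", "SUBJ-002"),
                     AGE = c(34, 51), AGEU = "years",
                     WT = c(70.2, 82.5),
                     SEX = c("F", "M"),
                     stringsAsFactors = FALSE)

no_units <- covs_missing_units(toy_dm)
print(no_units)
```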
These datasets will build, but pk_build() will inform you of potential problems. What if a subject has no covariates, but others do?
dm_warning <- dm
dm_warning <- dm_warning[1:4,]

df_warning <- apmx::pk_build(ex, pc, sl.cov=dm_warning)
#> Warning in apmx::pk_build(ex, pc, sl.cov = dm_warning): The following
#> USUBJID(s) have PKPD events but are not in sl.cov: ABC102-01-006,
#> ABC102-02-001, ABC102-02-002, ABC102-02-003, ABC102-02-004, ABC102-03-001,
#> ABC102-03-002, ABC102-03-003, ABC102-03-004, ABC102-04-001, ABC102-04-002,
#> ABC102-04-003, ABC102-04-004, ABC102-04-005, ABC102-04-006, ABC102-04-007,
#> ABC102-04-008
df_warning <- apmx::pk_build(ex, pc, sl.cov = list(dm_warning, lb))
Notice the warning is only triggered if a subject has NO covariates. In the second case, all subjects are included in lb, while only some are in dm. The warning is not issued if the subject has at least 1 covariate. All missing covariates are filled with the missing-value parameter (na, default -999).
What if a subject does not have any baseline PD events and BDV|DDV|PDV == TRUE? Notice the warning is only issued if BDV, DDV, or PDV are calculated.
pd_warning <- pd
pd_warning <- pd[3:nrow(pd_warning), ]

df_warning <- apmx::pk_build(ex, pc, pd_warning, BDV=TRUE)
#> Warning in apmx::pk_build(ex, pc, pd_warning, BDV = TRUE): The following
#> USUBJID(s) do not have a baseline glucose observation at or prior to first dose
#> (BDV, DDV, PDV not calculated): ABC102-01-001
#> Warning in apmx::pk_build(ex, pc, pd_warning, BDV = TRUE): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-005
df_warning <- apmx::pk_build(ex, pc, pd_warning)
#> Warning in apmx::pk_build(ex, pc, pd_warning): The following USUBJID(s) have at
#> least one event with missing ATFD: ABC102-01-005
What if the source data events occurred out of order? You’ll notice the NTFD of the first observation falls after the next event.
pc_warning <- pc
pc_warning$TPT[1] <- 0.07

df_warning <- apmx::pk_build(ex, pc_warning,
                             time.rnd = 3)
#> Warning in apmx::pk_build(ex, pc_warning, time.rnd = 3): The following
#> USUBJID(s) have at least one event that occurred out of protocol order (NTFD is
#> not strictly increasing): ABC102-01-001
What if a dose event is missing AMT? The record is automatically C-flagged and a warning is issued. Note that the PK records for this subject are not C-flagged.
ex_warning <- ex
ex_warning$EXDOSE[1] <- NA

df_warning <- apmx::pk_build(ex_warning, pc,
                             time.rnd = 3)
#> Warning in apmx::pk_build(ex_warning, pc, time.rnd = 3): The following
#> USUBJID(s) have at least one dose event with missing AMT: ABC102-01-001
What if there are two events that occur at the same time? Notice how the duplicated events are C-flagged and a warning is issued.
pc_warning <- pc
pc_warning[2, ] <- pc_warning[1, ]
pc_warning$PCSTRESN[2] <- 1400

df_warning <- apmx::pk_build(ex, pc_warning,
                             time.rnd = 3)
#> Warning in apmx::pk_build(ex, pc_warning, time.rnd = 3): The following
#> USUBJID(s) have at least one duplicate event: ABC102-01-001
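The DUPF definition given earlier (same USUBJID, ATFD, EVID, and CMT) can be reproduced on a toy table to see which rows would be caught. This is a sketch of the idea only: the data are made up, and flagging every member of a duplicated group (rather than only the later copies) is an assumption about the behavior, not a statement of how pk_build() implements it.

```r
# Toy records (hypothetical values); rows 2 and 3 share all four keys
recs <- data.frame(USUBJID = c("SUBJ-001", "SUBJ-001", "SUBJ-001", "SUBJ-002"),
                   ATFD = c(0, 0.042, 0.042, 0.042),
                   EVID = c(1, 0, 0, 0),
                   CMT  = c(1, 2, 2, 2),
                   DV   = c(NA, 1200, 1400, 900),
                   stringsAsFactors = FALSE)

keys <- recs[, c("USUBJID", "ATFD", "EVID", "CMT")]

# Flag every member of a duplicated group, not just the later copies
recs$DUPF <- as.integer(duplicated(keys) | duplicated(keys, fromLast = TRUE))
print(recs)
```

Note that the two flagged rows carry different DV values, which is why duplicates need a manual decision rather than silent de-duplication.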
What if you have long column names? This warning informs you that some column names are longer than 8 characters. This will prevent you from converting the dataset to a .xpt file if desired.
dm_warning <- dm %>%
  rename(ETHNICITY = ETHNIC)
df_warning <- apmx::pk_build(ex, pc, sl.cov = dm_warning)
#> Warning in apmx::pk_build(ex, pc, sl.cov = dm_warning): The following column
#> name(s) are longer than 8 characters: NETHNICITY, NETHNICITYC
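The same 8-character check is simple to run yourself on any set of column names, for example before renaming source variables. A minimal sketch in base R, using an illustrative vector of names rather than a real built dataset:

```r
# Column names as they might appear in a built dataset (illustrative)
col_names <- c("USUBJID", "NSEX", "NSEXC", "NETHNICITY", "NETHNICITYC")

# Names longer than 8 characters
too_long <- col_names[nchar(col_names) > 8]
print(too_long)
```

Remember that prefixes and suffixes are added during the build, so a source name needs some headroom to stay within the limit.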
What if your baseline covariates and time-varying covariates are not equivalent at baseline? In theory, all baseline covariates and time-varying covariates should agree at NTFD == 0.
lb_warning <- lb
lb_warning$ALT[1] <- 31

df_warning <- apmx::pk_build(ex, pc, sl.cov = lb_warning, tv.cov = talt)
#> Warning in apmx::pk_build(ex, pc, sl.cov = lb_warning, tv.cov = talt): The
#> following USUBJID(s) have at least one event with missing ATFD: ABC102-01-005
#> Warning in apmx::pk_build(ex, pc, sl.cov = lb_warning, tv.cov = talt): BALT and
#> TALT are not equivalent at first dose (baseline).
Some of our errors and warnings discuss problems with date/time elements of ex and pc. What do you do when you have an event, but the date/time information is missing? pk_build provides two methods for imputing missing times:
ATFD relative to other events occurring at the same visit. This method is good for phase I/II/III trials
Let’s experiment with these two methods. First, we will drop some date-times from pc and replace them with NA.
pc_impute <- pc
pc_impute$PCDTC[c(4, 39, 73, 128)] <- NA

df_impute <- apmx::pk_build(ex, pc_impute,
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex, pc_impute, time.rnd = 3): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-001,
#> ABC102-01-002, ABC102-01-004, ABC102-02-002
This triggers the warning for missing ATFD as expected. Now, let’s try impute method 1.
df_impute_1 <- apmx::pk_build(ex, pc_impute,
                              time.rnd = 3, impute = 1)
#> Warning in apmx::pk_build(ex, pc_impute, time.rnd = 3, impute = 1): The
#> following USUBJID(s) have at least one event that occurred out of protocol
#> order (NTFD is not strictly increasing): ABC102-01-004
First, notice we have a new warning. We’ll come back to that later. You should also notice that all events have times and the time warning disappeared. The imputation is notated with the IMPEX and IMPDV columns.
nrow(df_impute_1[is.na(df_impute_1$ATFD),]) #number of rows with missing ATFD
#> [1] 0
imputed_events_1 <- df_impute_1 %>%
  dplyr::filter(IMPDV==1 | IMPEX==1)
IMPDV will flag observation records with an imputed time. IMPEX will flag all records impacted by an imputed dose. You’ll notice we still have a warning for one subject. Let’s find out why.
times_check_1 <- df_impute_1 %>%
  dplyr::filter(USUBJID=="ABC102-01-004")
Notice row 12 has an imputed time ATFD = 14.042. That is because NTFD = 14.042 for that record. However, the dose for this visit was administered a few days late, at time ATFD = 16.053. This imputation puts the post-dose sample two days ahead of the dose. Impute method 1 is a poor assumption for this missing date.
Let’s try method 2 to see if that assumption is better. Method 2 takes the late dose into account by estimating the time of the sample relative to the other events that day.
df_impute_2 <- apmx::pk_build(ex, pc_impute,
                              time.rnd = 3, impute = 2)
imputed_events_2 <- df_impute_2 %>%
  dplyr::filter(IMPDV==1 | IMPEX==1)
You’ll notice the warning disappears. Let’s check that subject again.
times_check_2 <- df_impute_2 %>%
  dplyr::filter(USUBJID=="ABC102-01-004")
You’ll notice that under this method, when NTFD = 14.042, ATFD = 16.094. Why?
NTFD = 14, ATFD = 16.053
NTFD = 14.042, ATFD = 16.053 + (14.042 - 14) = 16.094 (the number may round a thousandth of a day off)
What if we are missing a date/time for a dose event? Let’s repeat the experiment.
ex_impute <- ex
ex_impute$EXSTDTC[2] <- NA

df_impute <- apmx::pk_build(ex_impute, pc, #no imputation method
                            time.rnd = 3)
#> Warning in apmx::pk_build(ex_impute, pc, time.rnd = 3): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-001
df_impute_1 <- apmx::pk_build(ex_impute, pc, #imputation method 1
                              time.rnd = 3, impute = 1)
#> Warning in apmx::pk_build(ex_impute, pc, time.rnd = 3, impute = 1): The
#> following USUBJID(s) have at least one event that occurred out of protocol
#> order (NTFD is not strictly increasing): ABC102-01-001
imputed_events_1 <- df_impute_1 %>% #imputed records
  dplyr::filter(IMPDV==1 | IMPEX==1)
Now, a lot of records for subject 1 have IMPEX == 1. This is because all of these observations are associated with a dose with an imputed time. Is method 1 a good assumption?
ATFD = NTFD = 14.
ATFD = 12.9.
ATLD is calculated incorrectly.
Let’s try method 2 to see the difference. You’ll notice the events are in the correct order and times are imputed successfully.
df_impute_2 <- apmx::pk_build(ex_impute, pc,
                              time.rnd = 3, impute = 2)
imputed_events_2 <- df_impute_2 %>%
  dplyr::filter(IMPDV==1 | IMPEX==1)
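The method 2 arithmetic worked through for subject ABC102-01-004 can be written as a one-line helper: take the actual time of a known event at the same visit and shift it by the nominal gap between the two events. The function impute_relative() is a hypothetical illustration of that calculation, not the code pk_build() runs.

```r
# Hypothetical helper illustrating the method 2 idea: shift the anchor
# event's actual time by the nominal gap between the two events.
impute_relative <- function(anchor_atfd, anchor_ntfd, event_ntfd) {
  anchor_atfd + (event_ntfd - anchor_ntfd)
}

# Values from the subject ABC102-01-004 walkthrough
imputed_atfd <- impute_relative(anchor_atfd = 16.053,
                                anchor_ntfd = 14,
                                event_ntfd  = 14.042)
print(round(imputed_atfd, 3))
```

With the rounded inputs shown in the text this evaluates to 16.095, a thousandth of a day away from the 16.094 in the built dataset, which is the rounding difference noted in the walkthrough.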
What if the first dose is missing instead of the second dose? Let’s repeat the experiment, this time with method 2 only since we can assume method 1 won’t work well in this scenario.
<- ex
ex_impute $EXSTDTC[1] <- NA
ex_impute
<- apmx::pk_build(ex_impute, pc, # No imputation method, expect a warning
df_impute time.rnd = 3)
#> Warning in apmx::pk_build(ex_impute, pc, time.rnd = 3): The following
#> USUBJID(s) have at least one event with missing ATFD: ABC102-01-001
<- apmx::pk_build(ex_impute, pc, #imputation method 2
df_impute_2 time.rnd = 3, impute = 2)
<- df_impute_2 %>% #imputed events
imputed_events_2 ::filter(IMPDV==1 | IMPEX==1 | IMPFEX==1) dplyr
Notice an extra column was created, IMPFEX.
IMPFEX
: imputed time of first dose.IMPEX
will only apply to all records until the next
dose with a known date-time.One final experiment - what if we are missing date-times from
ex
and pc
? Note all times are imputed
successfully and all warnings disappear.
<- ex
ex_impute $EXSTDTC[1:2] <- NA
ex_impute
<- apmx::pk_build(ex = ex_impute, pc = pc_impute, #no impuation method
df_impute time.rnd = 3)
#> Warning in apmx::pk_build(ex = ex_impute, pc = pc_impute, time.rnd = 3): The
#> following USUBJID(s) have at least one event with missing ATFD: ABC102-01-001,
#> ABC102-01-002, ABC102-01-004, ABC102-02-002
df_impute_2 <- apmx::pk_build(ex = ex_impute, pc = pc_impute, #imputation method 2
                              time.rnd = 3, impute = 2)
What if we have multiple studies we want to analyze at once? We could create one large ex, pc, etc. input with each study, or we could use apmx::pk_combine() to combine two datasets built by pk_build().
Let’s create a copy of df_full and change it slightly.
We’ll pretend it’s built from a second study, ABC103.
df_full2 <- df_full %>%
  dplyr::filter(DOMAIN!="PD") %>% #remove glucose observations
  dplyr::filter(ID<19) %>% #remove subject 19
  dplyr::group_by(ID) %>%
  dplyr::mutate(NSTUDYC = "ABC103", #update study ID
                USUBJID = gsub("ABC102", "ABC103", USUBJID),
                BAGE = round(rnorm(1, 45, 10)), #re-create all continuous covariates
                BALB = round(rnorm(1, 4, 0.5), 1),
                BALT = round(rnorm(1, 30, 5)),
                BAST = round(rnorm(1, 33, 5)),
                BBILI = round(rnorm(1, 0.7, 0.2), 3),
                BCREAT = round(rnorm(1, 0.85, 0.2), 3),
                TAST = ifelse(NTFD==0, BAST, round(rnorm(1, 33, 5))),
                TALT = ifelse(NTFD==0, BALT, round(rnorm(1, 30, 5)))) %>%
  dplyr::ungroup()
Now, we can combine these two studies together.
df_combine <- apmx::pk_combine(df_full, df_full2)
#> Warning in apmx::pk_combine(df_full, df_full2): Datasets have different number
#> of DVIDs.
#> Warning in apmx::pk_combine(df_full, df_full2): CMT = 3 not included in df2
You’ll notice we have a few more warnings issued with this function. That is because our DVID assignments are different.
unique(df_full$DVID)
#> [1] 2 1 NA
unique(df_full2$DVID)
#> [1] 1 NA
If you forgot to add PD events for study 2, this warning will remind you. For this tutorial, we will continue to exclude them.
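The checks behind these warnings can be pictured with a small base R sketch. The two data frames below are hypothetical miniatures, and the ID offset is an assumption of this sketch rather than a description of what pk_combine() does internally.

```r
# Hypothetical miniature datasets standing in for two pk_build() outputs.
study1 <- data.frame(ID = c(1, 1, 1), NSTUDYC = "ABC102",
                     CMT = c(1, 2, 3), DVID = c(NA, 1, 2))
study2 <- data.frame(ID = c(1, 1), NSTUDYC = "ABC103",
                     CMT = c(1, 2), DVID = c(NA, 1))

# Observation types and compartments present in study 1 but absent in study 2.
dvid_only_in_1 <- setdiff(unique(study1$DVID), unique(study2$DVID))
cmt_only_in_1  <- setdiff(unique(study1$CMT), unique(study2$CMT))

# Assumption of this sketch: offset study 2 IDs so subjects stay distinct.
study2$ID <- study2$ID + max(study1$ID)
combined <- rbind(study1, study2)
```

Here dvid_only_in_1 and cmt_only_in_1 hold the values that would prompt a mismatch warning, and combined keeps one unique ID per subject across both studies.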
Once we are done creating our dataset, we can write it out with the function apmx::pk_write(). This ensures the dataset is written in a NONMEM-usable format.
name <- "PK_ABC101_V01.csv"
apmx::pk_write(df_combine, file.path(tempdir(), name))
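A common convention for NONMEM input is a plain comma-separated file with missing values written as a period. The base R sketch below shows that convention on a hypothetical two-row dataset. Whether pk_write() uses exactly these settings is an assumption here, so treat this as an illustration of the format rather than of the function.

```r
# Hypothetical two-row dataset with a missing DV on the dosing record.
toy <- data.frame(ID = c(1, 1), TIME = c(0, 1.5), DV = c(NA, 10.2))
path <- file.path(tempdir(), "toy_nonmem.csv")

# Assumed conventions: "." for missing values, no quotes, no row names.
write.csv(toy, path, na = ".", quote = FALSE, row.names = FALSE)

# Read the raw text back to inspect what was written.
lines <- readLines(path)
```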
Documenting a dataset is important when working with a team and when
sharing work with outside organizations or regulatory agencies. For
example, the FDA requires all population pharmacometric analysis
datasets be accompanied by a definition file. apmx provides tools to help you document your dataset.
We will start by exploring the definition file feature. The definition file sources variable names from a dataframe of definitions created with apmx::variable_list_create(). It comes pre-filled with definitions for standard apmx variables, and gives you the ability to add your own for covariates and other custom variables. NOTE: you do not have to add prefixes and suffixes to this list, just the root term of each covariate (SEX instead of NSEX and NSEXC).
vl <- apmx::variable_list_create(variable = c("SEX", "RACE", "ETHNIC", "AGE",
                                              "ALB", "ALT", "AST", "BILI", "CREAT"),
                                 categorization = rep("Covariate", 9),
                                 description = c("sex", "race", "ethnicity", "age",
                                                 "albumin", "alanine aminotransferase",
                                                 "aspartate aminotransferase",
                                                 "total bilirubin", "serum creatinine"))
Now, let’s create the definition file.
define <- apmx::pk_define(df = df_combine,
                          variable.list=vl)
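The note about root terms can be illustrated with a short base R sketch. It assumes dataset columns are formed from one prefix letter (N, B, or T), a root term, and an optional trailing C for the character version, as in NSEX, NSEXC, BAGE, and TAST. The cols vector and the matching rule are hypothetical and are not taken from the pk_define() source.

```r
# Root terms as they would be passed to the variable list.
roots <- c("SEX", "RACE", "ETHNIC", "AGE", "ALB", "ALT", "AST", "BILI", "CREAT")

# Hypothetical column names following the prefix/suffix convention.
cols <- c("NSEX", "NSEXC", "NETHNIC", "NETHNICC", "BAGE", "BALB", "TAST")

# Match one prefix letter, a known root, and an optional trailing "C".
pattern <- paste0("^[NBT](", paste(roots, collapse = "|"), ")C?$")

# Root term for each column, or NA when the column matches no root.
root_of <- ifelse(grepl(pattern, cols), sub(pattern, "\\1", cols), NA)
```

Matching against the known roots, rather than blindly stripping a trailing C, keeps a root such as ETHNIC intact.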
You can export the definition file to a Word document using the file argument. The project and data parameters can be used to add a custom project name and dataset name to the header of the document. To use this feature, you must use a Word document template with the words “Project” and “Dataset” in the header. You can provide the template of the Word document with the template parameter.
define <- apmx::pk_define(df = df_combine,
                          file = file.path(tempdir(), "definition_file.docx"),
                          variable.list=vl,
                          project = "Sponsor Name",
                          data = "Dataset Name")
Next, let’s create a version log. Version logs are important when we have multiple datasets over a project duration. Datasets can be updated for all sorts of reasons:
Similar to the definition function, we can provide a template for formatting. You can also provide a comment to describe the source data. The version log is easiest to use when you write it out as a Word document using the file parameter.
vrlg <- apmx::version_log(df = df_combine,
                          name = name,
                          file = file.path(tempdir(), "version_log.docx"),
                          src_data = "original test data")
Open the version log document and take a look around. Notice that
there is a column called “Comments”. You can add a comment there in the
Word document, and the function will not overwrite it. When you produce
a new dataset, call apmx::version_log() again with the new dataset, the most recent dataset, the new dataset name, and the same filepath as the previous log. You will need to use comp_var to group the rows for comparison. For PKPD datasets, we recommend grouping by USUBJID, ATFD, EVID, and DVID. This function will update the version log by adding a new row to the Word document.
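To picture what grouping rows by comparison variables means, here is a base R sketch that lines up two versions of a dataset on the recommended keys. The old and new data frames are hypothetical, and the logic is a generic row comparison, not the code inside version_log().

```r
keys <- c("USUBJID", "ATFD", "EVID", "DVID")

# Hypothetical previous and new versions of a dataset.
old <- data.frame(USUBJID = c("A-001", "A-001", "A-002"),
                  ATFD = c(0, 1, 0), EVID = c(1, 0, 1), DVID = c(NA, 1, NA),
                  DV = c(NA, 5.1, NA))
new <- data.frame(USUBJID = c("A-001", "A-001", "A-002", "A-002"),
                  ATFD = c(0, 1, 0, 2), EVID = c(1, 0, 1, 0),
                  DVID = c(NA, 1, NA, 1),
                  DV = c(NA, 5.3, NA, 4.8))

# Build one composite key per row from the comparison variables.
row_key <- function(df) do.call(paste, c(df[keys], sep = "|"))

# Rows that exist in only one of the two versions.
added   <- new[!row_key(new) %in% row_key(old), ]
removed <- old[!row_key(old) %in% row_key(new), ]

# Rows present in both versions whose DV changed.
both <- merge(old, new, by = keys, suffixes = c(".old", ".new"))
changed <- both[which(both$DV.old != both$DV.new), ]
```

If the keys do not identify rows uniquely, a comparison like this cannot tell an edited record from a removed one plus an added one, which is why the choice of grouping variables matters.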
Lastly, apmx can help you produce summary tables of your datasets. apmx::pk_summarize() produces three types of summary tables:
Tables can be stratified by any other categorical covariate in the dataset.
sum1 <- apmx::pk_summarize(df = df_combine)
The summary function has other parameters to help you document the dataset:
strat.by will stratify the dataset by any variable.
ignore.C will remove all C-flagged records from the analysis.
docx will produce Word document versions of the summary tables.
pptx will produce PowerPoint slides of the summary tables. NOTE: the pptx feature is still under development.
ignore.request will filter out records matching an expression passed through this parameter.
sum2 <- apmx::pk_summarize(df = df_combine,
                           strat.by = c("NSTUDYC", "NSEXC"),
                           ignore.request = "NRACE == 2")
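As a rough picture of what stratifying and filtering involve, the base R sketch below drops records matching a string expression and then counts subjects per stratum. The toy data frame is hypothetical, and evaluating the request with eval(parse()) is an assumption of this sketch, not a statement about how pk_summarize() handles ignore.request.

```r
# Hypothetical dataset: two records per subject, four subjects.
toy <- data.frame(
  ID      = rep(1:4, each = 2),
  NSTUDYC = rep(c("ABC102", "ABC103"), each = 4),
  NSEXC   = rep(c("Male", "Female", "Male", "Female"), each = 2),
  NRACE   = rep(c(1, 2, 1, 1), each = 2)
)

# Drop records matching the request, given as a string expression.
request <- "NRACE == 2"
drop <- eval(parse(text = request), envir = toy)
kept <- toy[!drop, ]

# Count subjects (not records) in each study-by-sex stratum.
subjects <- unique(kept[c("ID", "NSTUDYC", "NSEXC")])
counts <- table(subjects$NSTUDYC, subjects$NSEXC)
```

Reducing to one row per subject before tabulating keeps the counts from being inflated by repeated records.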