This vignette guides users on how to estimate intrinsic hepatic clearance (Clint) from mass spectrometry data. Intrinsic hepatic clearance is a chemical specific parameter that describes a liver’s ability to metabolize an unbound compound. It is characterized by the disappearance of compound when incubated with primary hepatocytes (Breen et al. (2021)).
The mass spectrometry data should be collected from an assay that uses cryopreserved hepatocyte suspension (Shibata et al. (2002)) as seen in Figure 1.
Fig 1: Clint experimental set up
First, we load in the example dataset from
invitroTKstats
.
Many datasets are loaded in: clint_L0
,
clint_L1
,clint_L2
, clint_L3
, and
clint_L4
. These datasets are Clint data at Level
0, 1, 2, 3, and 4 respectively. Additional datasets associated with
Level 4 processing are also loaded in: clint_L2_heldout
and
clint_PREJAGS
. These will be described later in the “Level
4 processing” section. Lastly, a clint_cheminfo
dataset is
loaded in that contains chemical information necessary for
identification mapping; it is used to create Level 0 data. For the
purpose of this vignette, we’ll start with clint_L0
, the
Level 0 data, to demonstrate the complete pipelining process.
clint_L0
is the output from the
merge_level0
function which compiles raw lab data from
specified Excel files into a singular data frame. The data frame
contains exactly one row per sample with information obtained from the
mass spectrometer. For more details on curating raw lab data to a
singular Level 0 data frame, see the “Data Guide Creation and Level-0
Data Compilation” vignette.
The following table displays the first three rows of
clint_L0
, our Level 0 data.
Compound | DTXSID | Lab.Compound.ID | Date | Sample | Type | Compound.Conc | Peak.Area | ISTD.Peak.Area | ISTD.Name | Analysis.Params | Level0.File | Level0.Sheet | Sample Text | Time | Dilution.Factor |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Phenacetin | DTXSID1021116 | phenacetin | 012822 | 20220128_PFAS_Clint_Hep12HLB_Sample003 | Blank | 95.670 | 18,189.30 | propranolol-d7 | 3.83 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | HLB SPE Blank | 480 | ||
Phenacetin | DTXSID1021116 | phenacetin | 012822 | 20220128_PFAS_Clint_Hep12HLB_Sample004 | Blank | 254.218 | 17,998.58 | propranolol-d7 | 3.86 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | HLB SPE Matrix Blank | 480 | ||
Phenacetin | DTXSID1021116 | phenacetin | 012822 | 20220128_PFAS_Clint_Hep12HLB_Sample017 | CC | 0.0060456 | 56.500 | 18,123.62 | propranolol-d7 | 3.83 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | HLB CC1 | 240 |
format_clint
is the Level 1 function used to create a
standardized data frame. This level of processing is necessary because
naming conventions or formatting can differ across data sets.
If the Level 0 data already contains the required column, then the
existing column name can be specified. For example,
clint_L0
already contains a column specifying the sample
name called “Sample”. However, the default column name for sample name
is “Lab.Sample.Name”. Therefore, we specify the correct column with
sample.col = "Sample"
. In general, to specify an already
existing column that differs from the default, the user must use the
parameter with the .col
suffix.
If the Level 0 data does not already contain the required column,
then the entire column can be populated with a single value. For
example, clint_L0
does not contain a column specifying
biological replicates. Therefore, we populate the required column with
biological.replicates = 1
. In general, to specify a single
value for an entire column, the user must use the parameter without the
.col
suffix.
Users should be mindful if they choose to specify a single value for all of their samples; they should verify this action is one they wish to take.
Some columns must be present in the Level 0 data while others can be
filled with a single value. At minimum, the following columns must be
present in the Level 0 data and specification with a single entry is not
permitted: sample.col
, compound.col
,
dtxsid.col
, lab.compound.col
,
type.col
, istd.col
, and
area.col
.
If there is no additional note.col
in the Level 0 data,
users should use note.col = NULL
to fill the column with
“Note”.
The rest of the following columns may either be specified from the
Level 0 data or filled with a single value: date.col
or
date
, density.col
or density
,
cal.col
or cal
, dilution.col
or
dilution
, time.col
or time
,
istd.name.col
or istd.name
,
istd.conc.col
or istd.conc
,
test.conc.col
or test.conc
,
test.nominal.conc.col
or test.nominal.conc
,
biological.replicates.col
or
biological.replicates
, analysis.method.col
or
analysis.method
, analysis.instrument.col
or
analysis.instrument
, analysis.parameters.col
or analysis.parameters
, level0.file.col
or
level0.file
, and level0.sheet.col
or
level0.sheet
.
Argument | Default | Required in L0? | Corresp. single-entry Argument | Descr. |
---|---|---|---|---|
FILENAME | MYDATA | N/A | Output and input filename | |
data.in | N/A | Level 0 data frame | ||
sample.col | Lab.Sample.Name | Y | Lab sample name | |
date.col | Date | N | date | Lab measurement date |
compound.col | Compound.Name | Y | Formal test compound name | |
dtxsid.col | DTXSID | Y | EPA's DSSTox Structure ID | |
lab.compound.col | Lab.Compound.Name | Y | Lab test compound name (abbr.) | |
type.col | Sample.Type | Y | Sample type (Blank/Cvst/Inactive/CC) | |
density.col | Hep.Density | N | density | Hepatocyte density in the in vitro incubation |
cal.col | Cal | N | cal | MS calibration |
dilution.col | Dilution.Factor | N | dilution | Number of times sample was diluted |
time.col | Time | N | time | Time since chemical was introduced |
istd.col | ISTD.Area | Y | Internal standard peak area | |
istd.name.col | ISTD.Name | N | istd.name | Internal standard name |
istd.conc.col | ISTD.Conc | N | istd.conc | Internal standard concentration |
test.conc.col | Test.Compound.Conc | N | test.conc | Standard test chemical concentration |
test.nominal.conc.col | Test.Target.Conc | N | test.nominal.conc | Initial chemical concentration |
area.col | Area | Y | Target analyte peak area | |
biological.replicates.col | Biological.Replicates | N | biological.replicates | Replicates with the same analyte |
technical.replicates.col | Technical.Replicates | N | technical.replicates | Repeated measurements from one sample |
analysis.method.col | Analysis.Method | N | analysis.method | Analytical chemistry analysis method |
analysis.instrument.col | Analysis.Instrument | N | analysis.instrument | Analytical chemistry analysis instrument |
analysis.parameters.col | Analysis.Parameters | N | analysis.parameters | Analytical chemistry analysis parameters |
note.col | Note | N | Additional notes | |
level0.file.col | Level0.File | N | level0.file | Raw data filename |
level0.sheet.col | Level0.Sheet | N | level0.sheet | Raw data sheet name |
output.res | FALSE | N/A | Export results (TSV)? | |
save.bad.types | FALSE | N/A | Export bad data (TSV)? | |
sig.figs | 5 | N/A | Number of significant figures | |
INPUT.DIR | N/A | Input directory of Level 0 file | ||
OUTPUT.DIR | N/A | Export directory to save Level 1 files |
A TSV file containing the level-1 data can be exported to the user’s
per-session temporary directory. This temporary directory is a
per-session directory whose path can be found with the following code:
tempdir()
. For more details, see [https://www.collinberke.com/til/posts/2023-10-24-temp-directories/].
To avoid exporting to this temporary directory, an
OUTPUT.DIR
must be specified. We have omitted this export
entirely with output.res = FALSE
(the default). The option
to omit exporting a TSV file is also available at levels 2 and 3 and
will be used from this point forward.
clint_L1_curated <- format_clint(FILENAME = "Clint_vignette",
data.in = clint_L0,
# columns present in L0 data
sample.col = "Sample",
compound.col = "Compound",
lab.compound.col = "Lab.Compound.ID",
type.col = "Type",
istd.col = "ISTD.Peak.Area",
area.col = "Peak.Area",
note.col = "Sample Text",
test.conc.col = "Compound.Conc",
analysis.parameters.col = "Analysis.Params",
# columns not present in L0 data
density = 0.5,
test.nominal.conc = 1,
analysis.instrument = "Unknown",
analysis.method = "LCMS",
istd.conc = 0.01,
cal = 1,
biological.replicates = 1,
# don't export output TSV file
output.res = FALSE
)
#> Warning in format_clint(FILENAME = "Clint_vignette", data.in = clint_L0, : Data
#> with inappropriate sample types were removed.
#> 229 observations of 3 chemicals based on 3 separate measurements (calibrations).
We receive a warning message that some of our samples have been removed due to “inappropriate sample types”. The removed samples include data that did not have one of the following sample types
The following table displays some of the samples that were removed. These samples are annotated with a “QC” sample type indicating “Quality Control” and should therefore not be included in our analysis.
Compound | DTXSID | Lab.Compound.ID | Date | Sample | Type | Compound.Conc | Peak.Area | ISTD.Peak.Area | ISTD.Name | Analysis.Params | Level0.File | Level0.Sheet | Sample Text | Time | Dilution.Factor |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Phenacetin | DTXSID1021116 | phenacetin | 012822 | 20220128_PFAS_Clint_Hep12HLB_Sample037 | QC | 932.885 | 17,947.17 | propranolol-d7 | 3.83 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | HLB QC-B | 480 | ||
Phenacetin | DTXSID1021116 | phenacetin | 012822 | 20220128_PFAS_Clint_Hep12HLB_Sample038 | QC | 2,049.694 | 17,701.59 | propranolol-d7 | 3.83 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | HLB QC-C | 480 | ||
Phenacetin | DTXSID1021116 | phenacetin | 012822 | 20220128_PFAS_Clint_Hep12HLB_Sample039 | QC | 7,152.603 | 18,042.49 | propranolol-d7 | 3.83 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | HLB QC-D | 480 | ||
Phenacetin | DTXSID1021116 | phenacetin | 012822 | 20220128_PFAS_Clint_Hep12HLB_Sample105 | QC | 984.653 | 18,864.06 | propranolol-d7 | 3.83 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | HLB QC-B | 480 | ||
Phenacetin | DTXSID1021116 | phenacetin | 012822 | 20220128_PFAS_Clint_Hep12HLB_Sample106 | QC | 1,756.444 | 18,264.91 | propranolol-d7 | 3.83 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | HLB QC-C | 480 |
Users can verify that these samples are excluded by filtering their Level 1 data frame and ensuring no matches are found. One approach is given below.
# All the samples of an inappropriate sample type
excluded <- clint_L0 %>%
filter(!Type %in% c("Blank", "Cvst", "Inactive", "CC"))
# Exclude based on Sample and DTXSID
X <- c(excluded$Sample)
names(X) <- paste(excluded$Sample, excluded$DTXSID, sep = "+")
Y <- c(excluded$DTXSID)
# Find samples in L1 data frame with matching sample name (X) and DTXSID (Y)
matches <- as.data.frame(t(mapply(function(X,Y)
{subset(clint_L1_curated, Lab.Sample.Name == X & DTXSID == Y)},
X, Y, USE.NAMES = T)))
matches
# Check that no matches were returned for each sample in `excluded`
check <- rep(1,nrow(matches))
names(check) <- rownames(matches)
for (name in rownames(matches)) {
# If no matches were found, each element of check should evaluate to 0
check[name] <- sum(sapply(c(1:ncol(matches)),
function(X) {length(matches[1,X][[1]])}))
}
# Verify that each sample in `excluded` had no matches
check
# If no matches, should evaluate to 0
sum(check)
To export the removed samples types as a TSV, the user can set the
parameter save.bad.types = TRUE
. For this example, we will
not save the discarded sample types.
The following table displays the first three rows of
clint_L1_curated
, our Level 1 data produced from
format_clint
. In addition to the columns specified by the
user, there is an additional column called Response
. This
column is the test compound concentration and is calculated as \(\textrm{Response} = \frac{\textrm{Analyte
Area}}{\textrm{ISTD Area}}*\textrm{ISTD Conc}\) where \(\textrm{Analyte Area}\) is defined by the
Area
column, \(\textrm{ISTD
Area}\) is defined by the ISTD.Area
column, and
\(\textrm{ISTD Conc}\) is defined by
the ISTD.Conc
column.
Lab.Sample.Name | Date | Compound.Name | DTXSID | Lab.Compound.Name | Sample.Type | Dilution.Factor | Calibration | ISTD.Name | ISTD.Conc | ISTD.Area | Area | Analysis.Method | Analysis.Instrument | Analysis.Parameters | Note | Level0.File | Level0.Sheet | Time | Test.Compound.Conc | Test.Nominal.Conc | Hep.Density | Biological.Replicates | Response |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20220128_PFAS_Clint_Hep12HLB_Sample003 | 012822 | Phenacetin | DTXSID1021116 | phenacetin | Blank | 480 | 1 | propranolol-d7 | 0.01 | 18,189.30 | 95.670 | LCMS | Unknown | 3.83 | HLB SPE Blank | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | 1 | 0.5 | 1 | 0.00005259686 | ||
20220128_PFAS_Clint_Hep12HLB_Sample004 | 012822 | Phenacetin | DTXSID1021116 | phenacetin | Blank | 480 | 1 | propranolol-d7 | 0.01 | 17,998.58 | 254.218 | LCMS | Unknown | 3.86 | HLB SPE Matrix Blank | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | 1 | 0.5 | 1 | 0.00014124340 | ||
20220128_PFAS_Clint_Hep12HLB_Sample017 | 012822 | Phenacetin | DTXSID1021116 | phenacetin | CC | 240 | 1 | propranolol-d7 | 0.01 | 18,123.62 | 56.500 | LCMS | Unknown | 3.83 | HLB CC1 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | 0.0060456 | 1 | 0.5 | 1 | 0.00003117478 |
sample_verification
is the Level 2 function used to add
a verification column. The verification column indicates whether a
sample should be included in the point estimation (Level 3) and credible
interval (Level 4) processing. This column allows users to keep all
samples in their data but only utilize the reliable samples for
Clint estimation. All of the data in Level 2 is identical to
the data in Level 1 with the exception of the additional
Verified
column.
To determine whether a sample should be included, the user should consult the wet-lab scientists from where their data originates or a chemist who may be able to provide reliable rationale for samples that should not be verified. This level of processing allows the user to receive feedback from the wet-lab scientists, exclude erroneous or unreliable samples, and produce new Clint estimates. Thus, there is always an open channel of communication between the user and the wet-lab scientists or chemists.
We will use the already processed Level 2 data frame,
clint_L2
, to regenerate our exclusion data. In general, the
user would not have access to the exclusion information a
priori.
The exclusion data frame must include the following columns:
Variables
, Values
, and Message
.
The Variables
column contains the variable names used to
filter the excluded rows. Here, we are using
Lab.Sample.Name
and DTXSID
to identify the
excluded rows separated by a “|”. The Values
column
contains the values of the variables, as a character, also separated by
a “|”. The Message
column contains the reason for
exclusion. Here, we are using the reasons listed in the
Verified
column in clint_L2
. The user should
refrain from using “|” in any of their descriptions to avoid conflicts
with the sample_verification
function.
# Use verification data from loaded in `clint_L2` data frame
exclusion <- clint_L2 %>%
filter(Verified %in% c("Wrong Mix", "Unknown Conc.")) %>%
mutate("Variables" = "Lab.Sample.Name|DTXSID") %>%
mutate("Values" = paste(Lab.Sample.Name, DTXSID, sep = "|")) %>%
mutate("Message" = Verified) %>%
select(Variables, Values, Message)
Variables | Values | Message |
---|---|---|
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12HLB_Sample085|DTXSID6023525 | Unknown Conc. |
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12WAX_Sample017|DTXSID80380256 | Unknown Conc. |
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12WAX_Sample018|DTXSID80380256 | Unknown Conc. |
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12WAX_Sample019|DTXSID80380256 | Unknown Conc. |
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12WAX_Sample075|DTXSID80380256 | Wrong Mix |
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12WAX_Sample109|DTXSID80380256 | Wrong Mix |
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12WAX_Sample110|DTXSID80380256 | Wrong Mix |
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12WAX_Sample111|DTXSID80380256 | Wrong Mix |
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12WAX_Sample114|DTXSID80380256 | Wrong Mix |
Lab.Sample.Name|DTXSID | 20220128_PFAS_Clint_Hep12WAX_Sample115|DTXSID80380256 | Wrong Mix |
clint_L2_curated <- sample_verification(FILENAME = "Clint_vignette",
data.in = clint_L1_curated,
assay = "Clint",
exclusion.info = exclusion,
# don't export output TSV file
output.res = FALSE)
Our Level 2 data now contains a Verified
column. If the
sample should be included, the column contains a “Y” for yes. If the
sample should be excluded, the column contains the reason for
exclusion.
The following table displays some rows of the Level 2 data, each with
a different value in the Verification
column. Note, the
reasons for exclusion are not limited to “Unknown Conc.” or “Wrong Mix”,
they were merely provided as an example.
Lab.Sample.Name | Date | Compound.Name | DTXSID | Lab.Compound.Name | Sample.Type | Dilution.Factor | Calibration | ISTD.Name | ISTD.Conc | ISTD.Area | Area | Analysis.Method | Analysis.Instrument | Analysis.Parameters | Note | Level0.File | Level0.Sheet | Time | Test.Compound.Conc | Test.Nominal.Conc | Hep.Density | Biological.Replicates | Response | Verified |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20220128_PFAS_Clint_Hep12HLB_Sample003 | 012822 | Phenacetin | DTXSID1021116 | phenacetin | Blank | 480 | 1 | propranolol-d7 | 0.01 | 18,189.30 | 95.670 | LCMS | Unknown | 3.83 | HLB SPE Blank | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | 1 | 0.5 | 1 | 0.000052596859 | Y | ||
20220128_PFAS_Clint_Hep12WAX_Sample075 | 012822 | 4,4-bis(Trifluoromethyl)-4-fluoropropanoic acid | DTXSID80380256 | TFMFPA | Cvst | 720 | 1 | M5PFPeA | 0.01 | 29,154.77 | 5.883 | LCMS | Unknown | 3.67 | WAX2 T15-B | Hep12 Data for Uncertainty Feb2022.xlsx | PFAS Data | 0.25 | 1 | 0.5 | 1 | 0.000002017852 | Wrong Mix | |
20220128_PFAS_Clint_Hep12HLB_Sample085 | 012822 | Propranolol | DTXSID6023525 | propranolol | CC | 240 | 1 | propranolol-d7 | 0.01 | 19,777.32 | 2,132.403 | LCMS | Unknown | 4.12 | HLB CC1 | Hep12 Data for Uncertainty Feb2022.xlsx | Ref Chem Data | 1 | 0.5 | 1 | 0.001078206466 | Unknown Conc. |
calc_clint_point
is the Level 3 function used to
calculate the Clint point estimate for each test compound
using a Frequentist framework.
Mathematically, Clint is the slope of the line representing log concentrations over time. With some assumptions on the form of the line and on the distribution of the error terms, we can estimate Clint using maximum likelihood estimation.
First, we assume the test compound concentration at time \(t\) is an exponential decay model \[C(t) = \textrm{cal}*C_0*e^{-mt}\] where \(\textrm{cal}\) is the MS calibration, \(C_0\) is the test compound concentration at time 0, \(m\) is the rate constant, and \(t\) is the incubation time. Notice that if a natural log is applied, \[\textrm{ln}(C(t)) = -mt + \textrm{ln}(\textrm{cal}*C_0)\] \(m\) is the slope of the line representing log concentrations over time. Thus, the rate constant \(m\) is our Clint value.
Next, we also assume the random noise is normally distributed \(\varepsilon_i \sim \mathcal{N}(0,\sigma^2)\) where \(\sigma^2\) is unknown.
Lastly, the test compound concentration, our dependent variable, is
qualitatively defined by our Response
column, the ratio of
the analyte area to the ISTD area.
With this information, we can rewrite our dependent variable as \[y_i \sim \mathcal{N}(C(t),\sigma^2)\] After some simplification, our log-likelihood is defined as \[l(\theta) = N*log\left(\frac{1}{\sqrt{2\pi}\sigma}\right)-\frac{1}{2}\sum_{i=1}^N\frac{(y_i-C(\theta, t))^2}{\sigma^2}\] where \(\theta = (\textrm{cal}, m)\) and \(N\) is the number of Cvst and Blank samples.
Using the mle
estimator in the stats4
package, we then estimate the parameters that maximize the
log-likelihood for each test compound.
clint_L3_curated <- calc_clint_point(FILENAME = "Clint_vignette",
data.in = clint_L2_curated,
# don't export output TSV file
output.res = FALSE)
#> [1] "Phenacetin Cl_int = 17.9 uL/min/million hepatocytes, p-Value = 9.52e-06 ."
#> [1] "Propranolol Cl_int = 9.15 uL/min/million hepatocytes, p-Value = 3.49e-06 ."
#> [1] "4,4-bis(Trifluoromethyl)-4-fluoropropanoic acid Cl_int = 0 uL/min/million hepatocytes, p-Value = 1 ."
#> [1] "Intrinsic clearance (Clint) calculated for 3 chemicals."
#> [1] "Intrinsic clearance (Clint) calculated for 3 measurements."
Our Level 3 data contains a Clint
estimate,
Clint.pValue
, AIC
of the exponential decay
fit, and AIC.Null
of the exponential decay assuming a
constant rate of decay.
Some experiments also include the assay performed at 1 \(\mu M\) and 10 \(\mu M\) to check for the saturation of metabolizing enzymes. The test compound concentration at time \(t\) with a saturation probability is assumed to follow an exponential decay model with a slight modification \[C_{sat}(t) = \textrm{cal}*C_0*e^{-m*\textrm{sat}*t}\] where \(\textrm{cal}\) is the MS calibration, \(C_0\) is the test compound concentration at time 0, \(m\) is the rate constant, \(t\) is the incubation time, and \(\textrm{sat}\) is the probability of saturation, or the probability of observing a lower clearance rate at a higher concentration.
If these samples are included, Clint.1
is the clearance
rate at 1 \(\mu M\),
Clint.10
is the clearance rate at 10 \(\mu M\), AIC.Sat
is the AIC of
the exponential decay with a saturation probability, and
Sat.pValue
is the corresponding p-value.
Because our example data did not include these samples,
Clint.1
, Clint.10
, AIC.Sat
, and
Sat.pValue
are empty.
Compound.Name | DTXSID | Lab.Compound.Name | Calibration | Clint | Clint.pValue | Fit | AIC | AIC.Null | Clint.1 | Clint.10 | AIC.Sat | Sat.pValue |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Phenacetin | DTXSID1021116 | phenacetin | All Data | 17.856041 | 0.000009520723 | 0, 1 uM | 22.784013 | 34.34605 | ||||
Propranolol | DTXSID6023525 | propranolol | All Data | 9.149289 | 0.000003490784 | 0, 1 uM | 9.318565 | 21.88395 | ||||
4,4-bis(Trifluoromethyl)-4-fluoropropanoic acid | DTXSID80380256 | TFMFPA | All Data | 0.000000 | 1.000000000000 | 0, 1 uM | 43.108541 | 41.10854 |
calc_clint
is the Level 4 function used to calculate
Clint point estimates and credible intervals using a Bayesian
framework. Markov chain Monte Carlo (MCMC) simulations are used to
randomly sample from the posterior distribution with a uniform
prior.
To run Level 4, one needs to have JAGS installed on their machine. To
determine the correct path, the user may use the
runjags::findjags()
or may need to specify the JAGS
installation location for the JAGS.PATH
argument.
JAGS is a software used for conducting Bayesian hierarchical modeling
using Markov Chain Monte Carlo (MCMC) simulation.
invitroTKstats
contains the JAGS models for each of the
applicable assays and utilizes the JAGS model under the hood to run the
MCMC simulations via the runjags
dependency to obtain the
level-4 Bayesian estimates.
JAGS.PATH
argument within the level-4 function.rjags
,
runjags
, and coda
R packages need to be
installed via the R console using the install.packages
function.We pass clint_L2_curated
, the Level 2 data frame, and
not clint_L3_curated
, the Level 3 data
frame, into calc_clint
. This is because Level 3 and Level 4
processing are not sequential; they are methods that calculate different
statistical quantities. The following code chunk takes a while to run;
previous runtimes are around 7 minutes.
clint_L4_curated <- calc_clint(FILENAME = "Clint_vignette",
data.in = clint_L2_curated,
JAGS.PATH = runjags::findjags()
)
The Clint intervals are returned to the user’s R session,
in an exported TSV file, and in an exported RData file. There is no
parameter to prevent the TSV or RData files from being exported because
of the potential for the simulations to crash. If there are no crashes,
then the exported TSV file is identical to the user’s R session and the
exported RData file. clint_L4
is an example exported RData
file.
Additionally, intermediate files are saved to the user’s current
working directory if TEMP.DIR = NULL
. These include a Level
2 heldout set, clint_L2_heldout
, containing unverified
samples and a Level 4 PREJAGS list, clint_PREJAGS
,
containing arguments provided to JAGS. Because the PREJAGS list is
overwritten with each compound, clint_PREJAGS
only contains
information relevant to the last tested compound, TFMFPA in this
case.
Our Level 4 data contains a Clint credible interval and the corresponding p-value for each chemical.
Compound.Name | DTXSID | Lab.Compound.Name | Clint.1.Med | Clint.1.Low | Clint.1.High | Clint.10.Med | Clint.10.Low | Clint.10.High | Clint.pValue | Sat.pValue | degrades.pValue |
---|---|---|---|---|---|---|---|---|---|---|---|
Phenacetin | DTXSID1021116 | phenacetin | 11.491000 | 7.390264 | 14.41591 | 0.00188 | 0.74930 | 0.99976 | |||
Propranolol | DTXSID6023525 | propranolol | 9.723517 | 6.714928 | 12.30937 | 0.00000 | 0.74876 | 0.99998 | |||
4,4-bis(Trifluoromethyl)-4-fluoropropanoic acid | DTXSID80380256 | TFMFPA | 0.000000 | 0.000000 | 0.00000 | 0.99656 | 0.74626 | 0.99988 |
Generally, data processing pipelines should include minimal to no manual coding. It is best to keep clean code that is easily reproducible and transferable. The user should aim to have all the required data and meta-data files properly formatted to avoid further modifications throughout the pipeline.
In this section, we provide the equations for the Bayesian model (i.e., priors and likelihoods) used to estimate the intrinsic hepatic clearance (\(Cl_{int}\)) and the uncertainty about that estimate. The following sub-sections are organized such that:
Some of the indices are reused between sections (e.g. \(i\), \(w^*_i\), etc.). However, it should be noted that these are not meant to be understood across sub-sections; rather, only understood within the context of the section they are in.
NOTE: Readers unfamiliar with JAGS should be aware that JAGS software uses precision (\(\tau\)) rather than variance (\(\sigma^2\)) (i.e., \(\tau = \frac{1}{\sigma^2}\)).
Each chemical may have more than one day of experimentation, and thus
multiple calibrations. Suppose for the chemical of interest there are a
total of \(n_{cal}\) calibrations
(Num.cal
). For a particular calibration \(w \in (1,\ldots,n_{cal})\) we assume the
following priors for our Bayesian model:
Prior for log-scale constant analytic standard deviation
(log.const.analytic.sd
):
\[ log(\sigma_a)_w \sim \textrm{Unif} \left( a = -6,b = 1 \right) ; \space \sigma_{a,w} = 10^{log(\sigma_a)_w} \]
where \(a\) and \(b\) are the minimum and maximum,
respectively, and \(\sigma_{a,w}\) is
the converted parameter used in later equations
(const.analytic.sd
).
Prior for the log-scale heteroscedastic analytic slope
(log.hetero.analytic.slope
):
\[ log(m_h)_w \sim \textrm{Unif} \left( a = -6,b = 1 \right) ; \space m_{h,w} = 10^{log(m_h)_w} \]
where \(a\) and \(b\) are the minimum and maximum,
respectively, and \(m_{h,w}\) is the
converted parameter used in later equations
(hetero.analytic.slope
).
Prior for the threshold concentration
(C.thresh
):
\[ C_{thresh,w} \sim \textrm{Unif} \left( a = 0,b = \frac{conc_{target,w}}{10} \right)\]
where \(a\) and \(b\) are the minimum and maximum,
respectively, and \(conc_{target,w}\)
is the expected initial concentration (Test.Nominal.Conc
)
for calibration index \(w\).
Prior for the log-scale calibration
(log.calibration
):
\[ log(cal)_w \sim \textrm{N} \left( \mu = 0,\tau = 0.01 \right) ; \space cal_w = 10^{log(cal)_w}\]
where \(\mu\) and \(\tau\) are the mean and precision,
respectively, and \(cal_w\) is the
converted parameter used in later equations
(calibration
).
Prior for the background
(background
):
\[ \gamma_w \sim \textrm{Exp} \left( \lambda = 100 \right)\]
where \(\lambda\) is the rate parameter.
Let us assume the total number of prediction and precision for the
blank observations is based on the total number of calibrations in the
set (i.e. \(n_{cal}\),
Num.cal
). Thus, for each calibration \(j \in w\) we obtain the following:
Estimation for the predicted values for blank observations
(Blank.pred
):
\[ x_{blank,j} = \frac{\gamma_{j}}{df_{blank,j}} \]
where \(df_{blank,j}\) is the
dilution factor for the \(j^{th}\)
calibration for blank observations
(Blank.Dilution.Factor
).
Estimation for the precision of blank observations
(Blank.prec
):
\[ \tau_{blank,j} = \frac{1}{(\sigma_{a,j} + m_{h,j} *x_{blank,j})^2} \]
Likelihood for blank observations
(Blank.obs
):
Suppose \(n_{blank}\) is the total
number of blank observations (Num.blank.obs
), and \(y_{blank,i}\) indicates the \(i^{th}\) blank observation, sample type =
“Blank” (Blank.obs
). For each observation we obtain a
posterior MCMC estimate for the blank observations with the
following:
\[ y_{blank,i} \sim \textrm{N} \left( \mu = x_{blank,w^*_i}, \tau = \tau_{blank,w^*_i} \right) \]
where \(w^*_i\) is the calibration
index (Blank.cal
) for the \(i^{th}\) observation, such that \(w^* = (w^*_1,\ldots,w^*_{n_{blank}})\) and
\(w^*_i \in w\) (i.e. \(w^*_i\) indicates one of the \(n_{cal}\) calibrations).
Suppose \(n_{cc}\) is the total
number of calibration curve observations (Num.cc
), and
\(y_{cc,i}\) indicates the \(i^{th}\) calibration curve observation,
sample type = CC
(cc.obs
). For each
observation we obtain a posterior MCMC estimate for the calibration
curve observations with the following:
Calibration curve slope for the \(i^{th}\) observation
(cc.slope
) is assumed to be the calibration estimate
corresponding to the calibration index (\(w^*_i\)):
\[ m_{cc,i} = cal_{w^*_i}\]
where \(w^*_i\) is the calibration
index (cc.obs.cal
) for the \(i^{th}\) observation, such that \(w^* = (w^*_1,\ldots,w^*_{n_{cc}})\) and
\(w^*_i \in w\) (i.e. \(w^*_i\) indicates one of the \(n_{cal}\) calibrations).
Calibration curve intercept for the \(i^{th}\) observation
(cc.intercept
) is assumed to be the background estimate
corresponding to the calibration index (\(w^*_i\)):
\[ \alpha_{cc,i} = \gamma_{w^*_i}\]
Estimation of the predictions for calibration curve observations
(cc.pred
):
\[ x_{cc,i} = m_{cc,i} * (\frac{C_{cc,i}}{df_{cc,i}} - \frac{C_{thresh,w^*_i}}{df_{cc,i}}) * \beta_{cc,i} + \frac{\alpha_{cc,i}}{df_{cc,i}} \]
where
cc.obs.conc
) for the \(i^{th}\) calibration curve observationcc.obs.Dilution.Factor
) for the \(i^{th}\) calibration curve observation\[ \beta_{cc,i} = \begin{cases} 1 & \textrm{if} \space (\frac{C_{cc,i}}{df_{cc,i}} - \frac{C_{thresh,w^*_i}}{df_{cc,i}}) \ge 0 \\ 0 & \textrm{otherwise} \end{cases} \]
Estimation of the precision for calibration curve observations
(cc.prec
):
\[ \tau_{cc,i} = \frac{1}{(\sigma_{a,w^*_i} + m_{h,w^*_i} * x_{cc,i})^2} \]
Likelihood for the calibration curve observations
(cc.obs
):
\[ y_{cc,i} \sim \textrm{N} \left( \mu = x_{cc,i},\tau = \tau_{cc,i} \right)\]
In some cases, the intrinsic hepatic clearance assay will have two initial concentrations (i.e. lower and higher concentration), typically these are \(1\) and \(10 \mu M\) but do not necessarily need to be.
Prior for the binary indicator for whether the concentration
decreases (decreases
):
\[ d_c \sim \textrm{Bern} \left( p = p_{d_c} \right) \]
where \(p\) is the probability
parameter and \(p_{d_c}\) is the
user-specified prior probability that a chemical will decrease in the
assay (DECREASE.PROB
).
Prior for the binary indicator for whether the concentration degrades
(degrades
):
\[ d_g \sim \textrm{Bern} \left( p =
p_{d_g} \right) \] where \(p\)
is the probability parameter and \(p_{d_g}\) is the user-specified prior
probability of instability for a chemical within the assay (i.e. abiotic
degradation) (DEGRADE.PROB
).
Prior for the slope representing the biological clearance rate
(bio.rate
) for the lower concentration, indicated by \(C_{thresh,1}\) and \(conc_{target,1}\):
\[ \delta_{bio} \sim \textrm{Unif} \left( a = 0,b = -log(\frac{C_{thresh,1}}{conc_{target,1}}) \right)\]
where \(a\) and \(b\) are the minimum and maximum, respectively.
Prior for the slope representing the abiotic clearance rate, when
data from the inactivated hepatocytes is available
(abio.rate
) at the lower concentration, indicated by \(C_{thresh,1}\) and \(conc_{target,1}\):
\[ \delta_{abio} \sim \textrm{Unif} \left( a = 0,b = -log(\frac{C_{thresh,1}}{conc_{target,1}}) \right) \]
where \(a\) and \(b\) are the minimum and maximum, respectively.
Prior for the binary indicator for whether there is a reduction in
the biological clearance and the chemical shows some metabolic
saturation between the lower and higher concentration
(saturates
):
\[ s_f \sim \textrm{Bern} \left( p = p_{s_f} \right)\]
where \(p\) is the probability
parameter and \(p_{s_f}\) is the
user-specified prior probability that a chemical’s rate of metabolism
will decrease between the lower and higher concentration
(SATURATE.PROB
).
\(p_{s_f} = 0.25\) assumes \(25\%\) of the chemical shows some metabolic saturation between the lower and higher concentrations.
Prior for the saturation at the higher concentration (i.e. magnitude
of decreased biological clearance) at the higher concentration
(saturation
):
\[ s_a \sim \textrm{Unif} \left( a = 0,b = 1 \right) \]
Estimation of the actual biological clearance rate
(bio.slope
) at the lower (\(m_{bio,1}\)) and higher (\(m_{bio,2}\)) initial concentrations,
respectively:
\[ m_{bio,1} = d_c * \delta_{bio} \] \[ m_{bio,2} = d_c * \delta_{bio}* (1-s_f*s_a) \]
such that \(m_{bio} = (m_{bio,1},m_{bio,2})\).
The total elimination rate (slope
) at the lower (\(m_1\)) and higher (\(m_2\)) initial concentrations,
respectively:
\[ m_1 = d_c * \delta_{bio} + d_g * \delta_{abio} \]
\[ m_2 = d_g * \delta_{abio} + d_c * \delta_{bio} *(1-s_f*s_a) \]
such that \(m = (m_1,m_2)\).
Estimate of the abiotic clearance rate (abio.slope
):
\[ m_{abio} = d_g * \delta_{abio} \]
Suppose \(n\) is the total number of
observations (Num.obs
), and \(y_{i}\) indicates the \(i^{th}\) observation, sample type =
Cvst
(obs
). For each observation we obtain a
posterior MCMC estimate for the calibration curve observations with the
following:
Estimate of the metabolic exponential decay for the \(i^{th}\) observation (C
):
\[ C_i = conc_{target,c^*_i}*exp(-m_{c^*_i}*t_i) \]
where
obs.conc
) for the \(i^{th}\) observation (i.e, \(c^* = (c^*_1,\ldots,c^*_n)\)) and \(c^*_i \in (1,2)\) such that \(1\) indicates the lower initial
concentration and \(2\) indicates the
higher initial concentrationTest.Nominal.Conc
)
corresponding to the \(i^{th}\)
observation, given the concentration index \(c^*_i\)obs.time
) for the \(i^{th}\) observationEstimate of the metabolic clearance predictions
(obs.pred
):
\[ x_i = cal_{w^*_i} * (\frac{C_i}{df_i} - \frac{C_{thresh,w^*_i}}{df_i}) * \beta_i + \frac{\gamma_{w^*_i}}{df_i} \]
where
obs.Dilution.Factor
) for the \(i^{th}\) observation\[ \beta_i = \begin{cases} 1 & \textrm{if} \space (\frac{C_i}{df_i} - \frac{C_{thresh,w^*_i}}{df_i}) \ge 0 \\ 0 & \textrm{otherwise} \end{cases} \]
obs.cal
) for the \(i^{th}\) observation \(w^* = (w^*_1,\ldots,w^*_{n})\) and \(w^*_i \in w\) (i.e. \(w^*_i\) indicates one of the \(n_{cal}\) calibrations)Estimates of the metabolic clearance precision
(obs.prec
):
\[ \tau_i = \frac{1}{(\sigma_{a,w^*_i} + m_{h,w^*_i}*x_i)^2} \]
Likelihood for metabolic clearance
(obs
):
\[ y_i \sim \textrm{N} \left( \mu = x_i,\tau = \tau_i \right)\]
Suppose \(n_{abio}\) is the total
number of abiotic observations (Num.abio.obs
), and \(y_{abio,i}\) indicates the \(i^{th}\) observation, sample type =
Inactive
(abio.obs
). For each observation we
obtain a posterior MCMC estimate for the calibration curve observations
with the following:
Estimate of the abiotic exponential decay for the \(i^{th}\) observation
(abio.C
):
\[ C_{abio,i} = conc_{target,w^*_i} * exp(-m_{abio}*t_{abio,i}) \]
abio.obs.cal
) for the \(i^{th}\) non-biological observation \(w^* = (w^*_1,\ldots,w^*_{n_abio})\) and
\(w^*_i \in w\) (i.e. \(w^*_i\) indicates one of the \(n_{cal}\) calibrations)Test.Nominal.Conc
)
corresponding to the \(i^{th}\)
observation, given the calibration index \(w^*_i\)abio.obs.time
)Estimates for the abiotic clearance predictions
(abio.obs.pred
):
\[ x_{abio,i} = cal_{w^*_i} * (\frac{C_{abio,i}}{df_{abio,i}} - \frac{C_{thresh,w^*_i}}{df_{abio,i}}) * \beta_{abio,i} + \frac{\gamma_{w^*_i}}{df_{abio,i}} \]
where
abio.obs.Dilution.Factor
) for the \(i^{th}\) abiotic observation\[ \beta_{abio,i} = \begin{cases} 1 & \textrm{if} \space (\frac{C_{abio,i}}{df_{abio,i}} - \frac{C_{thresh,w^*_i}}{df_{abio,i}}) \ge 0 \\ 0 & \textrm{otherwise} \end{cases} \]
Estimates for the abiotic clearance precision
(abio.obs.prec
):
\[ \tau_{abio,i} = \frac{1}{(\sigma_{a,w^*_i} + m_{h,w^*_i} * x_{abio,i})^2} \]
Likelihood for the abiotic clearance observations
(abio.obs
):
\[ y_{abio,i} \sim \textrm{N} \left( \mu = x_{abio,i},\tau = \tau_{abio,i} \right) \]
Conversion of disappearance rates (1/hr) to hepatic clearance (\(\mu L/min/10^6\)) for \(1 \mu M\):
\[ \mathbf{Clint.1} = 1000*\frac{m_{bio,1 \mu M}}{\rho_{hep}*60} \]
where \(\rho_{hep}\) is the hepatic density and \(m_{bio,1 \mu M}\) is the slope for “bio” at \(1 \mu M\) (i.e. \(m_{bio,1} = m_{bio,1 \mu M}\) where \(m_{bio,1}\) is the notation in earlier equations in the Clearance Model section).
Conversion of disappearance rates (1/hr) to hepatic clearance (\(\mu L/min/10^6\)) for \(10 \mu M\):
\[ \mathbf{Clint.10} = 1000*\frac{m_{bio,10 \mu M}}{\rho_{hep}*60} \]
where \(\rho_{hep}\) is the hepatic density and \(m_{bio,10 \mu M}\) is the slope for “bio” at \(10 \mu M\) (i.e. \(m_{bio,2} = m_{bio,10 \mu M}\) where \(m_{bio,2}\) is the notation in earlier equations in the Clearance Model section).
Posterior probability for intrinsic hepatic clearance:
\[ \mathbf{Clint.pValue} = \frac{\sum_{k=1}^{n_{M}} d_{c,k} }{n_{M}}\]
where \(n_{M}\) is the number of MCMC iterations and \(d_{c,k}\) is the binary result for MCMC iteration \(k\) indicating whether the chemical decreases (i.e. is biologically cleared).
Posterior probability for the chemical saturation:
\[ \mathbf{Sat.pValue} = \frac{\sum_{k=1}^{n_{M}}s_{f,k}}{n_{M}} \]
where \(n_{M}\) is the number of MCMC iterations and \(s_{f,k}\) is the binary result for MCMC iteration \(k\) indicating whether the saturation is hit in each of the MCMC iterations.
Posterior probability of the chemical degradation:
\[ \mathbf{degrades.pValue} = \frac{\sum_{k=1}^{n_{M}}d_{g,k}}{n_{M}}\]
where \(n_{M}\) is the number of MCMC iterations and \(d_{g,k}\) is the binary result for MCMC iteration \(k\) indicating whether the chemical degradation is hit (i.e. is non-biologically cleared).
Data passed to JAGS as part of the PREJAGS
object:
Data | Notation | Description |
---|---|---|
Test.Nominal.Conc |
\(conc_{target}\) | expected initial concentration |
Num.cal |
\(n_{cal}\) | total number of calibrations |
Num.obs |
\(n\) | total number of Cvst samples |
obs |
\(y\) | Cvst sample responses |
obs.conc |
\(C\) | Test.Nominal.Conc indices for Cvst
samples |
obs.time |
\(t\) | time the Cvst samples were tested for a given chemical
concentration (in hours) |
obs.cal |
\(w^*\) | calibration index for Cvst samples (reused index) |
obs.Dilution.Factor |
\(df\) | dilution factors for Cvst samples |
Num.blank.obs |
\(n_{blank}\) | total number of Blank samples |
Blank.obs |
\(y_{blank}\) | Blank sample responses |
Blank.cal |
\(w^*\) | calibration index for Blank samples (reused index) |
Blank.Dilution.Factor |
\(df_{blank}\) | dilution factors for Blank samples |
Num.cc |
\(n_{cc}\) | total number of CC samples |
cc.obs.conc |
\(C_{cc}\) | Test.Compound.Conc for CC samples of the
tested compounds |
cc.obs |
\(y_{cc}\) | CC sample responses |
cc.obs.cal |
\(w^*\) | calibration index for CC samples (reused index) |
cc.obs.Dilution.Factor |
\(df_{cc}\) | dilution factors for CC samples |
Num.abio.obs |
\(n_{abio}\) | total number of Inactive samples |
abio.obs |
\(y_{abio}\) | Inactive sample responses |
abio.obs.conc |
\(C_{abio}\) | Test.Nominal.Conc indices for Inactive
samples |
abio.obs.time |
\(t_{abio}\) | time the Inactive samples were tested for a given
chemical concentration (in hours) |
abio.obs.cal |
\(w^*\) | calibration index for Inactive samples (reused
index) |
abio.obs.Dilution.Factor |
\(df_{abio}\) | dilution factors for Inactive samples |
User-specified prior parameter values via level-4 function arguments
and passed to JAGS as part of the PREJAGS
object:
Prior | Notation | level-4 Argument (Default) | Description |
---|---|---|---|
DECREASE.PROB |
\(p_{d_c}\) | decrease.prob (0.5) |
probability that a chemical will decrease in the assay |
DEGRADE.PROB |
\(p_{d_g}\) | degrade.prob (0.05) |
probability of instability for a chemical within the assay (i.e. abiotic degradation) |
SATURATE.PROB |
\(p_{s_f}\) | saturate.prob (0.25) |
probability that a chemical’s rate of metabolism will decrease between the lower and higher concentration |
MCMC Parameters in JAGS:
JAGS Parameter Name | Parameter | Distribution | Prior/Posterior/Calculated |
---|---|---|---|
log.const.analytic.sd |
\(log(\sigma_a)\) | Uniform | Prior |
log.hetero.analytic.slope |
\(log(m_h)\) | Uniform | Prior |
C.thresh |
\(C_{thresh}\) | Uniform | Prior |
log.calibration |
\(log(cal)\) | Normal | Prior |
background |
\(\gamma\) | Exponential | Prior |
const.analytic.sd |
\(\sigma_a\) | Calculated | |
hetero.analytic.slope |
\(m_h\) | Calculated | |
calibration |
\(cal\) | Calculated | |
Blank.pred |
\(x_{blank}\) | Calculated | |
Blank.prec |
\(\tau_{blank}\) | Calculated | |
Blank.obs |
\(y_{blank}\) | Normal | Posterior |
cc.slope |
\(m_{cc}\) | Calculated | |
cc.intercept |
\(\alpha_{cc}\) | Calculated | |
cc.pred |
\(x_{cc}\) | Calculated | |
cc.prec |
\(\tau_{cc}\) | Calculated | |
cc.obs |
\(y_{cc}\) | Normal | Posterior |
decreases |
\(d_c\) | Bernoulli | Prior |
degrades |
\(d_g\) | Bernoulli | Prior |
bio.rate |
\(\delta_{bio}\) | Uniform | Prior |
abio.rate |
\(\delta_{abio}\) | Uniform | Prior |
slope |
\(m\) | Calculated | |
bio.slope |
\(m_{bio}\) | Calculated | |
saturates |
\(s_f\) | Bernoulli | Prior |
saturation |
\(s_a\) | Uniform | Prior |
C |
\(C\) | Calculated | |
obs.pred |
\(x\) | Calculated | |
obs.prec |
\(\tau\) | Calculated (Not the general distribution parameter.) | |
obs |
\(y\) | Normal | Posterior |
abio.slope |
\(m_{abio}\) | Calculated | |
abio.C |
\(C_{abio}\) | Calculated | |
abio.obs.pred |
\(x_{abio}\) | Calculated | |
abio.obs.prec |
\(\tau_{abio}\) | Calculated | |
abio.obs |
\(y_{abio}\) | Normal | Posterior |