Intrinsic Hepatic Clearance (Clint)

US EPA’s Center for Computational Toxicology and Exposure ccte@epa.gov

Introduction

This vignette guides users on how to estimate intrinsic hepatic clearance (Clint) from mass spectrometry data. Intrinsic hepatic clearance is a chemical specific parameter that describes a liver’s ability to metabolize an unbound compound. It is characterized by the disappearance of compound when incubated with primary hepatocytes (Breen et al. (2021)).

The mass spectrometry data should be collected from an assay that uses cryopreserved hepatocyte suspension (Shibata et al. (2002)) as seen in Figure 1.

Fig 1: Clint experimental set up

Fig 1: Cl~int~ experimental set up

Suggested packages for use with this vignette

# Primary package 
library(invitroTKstats)
# Data manipulation package 
library(dplyr)
# Table formatting package 
library(flextable)

Load Data

First, we load in the example dataset from invitroTKstats.

# Load example clint data 
data("Clint-example")

Many datasets are loaded in: clint_L0, clint_L1,clint_L2, clint_L3, and clint_L4. These datasets are Clint data at Level 0, 1, 2, 3, and 4 respectively. Additional datasets associated with Level 4 processing are also loaded in: clint_L2_heldout and clint_PREJAGS. These will be described later in the “Level 4 processing” section. Lastly, a clint_cheminfo dataset is loaded in that contains chemical information necessary for identification mapping; it is used to create Level 0 data. For the purpose of this vignette, we’ll start with clint_L0, the Level 0 data, to demonstrate the complete pipelining process.

clint_L0 is the output from the merge_level0 function which compiles raw lab data from specified Excel files into a singular data frame. The data frame contains exactly one row per sample with information obtained from the mass spectrometer. For more details on curating raw lab data to a singular Level 0 data frame, see the “Data Guide Creation and Level-0 Data Compilation” vignette.

The following table displays the first three rows of clint_L0, our Level 0 data.

Table 1: Level 0 data

Compound

DTXSID

Lab.Compound.ID

Date

Sample

Type

Compound.Conc

Peak.Area

ISTD.Peak.Area

ISTD.Name

Analysis.Params

Level0.File

Level0.Sheet

Sample Text

Time

Dilution.Factor

Phenacetin

DTXSID1021116

phenacetin

012822

20220128_PFAS_Clint_Hep12HLB_Sample003

Blank

95.670

18,189.30

propranolol-d7

3.83

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

HLB SPE Blank

480

Phenacetin

DTXSID1021116

phenacetin

012822

20220128_PFAS_Clint_Hep12HLB_Sample004

Blank

254.218

17,998.58

propranolol-d7

3.86

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

HLB SPE Matrix Blank

480

Phenacetin

DTXSID1021116

phenacetin

012822

20220128_PFAS_Clint_Hep12HLB_Sample017

CC

0.0060456

56.500

18,123.62

propranolol-d7

3.83

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

HLB CC1

240

Level 1 processing

format_clint is the Level 1 function used to create a standardized data frame. This level of processing is necessary because naming conventions or formatting can differ across data sets.

If the Level 0 data already contains the required column, then the existing column name can be specified. For example, clint_L0 already contains a column specifying the sample name called “Sample”. However, the default column name for sample name is “Lab.Sample.Name”. Therefore, we specify the correct column with sample.col = "Sample". In general, to specify an already existing column that differs from the default, the user must use the parameter with the .col suffix.

If the Level 0 data does not already contain the required column, then the entire column can be populated with a single value. For example, clint_L0 does not contain a column specifying biological replicates. Therefore, we populate the required column with biological.replicates = 1. In general, to specify a single value for an entire column, the user must use the parameter without the .col suffix.

Users should be mindful if they choose to specify a single value for all of their samples; they should verify this action is one they wish to take.

Some columns must be present in the Level 0 data while others can be filled with a single value. At minimum, the following columns must be present in the Level 0 data and specification with a single entry is not permitted: sample.col, compound.col, dtxsid.col, lab.compound.col, type.col, istd.col, and area.col.

If there is no additional note.col in the Level 0 data, users should use note.col = NULL to fill the column with “Note”.

The rest of the following columns may either be specified from the Level 0 data or filled with a single value: date.col or date, density.col or density, cal.col or cal, dilution.col or dilution, time.col or time, istd.name.col or istd.name, istd.conc.col or istd.conc, test.conc.col or test.conc, test.nominal.conc.col or test.nominal.conc, biological.replicates.col or biological.replicates, analysis.method.col or analysis.method, analysis.instrument.col or analysis.instrument, analysis.parameters.col or analysis.parameters, level0.file.col or level0.file, and level0.sheet.col or level0.sheet.

Table 2: Level 1 'format_clint' function arguments

Argument

Default

Required in L0?

Corresp. single-entry Argument

Descr.

FILENAME

MYDATA

N/A

Output and input filename

data.in

N/A

Level 0 data frame

sample.col

Lab.Sample.Name

Y

Lab sample name

date.col

Date

N

date

Lab measurement date

compound.col

Compound.Name

Y

Formal test compound name

dtxsid.col

DTXSID

Y

EPA's DSSTox Structure ID

lab.compound.col

Lab.Compound.Name

Y

Lab test compound name (abbr.)

type.col

Sample.Type

Y

Sample type (Blank/Cvst/Inactive/CC)

density.col

Hep.Density

N

density

Hepatocyte density in the in vitro incubation

cal.col

Cal

N

cal

MS calibration

dilution.col

Dilution.Factor

N

dilution

Number of times sample was diluted

time.col

Time

N

time

Time since chemical was introduced

istd.col

ISTD.Area

Y

Internal standard peak area

istd.name.col

ISTD.Name

N

istd.name

Internal standard name

istd.conc.col

ISTD.Conc

N

istd.conc

Internal standard concentration

test.conc.col

Test.Compound.Conc

N

test.conc

Standard test chemical concentration

test.nominal.conc.col

Test.Target.Conc

N

test.nominal.conc

Initial chemical concentration

area.col

Area

Y

Target analyte peak area

biological.replicates.col

Biological.Replicates

N

biological.replicates

Replicates with the same analyte

technical.replicates.col

Technical.Replicates

N

technical.replicates

Repeated measurements from one sample

analysis.method.col

Analysis.Method

N

analysis.method

Analytical chemistry analysis method

analysis.instrument.col

Analysis.Instrument

N

analysis.instrument

Analytical chemistry analysis instrument

analysis.parameters.col

Analysis.Parameters

N

analysis.parameters

Analytical chemistry analysis parameters

note.col

Note

N

Additional notes

level0.file.col

Level0.File

N

level0.file

Raw data filename

level0.sheet.col

Level0.Sheet

N

level0.sheet

Raw data sheet name

output.res

FALSE

N/A

Export results (TSV)?

save.bad.types

FALSE

N/A

Export bad data (TSV)?

sig.figs

5

N/A

Number of significant figures

INPUT.DIR

N/A

Input directory of Level 0 file

OUTPUT.DIR

N/A

Export directory to save Level 1 files

A TSV file containing the level-1 data can be exported to the user’s per-session temporary directory. This temporary directory is a per-session directory whose path can be found with the following code: tempdir(). For more details, see [https://www.collinberke.com/til/posts/2023-10-24-temp-directories/].

To avoid exporting to this temporary directory, an OUTPUT.DIR must be specified. We have omitted this export entirely with output.res = FALSE (the default). The option to omit exporting a TSV file is also available at levels 2 and 3 and will be used from this point forward.

clint_L1_curated <- format_clint(FILENAME = "Clint_vignette", 
                         data.in = clint_L0,
                         # columns present in L0 data 
                         sample.col = "Sample",
                         compound.col = "Compound",
                         lab.compound.col = "Lab.Compound.ID",
                         type.col = "Type",
                         istd.col = "ISTD.Peak.Area",
                         area.col = "Peak.Area",
                         note.col = "Sample Text",
                         test.conc.col = "Compound.Conc",
                         analysis.parameters.col = "Analysis.Params",
                         # columns not present in L0 data
                         density = 0.5,
                         test.nominal.conc = 1,
                         analysis.instrument = "Unknown",
                         analysis.method = "LCMS",
                         istd.conc = 0.01,
                         cal = 1,
                         biological.replicates = 1,
                         # don't export output TSV file
                         output.res = FALSE
                         )
#> Warning in format_clint(FILENAME = "Clint_vignette", data.in = clint_L0, : Data
#> with inappropriate sample types were removed.
#> 229 observations of 3 chemicals based on 3 separate measurements (calibrations).

We receive a warning message that some of our samples have been removed due to “inappropriate sample types”. The removed samples include data that did not have one of the following sample types

  1. Blank with no chemical added (Blank)
  2. Hepatocyte incubation concentration (Cvst)
  3. Inactivated Hepatocytes (Inactive)
  4. Calibration Curve (CC)

The following table displays some of the samples that were removed. These samples are annotated with a “QC” sample type indicating “Quality Control” and should therefore not be included in our analysis.

Table 3: Subset of removed samples

Compound

DTXSID

Lab.Compound.ID

Date

Sample

Type

Compound.Conc

Peak.Area

ISTD.Peak.Area

ISTD.Name

Analysis.Params

Level0.File

Level0.Sheet

Sample Text

Time

Dilution.Factor

Phenacetin

DTXSID1021116

phenacetin

012822

20220128_PFAS_Clint_Hep12HLB_Sample037

QC

932.885

17,947.17

propranolol-d7

3.83

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

HLB QC-B

480

Phenacetin

DTXSID1021116

phenacetin

012822

20220128_PFAS_Clint_Hep12HLB_Sample038

QC

2,049.694

17,701.59

propranolol-d7

3.83

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

HLB QC-C

480

Phenacetin

DTXSID1021116

phenacetin

012822

20220128_PFAS_Clint_Hep12HLB_Sample039

QC

7,152.603

18,042.49

propranolol-d7

3.83

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

HLB QC-D

480

Phenacetin

DTXSID1021116

phenacetin

012822

20220128_PFAS_Clint_Hep12HLB_Sample105

QC

984.653

18,864.06

propranolol-d7

3.83

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

HLB QC-B

480

Phenacetin

DTXSID1021116

phenacetin

012822

20220128_PFAS_Clint_Hep12HLB_Sample106

QC

1,756.444

18,264.91

propranolol-d7

3.83

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

HLB QC-C

480

Users can verify that these samples are excluded by filtering their Level 1 data frame and ensuring no matches are found. One approach is given below.

# All the samples of an inappropriate sample type 
excluded <- clint_L0 %>% 
  filter(!Type %in% c("Blank", "Cvst", "Inactive", "CC"))

# Exclude based on Sample and DTXSID
X <-  c(excluded$Sample)
names(X) <- paste(excluded$Sample, excluded$DTXSID, sep = "+")
Y <- c(excluded$DTXSID)

# Find samples in L1 data frame with matching sample name (X) and DTXSID (Y)
matches <- as.data.frame(t(mapply(function(X,Y) 
  {subset(clint_L1_curated, Lab.Sample.Name == X & DTXSID == Y)},
       X, Y, USE.NAMES = T)))

matches

# Check that no matches were returned for each sample in `excluded`
check <-  rep(1,nrow(matches))
names(check) <- rownames(matches)
for (name in rownames(matches)) {
  # If no matches were found, each element of check should evaluate to 0
  check[name] <- sum(sapply(c(1:ncol(matches)),
                           function(X) {length(matches[1,X][[1]])}))
}

# Verify that each sample in `excluded` had no matches
check 

# If no matches, should evaluate to 0
sum(check)

To export the removed samples types as a TSV, the user can set the parameter save.bad.types = TRUE. For this example, we will not save the discarded sample types.

The following table displays the first three rows of clint_L1_curated, our Level 1 data produced from format_clint. In addition to the columns specified by the user, there is an additional column called Response. This column is the test compound concentration and is calculated as \(\textrm{Response} = \frac{\textrm{Analyte Area}}{\textrm{ISTD Area}}*\textrm{ISTD Conc}\) where \(\textrm{Analyte Area}\) is defined by the Area column, \(\textrm{ISTD Area}\) is defined by the ISTD.Area column, and \(\textrm{ISTD Conc}\) is defined by the ISTD.Conc column.

Table 4: Level 1 data

Lab.Sample.Name

Date

Compound.Name

DTXSID

Lab.Compound.Name

Sample.Type

Dilution.Factor

Calibration

ISTD.Name

ISTD.Conc

ISTD.Area

Area

Analysis.Method

Analysis.Instrument

Analysis.Parameters

Note

Level0.File

Level0.Sheet

Time

Test.Compound.Conc

Test.Nominal.Conc

Hep.Density

Biological.Replicates

Response

20220128_PFAS_Clint_Hep12HLB_Sample003

012822

Phenacetin

DTXSID1021116

phenacetin

Blank

480

1

propranolol-d7

0.01

18,189.30

95.670

LCMS

Unknown

3.83

HLB SPE Blank

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

1

0.5

1

0.00005259686

20220128_PFAS_Clint_Hep12HLB_Sample004

012822

Phenacetin

DTXSID1021116

phenacetin

Blank

480

1

propranolol-d7

0.01

17,998.58

254.218

LCMS

Unknown

3.86

HLB SPE Matrix Blank

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

1

0.5

1

0.00014124340

20220128_PFAS_Clint_Hep12HLB_Sample017

012822

Phenacetin

DTXSID1021116

phenacetin

CC

240

1

propranolol-d7

0.01

18,123.62

56.500

LCMS

Unknown

3.83

HLB CC1

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

0.0060456

1

0.5

1

0.00003117478

Level 2 processing

sample_verification is the Level 2 function used to add a verification column. The verification column indicates whether a sample should be included in the point estimation (Level 3) and credible interval (Level 4) processing. This column allows users to keep all samples in their data but only utilize the reliable samples for Clint estimation. All of the data in Level 2 is identical to the data in Level 1 with the exception of the additional Verified column.

To determine whether a sample should be included, the user should consult the wet-lab scientists from where their data originates or a chemist who may be able to provide reliable rationale for samples that should not be verified. This level of processing allows the user to receive feedback from the wet-lab scientists, exclude erroneous or unreliable samples, and produce new Clint estimates. Thus, there is always an open channel of communication between the user and the wet-lab scientists or chemists.

We will use the already processed Level 2 data frame, clint_L2, to regenerate our exclusion data. In general, the user would not have access to the exclusion information a priori.

The exclusion data frame must include the following columns: Variables, Values, and Message. The Variables column contains the variable names used to filter the excluded rows. Here, we are using Lab.Sample.Name and DTXSID to identify the excluded rows separated by a “|”. The Values column contains the values of the variables, as a character, also separated by a “|”. The Message column contains the reason for exclusion. Here, we are using the reasons listed in the Verified column in clint_L2. The user should refrain from using “|” in any of their descriptions to avoid conflicts with the sample_verification function.

# Use verification data from loaded in `clint_L2` data frame
exclusion <- clint_L2 %>% 
  filter(Verified %in% c("Wrong Mix", "Unknown Conc.")) %>% 
  mutate("Variables" = "Lab.Sample.Name|DTXSID") %>% 
  mutate("Values" = paste(Lab.Sample.Name, DTXSID, sep = "|")) %>% 
  mutate("Message" = Verified) %>% 
  select(Variables, Values, Message)
Table 5: Exclusion data frame

Variables

Values

Message

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12HLB_Sample085|DTXSID6023525

Unknown Conc.

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12WAX_Sample017|DTXSID80380256

Unknown Conc.

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12WAX_Sample018|DTXSID80380256

Unknown Conc.

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12WAX_Sample019|DTXSID80380256

Unknown Conc.

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12WAX_Sample075|DTXSID80380256

Wrong Mix

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12WAX_Sample109|DTXSID80380256

Wrong Mix

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12WAX_Sample110|DTXSID80380256

Wrong Mix

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12WAX_Sample111|DTXSID80380256

Wrong Mix

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12WAX_Sample114|DTXSID80380256

Wrong Mix

Lab.Sample.Name|DTXSID

20220128_PFAS_Clint_Hep12WAX_Sample115|DTXSID80380256

Wrong Mix

clint_L2_curated <- sample_verification(FILENAME = "Clint_vignette",
                                     data.in = clint_L1_curated,
                                     assay = "Clint",
                                     exclusion.info = exclusion,
                                     # don't export output TSV file 
                                     output.res = FALSE)

Our Level 2 data now contains a Verified column. If the sample should be included, the column contains a “Y” for yes. If the sample should be excluded, the column contains the reason for exclusion.

The following table displays some rows of the Level 2 data, each with a different value in the Verification column. Note, the reasons for exclusion are not limited to “Unknown Conc.” or “Wrong Mix”, they were merely provided as an example.

Table 6: Level 2 data

Lab.Sample.Name

Date

Compound.Name

DTXSID

Lab.Compound.Name

Sample.Type

Dilution.Factor

Calibration

ISTD.Name

ISTD.Conc

ISTD.Area

Area

Analysis.Method

Analysis.Instrument

Analysis.Parameters

Note

Level0.File

Level0.Sheet

Time

Test.Compound.Conc

Test.Nominal.Conc

Hep.Density

Biological.Replicates

Response

Verified

20220128_PFAS_Clint_Hep12HLB_Sample003

012822

Phenacetin

DTXSID1021116

phenacetin

Blank

480

1

propranolol-d7

0.01

18,189.30

95.670

LCMS

Unknown

3.83

HLB SPE Blank

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

1

0.5

1

0.000052596859

Y

20220128_PFAS_Clint_Hep12WAX_Sample075

012822

4,4-bis(Trifluoromethyl)-4-fluoropropanoic acid

DTXSID80380256

TFMFPA

Cvst

720

1

M5PFPeA

0.01

29,154.77

5.883

LCMS

Unknown

3.67

WAX2 T15-B

Hep12 Data for Uncertainty Feb2022.xlsx

PFAS Data

0.25

1

0.5

1

0.000002017852

Wrong Mix

20220128_PFAS_Clint_Hep12HLB_Sample085

012822

Propranolol

DTXSID6023525

propranolol

CC

240

1

propranolol-d7

0.01

19,777.32

2,132.403

LCMS

Unknown

4.12

HLB CC1

Hep12 Data for Uncertainty Feb2022.xlsx

Ref Chem Data

1

0.5

1

0.001078206466

Unknown Conc.

Level 3 processing

calc_clint_point is the Level 3 function used to calculate the Clint point estimate for each test compound using a Frequentist framework.

Mathematically, Clint is the slope of the line representing log concentrations over time. With some assumptions on the form of the line and on the distribution of the error terms, we can estimate Clint using maximum likelihood estimation.

First, we assume the test compound concentration at time \(t\) is an exponential decay model \[C(t) = \textrm{cal}*C_0*e^{-mt}\] where \(\textrm{cal}\) is the MS calibration, \(C_0\) is the test compound concentration at time 0, \(m\) is the rate constant, and \(t\) is the incubation time. Notice that if a natural log is applied, \[\textrm{ln}(C(t)) = -mt + \textrm{ln}(\textrm{cal}*C_0)\] \(m\) is the slope of the line representing log concentrations over time. Thus, the rate constant \(m\) is our Clint value.

Next, we also assume the random noise is normally distributed \(\varepsilon_i \sim \mathcal{N}(0,\sigma^2)\) where \(\sigma^2\) is unknown.

Lastly, the test compound concentration, our dependent variable, is qualitatively defined by our Response column, the ratio of the analyte area to the ISTD area.

With this information, we can rewrite our dependent variable as \[y_i \sim \mathcal{N}(C(t),\sigma^2)\] After some simplification, our log-likelihood is defined as \[l(\theta) = N*log\left(\frac{1}{\sqrt{2\pi}\sigma}\right)-\frac{1}{2}\sum_{i=1}^N\frac{(y_i-C(\theta, t))^2}{\sigma^2}\] where \(\theta = (\textrm{cal}, m)\) and \(N\) is the number of Cvst and Blank samples.

Using the mle estimator in the stats4 package, we then estimate the parameters that maximize the log-likelihood for each test compound.

clint_L3_curated <- calc_clint_point(FILENAME = "Clint_vignette",
                             data.in = clint_L2_curated,
                             # don't export output TSV file
                             output.res = FALSE)
#> [1] "Phenacetin Cl_int = 17.9 uL/min/million hepatocytes, p-Value = 9.52e-06 ."
#> [1] "Propranolol Cl_int = 9.15 uL/min/million hepatocytes, p-Value = 3.49e-06 ."
#> [1] "4,4-bis(Trifluoromethyl)-4-fluoropropanoic acid Cl_int = 0 uL/min/million hepatocytes, p-Value = 1 ."
#> [1] "Intrinsic clearance (Clint) calculated for 3 chemicals."
#> [1] "Intrinsic clearance (Clint) calculated for 3 measurements."

Our Level 3 data contains a Clint estimate, Clint.pValue, AIC of the exponential decay fit, and AIC.Null of the exponential decay assuming a constant rate of decay.

Some experiments also include the assay performed at 1 \(\mu M\) and 10 \(\mu M\) to check for the saturation of metabolizing enzymes. The test compound concentration at time \(t\) with a saturation probability is assumed to follow an exponential decay model with a slight modification \[C_{sat}(t) = \textrm{cal}*C_0*e^{-m*\textrm{sat}*t}\] where \(\textrm{cal}\) is the MS calibration, \(C_0\) is the test compound concentration at time 0, \(m\) is the rate constant, \(t\) is the incubation time, and \(\textrm{sat}\) is the probability of saturation, or the probability of observing a lower clearance rate at a higher concentration.

If these samples are included, Clint.1 is the clearance rate at 1 \(\mu M\), Clint.10 is the clearance rate at 10 \(\mu M\), AIC.Sat is the AIC of the exponential decay with a saturation probability, and Sat.pValue is the corresponding p-value.

Because our example data did not include these samples, Clint.1, Clint.10, AIC.Sat, and Sat.pValue are empty.

Table 7: Level 3 data

Compound.Name

DTXSID

Lab.Compound.Name

Calibration

Clint

Clint.pValue

Fit

AIC

AIC.Null

Clint.1

Clint.10

AIC.Sat

Sat.pValue

Phenacetin

DTXSID1021116

phenacetin

All Data

17.856041

0.000009520723

0, 1 uM

22.784013

34.34605

Propranolol

DTXSID6023525

propranolol

All Data

9.149289

0.000003490784

0, 1 uM

9.318565

21.88395

4,4-bis(Trifluoromethyl)-4-fluoropropanoic acid

DTXSID80380256

TFMFPA

All Data

0.000000

1.000000000000

0, 1 uM

43.108541

41.10854

Level 4 processing

calc_clint is the Level 4 function used to calculate Clint point estimates and credible intervals using a Bayesian framework. Markov chain Monte Carlo (MCMC) simulations are used to randomly sample from the posterior distribution with a uniform prior.

To run Level 4, one needs to have JAGS installed on their machine. To determine the correct path, the user may use the runjags::findjags() or may need to specify the JAGS installation location for the JAGS.PATH argument.

JAGS is a software used for conducting Bayesian hierarchical modeling using Markov Chain Monte Carlo (MCMC) simulation. invitroTKstats contains the JAGS models for each of the applicable assays and utilizes the JAGS model under the hood to run the MCMC simulations via the runjags dependency to obtain the level-4 Bayesian estimates.

  1. Download and install the “Just Another Gibbs Sampler” (JAGS) software, which is freely available here.
  2. Once JAGS is installed one will need to identify the installation location, which is likely to be in a system location (e.g. “Program Files”). Users may need this location to specify the JAGS.PATH argument within the level-4 function.
  3. Finally, if not already installed, the rjags, runjags, and coda R packages need to be installed via the R console using the install.packages function.

We pass clint_L2_curated, the Level 2 data frame, and not clint_L3_curated, the Level 3 data frame, into calc_clint. This is because Level 3 and Level 4 processing are not sequential; they are methods that calculate different statistical quantities. The following code chunk takes a while to run; previous runtimes are around 7 minutes.

clint_L4_curated <- calc_clint(FILENAME = "Clint_vignette",
                       data.in = clint_L2_curated, 
                       JAGS.PATH = runjags::findjags()
                       )

The Clint intervals are returned to the user’s R session, in an exported TSV file, and in an exported RData file. There is no parameter to prevent the TSV or RData files from being exported because of the potential for the simulations to crash. If there are no crashes, then the exported TSV file is identical to the user’s R session and the exported RData file. clint_L4 is an example exported RData file.

Additionally, intermediate files are saved to the user’s current working directory if TEMP.DIR = NULL. These include a Level 2 heldout set, clint_L2_heldout, containing unverified samples and a Level 4 PREJAGS list, clint_PREJAGS, containing arguments provided to JAGS. Because the PREJAGS list is overwritten with each compound, clint_PREJAGS only contains information relevant to the last tested compound, TFMFPA in this case.

Our Level 4 data contains a Clint credible interval and the corresponding p-value for each chemical.

Table 8: Level 4 data

Compound.Name

DTXSID

Lab.Compound.Name

Clint.1.Med

Clint.1.Low

Clint.1.High

Clint.10.Med

Clint.10.Low

Clint.10.High

Clint.pValue

Sat.pValue

degrades.pValue

Phenacetin

DTXSID1021116

phenacetin

11.491000

7.390264

14.41591

0.00188

0.74930

0.99976

Propranolol

DTXSID6023525

propranolol

9.723517

6.714928

12.30937

0.00000

0.74876

0.99998

4,4-bis(Trifluoromethyl)-4-fluoropropanoic acid

DTXSID80380256

TFMFPA

0.000000

0.000000

0.00000

0.99656

0.74626

0.99988

Best Practices: Food for thought

Generally, data processing pipelines should include minimal to no manual coding. It is best to keep clean code that is easily reproducible and transferable. The user should aim to have all the required data and meta-data files properly formatted to avoid further modifications throughout the pipeline.

Appendix

L4 Bayesian Modeling Information

In this section, we provide the equations for the Bayesian model (i.e., priors and likelihoods) used to estimate the intrinsic hepatic clearance (\(Cl_{int}\)) and the uncertainty about that estimate. The following sub-sections are organized such that:

Some of the indices are reused between sections (e.g. \(i\), \(w^*_i\), etc.). However, it should be noted that these are not meant to be understood across sub-sections; rather, only understood within the context of the section they are in.

NOTE: Readers unfamiliar with JAGS should be aware that JAGS software uses precision (\(\tau\)) rather than variance (\(\sigma^2\)) (i.e., \(\tau = \frac{1}{\sigma^2}\)).

Measurement Model

Each chemical may have more than one day of experimentation, and thus multiple calibrations. Suppose for the chemical of interest there are a total of \(n_{cal}\) calibrations (Num.cal). For a particular calibration \(w \in (1,\ldots,n_{cal})\) we assume the following priors for our Bayesian model:

Prior for log-scale constant analytic standard deviation (log.const.analytic.sd):

\[ log(\sigma_a)_w \sim \textrm{Unif} \left( a = -6,b = 1 \right) ; \space \sigma_{a,w} = 10^{log(\sigma_a)_w} \]

where \(a\) and \(b\) are the minimum and maximum, respectively, and \(\sigma_{a,w}\) is the converted parameter used in later equations (const.analytic.sd).

Prior for the log-scale heteroscedastic analytic slope (log.hetero.analytic.slope):

\[ log(m_h)_w \sim \textrm{Unif} \left( a = -6,b = 1 \right) ; \space m_{h,w} = 10^{log(m_h)_w} \]

where \(a\) and \(b\) are the minimum and maximum, respectively, and \(m_{h,w}\) is the converted parameter used in later equations (hetero.analytic.slope).

Prior for the threshold concentration (C.thresh):

\[ C_{thresh,w} \sim \textrm{Unif} \left( a = 0,b = \frac{conc_{target,w}}{10} \right)\]

where \(a\) and \(b\) are the minimum and maximum, respectively, and \(conc_{target,w}\) is the expected initial concentration (Test.Nominal.Conc) for calibration index \(w\).

Prior for the log-scale calibration (log.calibration):

\[ log(cal)_w \sim \textrm{N} \left( \mu = 0,\tau = 0.01 \right) ; \space cal_w = 10^{log(cal)_w}\]

where \(\mu\) and \(\tau\) are the mean and precision, respectively, and \(cal_w\) is the converted parameter used in later equations (calibration).

Prior for the background (background):

\[ \gamma_w \sim \textrm{Exp} \left( \lambda = 100 \right)\]

where \(\lambda\) is the rate parameter.

Estimation of Blanks

Let us assume the total number of prediction and precision for the blank observations is based on the total number of calibrations in the set (i.e. \(n_{cal}\), Num.cal). Thus, for each calibration \(j \in w\) we obtain the following:

Estimation for the predicted values for blank observations (Blank.pred):

\[ x_{blank,j} = \frac{\gamma_{j}}{df_{blank,j}} \]

where \(df_{blank,j}\) is the dilution factor for the \(j^{th}\) calibration for blank observations (Blank.Dilution.Factor).

Estimation for the precision of blank observations (Blank.prec):

\[ \tau_{blank,j} = \frac{1}{(\sigma_{a,j} + m_{h,j} *x_{blank,j})^2} \]

Likelihood for blank observations (Blank.obs):

Suppose \(n_{blank}\) is the total number of blank observations (Num.blank.obs), and \(y_{blank,i}\) indicates the \(i^{th}\) blank observation, sample type = “Blank” (Blank.obs). For each observation we obtain a posterior MCMC estimate for the blank observations with the following:

\[ y_{blank,i} \sim \textrm{N} \left( \mu = x_{blank,w^*_i}, \tau = \tau_{blank,w^*_i} \right) \]

where \(w^*_i\) is the calibration index (Blank.cal) for the \(i^{th}\) observation, such that \(w^* = (w^*_1,\ldots,w^*_{n_{blank}})\) and \(w^*_i \in w\) (i.e. \(w^*_i\) indicates one of the \(n_{cal}\) calibrations).

Estimation of Calibration Curve

Suppose \(n_{cc}\) is the total number of calibration curve observations (Num.cc), and \(y_{cc,i}\) indicates the \(i^{th}\) calibration curve observation, sample type = CC (cc.obs). For each observation we obtain a posterior MCMC estimate for the calibration curve observations with the following:

Calibration curve slope for the \(i^{th}\) observation (cc.slope) is assumed to be the calibration estimate corresponding to the calibration index (\(w^*_i\)):

\[ m_{cc,i} = cal_{w^*_i}\]

where \(w^*_i\) is the calibration index (cc.obs.cal) for the \(i^{th}\) observation, such that \(w^* = (w^*_1,\ldots,w^*_{n_{cc}})\) and \(w^*_i \in w\) (i.e. \(w^*_i\) indicates one of the \(n_{cal}\) calibrations).

Calibration curve intercept for the \(i^{th}\) observation (cc.intercept) is assumed to be the background estimate corresponding to the calibration index (\(w^*_i\)):

\[ \alpha_{cc,i} = \gamma_{w^*_i}\]

Estimation of the predictions for calibration curve observations (cc.pred):

\[ x_{cc,i} = m_{cc,i} * (\frac{C_{cc,i}}{df_{cc,i}} - \frac{C_{thresh,w^*_i}}{df_{cc,i}}) * \beta_{cc,i} + \frac{\alpha_{cc,i}}{df_{cc,i}} \]

where

  • \(C_{cc,i}\) is the chemical concentration (cc.obs.conc) for the \(i^{th}\) calibration curve observation
  • \(df_{cc,i}\) is the dilution factor (cc.obs.Dilution.Factor) for the \(i^{th}\) calibration curve observation
  • \(\beta_{cc,i}\) is the JAGS step function for the \(i^{th}\) calibration curve observation, such that

\[ \beta_{cc,i} = \begin{cases} 1 & \textrm{if} \space (\frac{C_{cc,i}}{df_{cc,i}} - \frac{C_{thresh,w^*_i}}{df_{cc,i}}) \ge 0 \\ 0 & \textrm{otherwise} \end{cases} \]

Estimation of the precision for calibration curve observations (cc.prec):

\[ \tau_{cc,i} = \frac{1}{(\sigma_{a,w^*_i} + m_{h,w^*_i} * x_{cc,i})^2} \]

Likelihood for the calibration curve observations (cc.obs):

\[ y_{cc,i} \sim \textrm{N} \left( \mu = x_{cc,i},\tau = \tau_{cc,i} \right)\]

Clearance Model

In some cases, the intrinsic hepatic clearance assay will have two initial concentrations (i.e. lower and higher concentration), typically these are \(1\) and \(10 \mu M\) but do not necessarily need to be.

Prior for the binary indicator for whether the concentration decreases (decreases):

\[ d_c \sim \textrm{Bern} \left( p = p_{d_c} \right) \]

where \(p\) is the probability parameter and \(p_{d_c}\) is the user-specified prior probability that a chemical will decrease in the assay (DECREASE.PROB).

Prior for the binary indicator for whether the concentration degrades (degrades):

\[ d_g \sim \textrm{Bern} \left( p = p_{d_g} \right) \] where \(p\) is the probability parameter and \(p_{d_g}\) is the user-specified prior probability of instability for a chemical within the assay (i.e. abiotic degradation) (DEGRADE.PROB).

Prior for the slope representing the biological clearance rate (bio.rate) for the lower concentration, indicated by \(C_{thresh,1}\) and \(conc_{target,1}\):

\[ \delta_{bio} \sim \textrm{Unif} \left( a = 0,b = -log(\frac{C_{thresh,1}}{conc_{target,1}}) \right)\]

where \(a\) and \(b\) are the minimum and maximum, respectively.

Prior for the slope representing the abiotic clearance rate, when data from the inactivated hepatocytes is available (abio.rate) at the lower concentration, indicated by \(C_{thresh,1}\) and \(conc_{target,1}\):

\[ \delta_{abio} \sim \textrm{Unif} \left( a = 0,b = -log(\frac{C_{thresh,1}}{conc_{target,1}}) \right) \]

where \(a\) and \(b\) are the minimum and maximum, respectively.

Prior for the binary indicator for whether there is a reduction in the biological clearance and the chemical shows some metabolic saturation between the lower and higher concentration (saturates):

\[ s_f \sim \textrm{Bern} \left( p = p_{s_f} \right)\]

where \(p\) is the probability parameter and \(p_{s_f}\) is the user-specified prior probability that a chemical’s rate of metabolism will decrease between the lower and higher concentration (SATURATE.PROB).

\(p_{s_f} = 0.25\) assumes \(25\%\) of the chemical shows some metabolic saturation between the lower and higher concentrations.

Prior for the saturation at the higher concentration (i.e. magnitude of decreased biological clearance) at the higher concentration (saturation):

\[ s_a \sim \textrm{Unif} \left( a = 0,b = 1 \right) \]

Estimation of the actual biological clearance rate (bio.slope) at the lower (\(m_{bio,1}\)) and higher (\(m_{bio,2}\)) initial concentrations, respectively:

\[ m_{bio,1} = d_c * \delta_{bio} \] \[ m_{bio,2} = d_c * \delta_{bio}* (1-s_f*s_a) \]

such that \(m_{bio} = (m_{bio,1},m_{bio,2})\).

The total elimination rate (slope) at the lower (\(m_1\)) and higher (\(m_2\)) initial concentrations, respectively:

\[ m_1 = d_c * \delta_{bio} + d_g * \delta_{abio} \]

\[ m_2 = d_g * \delta_{abio} + d_c * \delta_{bio} *(1-s_f*s_a) \]

such that \(m = (m_1,m_2)\).

Estimate of the abiotic clearance rate (abio.slope):

\[ m_{abio} = d_g * \delta_{abio} \]

Estimation of Metabolic Clearance

Suppose \(n\) is the total number of observations (Num.obs), and \(y_{i}\) indicates the \(i^{th}\) observation, sample type = Cvst (obs). For each observation we obtain a posterior MCMC estimate for the calibration curve observations with the following:

Estimate of the metabolic exponential decay for the \(i^{th}\) observation (C):

\[ C_i = conc_{target,c^*_i}*exp(-m_{c^*_i}*t_i) \]

where

  • \(c^*_i\) represents the initial concentration index (obs.conc) for the \(i^{th}\) observation (i.e, \(c^* = (c^*_1,\ldots,c^*_n)\)) and \(c^*_i \in (1,2)\) such that \(1\) indicates the lower initial concentration and \(2\) indicates the higher initial concentration
  • \(conc_{target,c^*_i}\) is the expected initial concentration (Test.Nominal.Conc) corresponding to the \(i^{th}\) observation, given the concentration index \(c^*_i\)
  • \(m_{c^*_i}\) is the total elimination rate slope for the initial concentration corresponding to the \(i^{th}\) observation and \(m_{c^*_i} \in m\)
  • \(t_i\) is the time (obs.time) for the \(i^{th}\) observation

Estimate of the metabolic clearance predictions (obs.pred):

\[ x_i = cal_{w^*_i} * (\frac{C_i}{df_i} - \frac{C_{thresh,w^*_i}}{df_i}) * \beta_i + \frac{\gamma_{w^*_i}}{df_i} \]

where

  • \(df_i\) is the dilution factor (obs.Dilution.Factor) for the \(i^{th}\) observation
  • \(\beta_i\) is the JAGS step function for the \(i^{th}\) observation, such that

\[ \beta_i = \begin{cases} 1 & \textrm{if} \space (\frac{C_i}{df_i} - \frac{C_{thresh,w^*_i}}{df_i}) \ge 0 \\ 0 & \textrm{otherwise} \end{cases} \]

  • \(w^*_i\) represents the calibration index (obs.cal) for the \(i^{th}\) observation \(w^* = (w^*_1,\ldots,w^*_{n})\) and \(w^*_i \in w\) (i.e. \(w^*_i\) indicates one of the \(n_{cal}\) calibrations)

Estimates of the metabolic clearance precision (obs.prec):

\[ \tau_i = \frac{1}{(\sigma_{a,w^*_i} + m_{h,w^*_i}*x_i)^2} \]

Likelihood for metabolic clearance (obs):

\[ y_i \sim \textrm{N} \left( \mu = x_i,\tau = \tau_i \right)\]

Estimation of Abiotic Clearance

Suppose \(n_{abio}\) is the total number of abiotic observations (Num.abio.obs), and \(y_{abio,i}\) indicates the \(i^{th}\) observation, sample type = Inactive (abio.obs). For each observation we obtain a posterior MCMC estimate for the calibration curve observations with the following:

Estimate of the abiotic exponential decay for the \(i^{th}\) observation (abio.C):

\[ C_{abio,i} = conc_{target,w^*_i} * exp(-m_{abio}*t_{abio,i}) \]

  • \(w^*_i\) represents the calibration index (abio.obs.cal) for the \(i^{th}\) non-biological observation \(w^* = (w^*_1,\ldots,w^*_{n_abio})\) and \(w^*_i \in w\) (i.e. \(w^*_i\) indicates one of the \(n_{cal}\) calibrations)
  • \(conc_{target,w^*_i}\) is the expected initial concentration (Test.Nominal.Conc) corresponding to the \(i^{th}\) observation, given the calibration index \(w^*_i\)
  • \(t_{abio,i}\) is the time for the \(i^{th}\) abiotic observation (abio.obs.time)

Estimates for the abiotic clearance predictions (abio.obs.pred):

\[ x_{abio,i} = cal_{w^*_i} * (\frac{C_{abio,i}}{df_{abio,i}} - \frac{C_{thresh,w^*_i}}{df_{abio,i}}) * \beta_{abio,i} + \frac{\gamma_{w^*_i}}{df_{abio,i}} \]

where

  • \(df_{abio,i}\) is the dilution factor (abio.obs.Dilution.Factor) for the \(i^{th}\) abiotic observation
  • \(\beta_{abio,i}\) is the JAGS step function for the \(i^{th}\) abiotic observation, such that

\[ \beta_{abio,i} = \begin{cases} 1 & \textrm{if} \space (\frac{C_{abio,i}}{df_{abio,i}} - \frac{C_{thresh,w^*_i}}{df_{abio,i}}) \ge 0 \\ 0 & \textrm{otherwise} \end{cases} \]

Estimates for the abiotic clearance precision (abio.obs.prec):

\[ \tau_{abio,i} = \frac{1}{(\sigma_{a,w^*_i} + m_{h,w^*_i} * x_{abio,i})^2} \]

Likelihood for the abiotic clearance observations (abio.obs):

\[ y_{abio,i} \sim \textrm{N} \left( \mu = x_{abio,i},\tau = \tau_{abio,i} \right) \]

Post-MCMC Estimates

Conversion of disappearance rates (1/hr) to hepatic clearance (\(\mu L/min/10^6\)) for \(1 \mu M\):

\[ \mathbf{Clint.1} = 1000*\frac{m_{bio,1 \mu M}}{\rho_{hep}*60} \]

where \(\rho_{hep}\) is the hepatic density and \(m_{bio,1 \mu M}\) is the slope for “bio” at \(1 \mu M\) (i.e. \(m_{bio,1} = m_{bio,1 \mu M}\) where \(m_{bio,1}\) is the notation in earlier equations in the Clearance Model section).

Conversion of disappearance rates (1/hr) to hepatic clearance (\(\mu L/min/10^6\)) for \(10 \mu M\):

\[ \mathbf{Clint.10} = 1000*\frac{m_{bio,10 \mu M}}{\rho_{hep}*60} \]

where \(\rho_{hep}\) is the hepatic density and \(m_{bio,10 \mu M}\) is the slope for “bio” at \(10 \mu M\) (i.e. \(m_{bio,2} = m_{bio,10 \mu M}\) where \(m_{bio,2}\) is the notation in earlier equations in the Clearance Model section).

Posterior probability for intrinsic hepatic clearance:

\[ \mathbf{Clint.pValue} = \frac{\sum_{k=1}^{n_{M}} d_{c,k} }{n_{M}}\]

where \(n_{M}\) is the number of MCMC iterations and \(d_{c,k}\) is the binary result for MCMC iteration \(k\) indicating whether the chemical decreases (i.e. is biologically cleared).

Posterior probability for the chemical saturation:

\[ \mathbf{Sat.pValue} = \frac{\sum_{k=1}^{n_{M}}s_{f,k}}{n_{M}} \]

where \(n_{M}\) is the number of MCMC iterations and \(s_{f,k}\) is the binary result for MCMC iteration \(k\) indicating whether the saturation is hit in each of the MCMC iterations.

Posterior probability of the chemical degradation:

\[ \mathbf{degrades.pValue} = \frac{\sum_{k=1}^{n_{M}}d_{g,k}}{n_{M}}\]

where \(n_{M}\) is the number of MCMC iterations and \(d_{g,k}\) is the binary result for MCMC iteration \(k\) indicating whether the chemical degradation is hit (i.e. is non-biologically cleared).

JAGS to L4 Equation Notation Tables

Data passed to JAGS as part of the PREJAGS object:

Data Notation Description
Test.Nominal.Conc \(conc_{target}\) expected initial concentration
Num.cal \(n_{cal}\) total number of calibrations
Num.obs \(n\) total number of Cvst samples
obs \(y\) Cvst sample responses
obs.conc \(C\) Test.Nominal.Conc indices for Cvst samples
obs.time \(t\) time the Cvst samples were tested for a given chemical concentration (in hours)
obs.cal \(w^*\) calibration index for Cvst samples (reused index)
obs.Dilution.Factor \(df\) dilution factors for Cvst samples
Num.blank.obs \(n_{blank}\) total number of Blank samples
Blank.obs \(y_{blank}\) Blank sample responses
Blank.cal \(w^*\) calibration index for Blank samples (reused index)
Blank.Dilution.Factor \(df_{blank}\) dilution factors for Blank samples
Num.cc \(n_{cc}\) total number of CC samples
cc.obs.conc \(C_{cc}\) Test.Compound.Conc for CC samples of the tested compounds
cc.obs \(y_{cc}\) CC sample responses
cc.obs.cal \(w^*\) calibration index for CC samples (reused index)
cc.obs.Dilution.Factor \(df_{cc}\) dilution factors for CC samples
Num.abio.obs \(n_{abio}\) total number of Inactive samples
abio.obs \(y_{abio}\) Inactive sample responses
abio.obs.conc \(C_{abio}\) Test.Nominal.Conc indices for Inactive samples
abio.obs.time \(t_{abio}\) time the Inactive samples were tested for a given chemical concentration (in hours)
abio.obs.cal \(w^*\) calibration index for Inactive samples (reused index)
abio.obs.Dilution.Factor \(df_{abio}\) dilution factors for Inactive samples

User-specified prior parameter values via level-4 function arguments and passed to JAGS as part of the PREJAGS object:

Prior Notation level-4 Argument (Default) Description
DECREASE.PROB \(p_{d_c}\) decrease.prob (0.5) probability that a chemical will decrease in the assay
DEGRADE.PROB \(p_{d_g}\) degrade.prob (0.05) probability of instability for a chemical within the assay (i.e. abiotic degradation)
SATURATE.PROB \(p_{s_f}\) saturate.prob (0.25) probability that a chemical’s rate of metabolism will decrease between the lower and higher concentration

MCMC Parameters in JAGS:

JAGS Parameter Name Parameter Distribution Prior/Posterior/Calculated
log.const.analytic.sd \(log(\sigma_a)\) Uniform Prior
log.hetero.analytic.slope \(log(m_h)\) Uniform Prior
C.thresh \(C_{thresh}\) Uniform Prior
log.calibration \(log(cal)\) Normal Prior
background \(\gamma\) Exponential Prior
const.analytic.sd \(\sigma_a\) Calculated
hetero.analytic.slope \(m_h\) Calculated
calibration \(cal\) Calculated
Blank.pred \(x_{blank}\) Calculated
Blank.prec \(\tau_{blank}\) Calculated
Blank.obs \(y_{blank}\) Normal Posterior
cc.slope \(m_{cc}\) Calculated
cc.intercept \(\alpha_{cc}\) Calculated
cc.pred \(x_{cc}\) Calculated
cc.prec \(\tau_{cc}\) Calculated
cc.obs \(y_{cc}\) Normal Posterior
decreases \(d_c\) Bernoulli Prior
degrades \(d_g\) Bernoulli Prior
bio.rate \(\delta_{bio}\) Uniform Prior
abio.rate \(\delta_{abio}\) Uniform Prior
slope \(m\) Calculated
bio.slope \(m_{bio}\) Calculated
saturates \(s_f\) Bernoulli Prior
saturation \(s_a\) Uniform Prior
C \(C\) Calculated
obs.pred \(x\) Calculated
obs.prec \(\tau\) Calculated (Not the general distribution parameter.)
obs \(y\) Normal Posterior
abio.slope \(m_{abio}\) Calculated
abio.C \(C_{abio}\) Calculated
abio.obs.pred \(x_{abio}\) Calculated
abio.obs.prec \(\tau_{abio}\) Calculated
abio.obs \(y_{abio}\) Normal Posterior

References

Breen, Miyuki, Caroline L Ring, Anna Kreutz, Michael-Rock Goldsmith, and John F Wambaugh. 2021. “High-Throughput PBTK Models for in Vitro to in Vivo Extrapolation.” Expert Opinion on Drug Metabolism & Toxicology 17 (8): 903–21.
Shibata, Yoshihiro, Hiroyuki Takahashi, Masato Chiba, and Yasuyuki Ishii. 2002. “Prediction of Hepatic Clearance and Availability by Cryopreserved Human Hepatocytes: An Application of Serum Incubation Method.” Drug Metabolism and Disposition 30 (8): 892–96.