Apparent Membrane Permeability (Caco-2)

Load Data

First, we load in the example dataset from invitroTKstats.

# Load example caco2 data 
data("Caco2-example")

Five datasets are loaded in: caco2_cheminfo, caco2_L0, caco2_L1, caco2_L2, and caco2_L3. The first, caco2_cheminfo, contains chemical information necessary for identification mapping; it is used to create Level 0 data. The latter datasets contain Caco-2 data at Level 0, 1, 2, and 3 respectively. For the purpose of this vignette, we’ll start with caco2_L0, the Level 0 data, to demonstrate the complete pipelining process.

caco2_L0 is the output of the merge_level0 function, which compiles raw lab data from specified Excel files into a single data frame. The data frame contains exactly one row per sample with information obtained from the mass spectrometer. For more details on curating raw lab data into a single Level 0 data frame, see the “Data Guide Creation and Level-0 Data Compilation” vignette.

The following table displays the first three rows of caco2_L0, our Level 0 data.

Table 1: Level 0 data
Compound	DTXSID	Lab.Compound.ID	Date	Sample	Type	Compound.Conc	ISTD.Name	Analysis.Params	Level0.File	Level0.Sheet	Direction	Vol.Receiver	Vol.Donor	Dilution.Factor
Thiobencarb	DTXSID6024337	BF00175258	100317	Blank 1_TB_A_B_1_____QP1F8_Inj 5	Blank	10	ISTD Name	GC	Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm	Raw Data	AtoB	0.250	0.075	1
Thiobencarb	DTXSID6024337	BF00175258	100317	Blank 1_TB_A_B_2_____QP1F8_Inj 6	Blank	10	ISTD Name	GC	Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm	Raw Data	AtoB	0.250	0.075	1
Thiobencarb	DTXSID6024337	BF00175258	100317	Blank 1_TB_B_A_3_____QP1F8_Inj 3	Blank	10	ISTD Name	GC	Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm	Raw Data	BtoA	0.075	0.250	1

Level 1 processing

format_caco2 is the Level 1 function used to create a standardized data frame. This level of processing is necessary because naming conventions or formatting can differ across data sets.

If the Level 0 data already contains the required column, then the existing column name can be specified. For example, caco2_L0 already contains a column specifying the sample name called “Sample”. However, the default column name for sample name is “Lab.Sample.Name”. Therefore, we specify the correct column with sample.col = "Sample". In general, to specify an already existing column that differs from the default, the user must use the parameter with the .col suffix.

If the Level 0 data does not already contain the required column, then the entire column can be populated with a single value. For example, caco2_L0 does not contain a column specifying biological replicates. Therefore, we populate the required column with biological.replicates = 1. In general, to specify a single value for an entire column, the user must use the parameter without the .col suffix.

Users who specify a single value for a column should first verify that the value truly applies to every sample in their data.
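The two ways of supplying a column can be pictured with a small base R sketch. The helper resolve_column and the toy data frame below are hypothetical and are not part of invitroTKstats; the sketch only mimics the idea that a .col argument points to an existing column, while its single-entry counterpart fills a new column with one value.

```r
# Hypothetical helper for illustration only, NOT the invitroTKstats implementation
resolve_column <- function(df, target, col = NULL, value = NULL) {
  if (!is.null(col) && col %in% names(df)) {
    # Column already exists under another name: rename it to the standard name
    names(df)[names(df) == col] <- target
  } else if (!is.null(value)) {
    # Column is absent: fill every row with the same single value
    df[[target]] <- rep(value, nrow(df))
  } else {
    stop("Supply either an existing column name or a single value for ", target)
  }
  df
}

# Invented two-row Level 0 stand-in
toy_L0 <- data.frame(Sample = c("s1", "s2"), stringsAsFactors = FALSE)

# Analogue of sample.col = "Sample"
toy_L1 <- resolve_column(toy_L0, "Lab.Sample.Name", col = "Sample")
# Analogue of biological.replicates = 1
toy_L1 <- resolve_column(toy_L1, "Biological.Replicates", value = 1)

names(toy_L1)
```

The single-value branch is the one that deserves care, since it silently assigns the same value to every sample.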

Some columns must be present in the Level 0 data while others can be filled with a single value. At minimum, the following columns must be present in the Level 0 data and specification with a single entry is not permitted: sample.col, lab.compound.col, dtxsid.col, compound.col, area.col, istd.col, type.col, direction.col, receiver.vol.col, and donor.vol.col.

If there is no additional note.col in the Level 0 data, users should use note.col = NULL to fill the column with “Note”.

The rest of the following columns may either be specified from the Level 0 data or filled with a single value: date.col or date, membrane.area.col or membrane.area, test.conc.col or test.conc, cal.col or cal, dilution.col or dilution, time.col or time, istd.name.col or istd.name, istd.conc.col or istd.conc, test.nominal.conc.col or test.nominal.conc, biological.replicates.col or biological.replicates, technical.replicates.col or technical.replicates, analysis.method.col or analysis.method, analysis.instrument.col or analysis.instrument, analysis.parameters.col or analysis.parameters, level0.file.col or level0.file, and level0.sheet.col or level0.sheet.

Table 2: Level 1 'format_caco2' function arguments
Argument	Default	Required in L0?	Corresp. single-entry Argument	Descr.
FILENAME	MYDATA	N/A		Output and input filename
data.in		N/A		Level 0 data frame
sample.col	Lab.Sample.Name	Y		Lab sample name
lab.compound.col	Lab.Compound.Name	Y		Lab test compound name (abbr.)
dtxsid.col	DTXSID	Y		EPA's DSSTox Structure ID
date.col	Date	N	date	Lab measurement date
compound.col	Compound.Name	Y		Formal test compound name
area.col	Area	Y		Target analyte peak area
istd.col	ISTD.Area	Y		Internal standard peak area
type.col	Type	Y		Sample type (Blank/D0/D2/R2)
direction.col	Direction	Y		Experiment direction
membrane.area.col	Membrane.Area	N	membrane.area	Membrane area
receiver.vol.col	Vol.Receiver	Y		Receiver compartment volume
donor.vol.col	Vol.Donor	Y		Donor compartment volume
test.conc.col	Test.Compound.Conc	N	test.conc	Standard test chemical concentration
cal.col	Cal	N	cal	MS calibration
dilution.col	Dilution.Factor	N	dilution	Number of times sample was diluted
time.col	Time	N	time	Time before compartments were measured
istd.name.col	ISTD.Name	N	istd.name	Internal standard name
istd.conc.col	ISTD.Conc	N	istd.conc	Internal standard concentration
test.nominal.conc.col	Test.Target.Conc	N	test.nominal.conc	Expected initial chemical concentration added to donor
biological.replicates.col	Biological.Replicates	N	biological.replicates	Replicates with the same analyte
technical.replicates.col	Technical.Replicates	N	technical.replicates	Repeated measurements from one sample
analysis.method.col	Analysis.Method	N	analysis.method	Analytical chemistry analysis method
analysis.instrument.col	Analysis.Instrument	N	analysis.instrument	Analytical chemistry analysis instrument
analysis.parameters.col	Analysis.Parameters	N	analysis.parameters	Analytical chemistry analysis parameters
note.col	Note	N		Additional notes
level0.file.col	Level0.File	N	level0.file	Raw data filename
level0.sheet.col	Level0.Sheet	N	level0.sheet	Raw data sheet name
output.res	FALSE	N/A		Export results (TSV)?
save.bad.types	FALSE	N/A		Export bad data (TSV)?
sig.figs	5	N/A		Number of significant figures
INPUT.DIR	NULL	N/A		Input directory of Level 0 file
OUTPUT.DIR	NULL	N/A		Export directory to save Level 1 files

A TSV file containing the Level 1 data can be exported to the user’s per-session temporary directory, whose path is returned by tempdir(). For more details, see https://www.collinberke.com/til/posts/2023-10-24-temp-directories/.

To avoid exporting to this temporary directory, an OUTPUT.DIR must be specified. We have omitted this export entirely with output.res = FALSE (the default). The option to omit exporting a TSV file is also available at levels 2 and 3 and will be used from this point forward.
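As a minimal sketch of what the per-session temporary directory is, the base R snippet below writes a small tab-separated file there and confirms that it exists. The file name and the toy data frame are hypothetical; the actual names used by the package's export are not shown here.

```r
# Path of the per-session temporary directory
tmp <- tempdir()

# Hypothetical file name and toy data, used only for illustration
toy_file <- file.path(tmp, "toy_level1_example.tsv")
toy_data <- data.frame(Compound.Name = "Thiobencarb", Response = 0.023)

# Write a tab-separated file, then check that it landed in the temp directory
write.table(toy_data, file = toy_file, sep = "\t",
            row.names = FALSE, quote = FALSE)
was_written <- file.exists(toy_file)
was_written

# Clean up the illustrative file
unlink(toy_file)
```

Files in this directory are discarded when the R session ends, which is why a persistent OUTPUT.DIR is preferable for results that need to be kept.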

caco2_L1_curated <- format_caco2(FILENAME = "Caco2_vignette",
                                 data.in = caco2_L0,
                                 # columns present in L0 data 
                                 sample.col = "Sample",
                                 lab.compound.col = "Lab.Compound.ID",
                                 compound.col = "Compound",
                                 area.col = "Peak.Area",
                                 istd.col = "ISTD.Peak.Area",
                                 test.conc.col = "Compound.Conc",
                                 analysis.method.col = "Analysis.Params",
                                 # columns not present in L0 data
                                 biological.replicates = 1,
                                 technical.replicates = 1,
                                 membrane.area = 0.11,
                                 cal = 1,
                                 time = 2, 
                                 istd.conc = 1,
                                 test.nominal.conc = 10,
                                 analysis.instrument = "Agilent.GCMS",
                                 analysis.parameters = "Unknown",
                                 note.col = NULL,
                                 # don't export output TSV file
                                 output.res = FALSE
                                 )
#> Responses of samples with a "Blank" sample type and a NA response have been reassigned to 0.
#> 48 observations of 3 chemicals based on 3 separate measurements (calibrations).

We receive the message “Responses of samples with a "Blank" sample type and a NA response have been reassigned to 0.” Currently, when the ISTD.Area measurement for a sample is 0, the calculated Response is NaN (not a number) because of the division by 0. In this particular case, every sample with a NaN Response is a “Blank” sample, and these “Blank” samples are needed for point estimation. (Note: NA responses of other sample types are appropriately removed.) Users can verify this with the following check.

all(caco2_L1_curated[caco2_L1_curated$ISTD.Area == 0, "Sample.Type"] == "Blank")
#> [1] TRUE

Therefore, the NaNs are replaced with 0 so that the “Blank” sample types are kept in the dataset and our estimates are accurate. Users can verify that all “Blank” samples with ISTD.Area = 0 have their Response values reassigned to 0 with the following check.

# Verify Blank samples with ISTD.Area = 0 also have Response = 0
resp <- caco2_L1_curated %>% 
  dplyr::filter(Sample.Type == "Blank" & ISTD.Area == 0) %>% 
  dplyr::select(Response) %>% 
  unlist()

all(resp == 0)
#> [1] TRUE
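The reassignment described above can be mimicked on a toy data frame in base R. This is a sketch of the logic only, not the package's code; the two Blank rows are invented, and the R2 row reuses peak areas from the example data.

```r
# Toy data: two invented Blank rows and one R2 row
toy <- data.frame(
  Sample.Type = c("Blank", "Blank", "R2"),
  Area        = c(0, 12, 6117),
  ISTD.Area   = c(0, 300, 265443),
  ISTD.Conc   = 1
)

# Response = Area / ISTD.Area * ISTD.Conc; 0 / 0 evaluates to NaN in R
toy$Response <- toy$Area / toy$ISTD.Area * toy$ISTD.Conc
nan_before <- is.nan(toy$Response)

# Reassign missing responses of Blank samples to 0 (is.na() is TRUE for NaN)
blank_na <- toy$Sample.Type == "Blank" & is.na(toy$Response)
toy$Response[blank_na] <- 0

toy[, c("Sample.Type", "Response")]
```

Note that in R a nonzero area divided by 0 gives Inf rather than NaN, so only the 0 / 0 case is picked up by this kind of check.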

Now, all of our samples are successfully formatted and returned in caco2_L1_curated, our Level 1 data produced from format_caco2. Each sample has one of the following sample types, given in parentheses at the end of each description.

Blank with no chemical added (Blank)
Target concentration added to donor compartment at time 0 (can be referred to as C0) (D0)
Donor compartment at the end of the experiment (D2)
Receiver compartment at the end of the experiment (R2)

If any samples had a different sample type, they would have been removed and reported to the user. If the user wants to export the removed samples as a TSV, the user should set the parameter save.bad.types = TRUE.

The following table displays the first three samples of caco2_L1_curated with a non-Blank sample type. In addition to the columns specified by the user, there is an additional column called Response. This column is the test compound concentration and is calculated as \(\textrm{Response} = \frac{\textrm{Analyte Area}}{\textrm{ISTD Area}}*\textrm{ISTD Conc}\) where \(\textrm{Analyte Area}\) is defined by the Area column, \(\textrm{ISTD Area}\) is defined by the ISTD.Area column, and \(\textrm{ISTD Conc}\) is defined by the ISTD.Conc column.

Table 3: Level 1 data
Lab.Sample.Name	Date	Compound.Name	DTXSID	Lab.Compound.Name	Sample.Type	Dilution.Factor	Calibration	ISTD.Name	ISTD.Conc	ISTD.Area	Area	Analysis.Method	Analysis.Instrument	Analysis.Parameters	Level0.File	Level0.Sheet	Time	Direction	Test.Compound.Conc	Test.Nominal.Conc	Membrane.Area	Vol.Receiver	Vol.Donor	Biological.Replicates	Technical.Replicates	Response
BF175258_A_B_rec_1_____GP3C1_Inj 7	100317	Thiobencarb	DTXSID6024337	BF00175258	R2	4	1	ISTD Name	1	265,443	6,117	GC	Agilent.GCMS	Unknown	Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm	Raw Data	2	AtoB	10	10	0.11	0.25	0.075	1	1	0.02304450
BF175258_A_B_rec_2_____GP3D1_Inj 8	100317	Thiobencarb	DTXSID6024337	BF00175258	R2	4	1	ISTD Name	1	255,049	7,282	GC	Agilent.GCMS	Unknown	Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm	Raw Data	2	AtoB	10	10	0.11	0.25	0.075	1	1	0.02855138
BF175258_A_B_dos_1_____GP5C1_Inj 63	100317	Thiobencarb	DTXSID6024337	BF00175258	D0	4	1	ISTD Name	1	280,232	318,169	GC	Agilent.GCMS	Unknown	Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm	Raw Data	2	AtoB	10	10	0.11	0.25	0.075	1	1	1.13537712
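The Response values in Table 3 can be reproduced by hand from the Area, ISTD.Area, and ISTD.Conc columns. The short base R sketch below does this for the three rows shown; the variable names are chosen here for illustration.

```r
# Peak areas and internal standard values copied from the three rows of Table 3
area      <- c(6117, 7282, 318169)
istd_area <- c(265443, 255049, 280232)
istd_conc <- 1

# Response = Analyte Area / ISTD Area * ISTD Conc
response <- area / istd_area * istd_conc
round(response, 8)
```

The results agree with the Response column of Table 3 to the displayed precision.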

Level 2 processing

sample_verification is the Level 2 function used to add a verification column. The verification column indicates whether a sample should be included in the point estimation (Level 3) processing. This column allows users to keep all samples in their data but only utilize the reliable samples for P_app estimation. All of the data in Level 2 is identical to the data in Level 1 with the exception of the additional Verified column.

To determine whether a sample should be included, the user should consult the wet-lab scientists who generated the data or a chemist who can provide a reliable rationale for samples that should not be verified. This level of processing allows the user to receive feedback, exclude erroneous or unreliable samples, and produce new P_app estimates. Thus, there is always an open channel of communication between the user and the wet-lab scientists or chemists.

We will use the already processed Level 2 data frame, caco2_L2, to regenerate our exclusion data. Note, all of our samples are verified but we are explaining how to create an exclusion list for learning purposes. In general, the user would not have access to the exclusion information a priori.

The exclusion data frame must include the following columns: Variables, Values, and Message. The Variables column contains the variable names used to filter the excluded rows. Here, we are using Lab.Sample.Name and DTXSID to identify the excluded rows separated by a “|”. The Values column contains the values of the variables, as a character, also separated by a “|”. The Message column contains the reason for exclusion. Here, we are using the reasons listed in the Verified column in caco2_L2. The user should refrain from using “|” in any of their descriptions to avoid conflicts with the sample_verification function.
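For orientation, a hand-built exclusion data frame might look like the following base R sketch. The sample names and exclusion reasons are entirely hypothetical; only the column layout and the “|” convention follow the description above.

```r
# Hypothetical exclusion entries: sample names and reasons are invented
exclusion_example <- data.frame(
  Variables = rep("Lab.Sample.Name|DTXSID", 2),
  Values = c(
    paste("Hypothetical_Sample_1", "DTXSID6024337", sep = "|"),
    paste("Hypothetical_Sample_2", "DTXSID0024216", sep = "|")
  ),
  Message = c("Hypothetical reason: pipetting error",
              "Hypothetical reason: carryover"),
  stringsAsFactors = FALSE
)

# Each row should list as many values as variables
n_vars <- lengths(strsplit(exclusion_example$Variables, "|", fixed = TRUE))
n_vals <- lengths(strsplit(exclusion_example$Values, "|", fixed = TRUE))
fields_match <- all(n_vars == n_vals)

# Messages should not contain the "|" separator
message_ok <- !any(grepl("|", exclusion_example$Message, fixed = TRUE))

c(fields_match = fields_match, message_ok = message_ok)
```

The two sanity checks are optional, but they catch the most likely formatting mistakes before the data frame is handed to sample_verification.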

# Use verification data from loaded in `caco2_L2` data frame 
exclusion <- caco2_L2 %>% 
  # keep only samples that were not verified
  dplyr::filter(Verified != "Y") %>% 
  # build the three required columns
  dplyr::mutate(Variables = "Lab.Sample.Name|DTXSID",
                Values = paste(Lab.Sample.Name, DTXSID, sep = "|"),
                Message = Verified) %>% 
  dplyr::select(Variables, Values, Message)

Table 4: Exclusion data frame
Variables	Values	Message

As expected, our exclusion data frame is empty because all of our samples are verified. If all of the user’s samples are verified, they simply do not provide an exclusion.info data frame in sample_verification.

caco2_L2_curated <- sample_verification(FILENAME = "Caco2_vignette",
                                        data.in = caco2_L1_curated,
                                        assay = "Caco-2",
                                        # don't export output TSV file
                                        output.res = FALSE)

Our Level 2 data now contains a Verified column. If the sample should be included, the column contains a “Y” for yes. If the sample should be excluded, the column contains the reason for exclusion.

The following table displays some rows of the Level 2 data.

Table 5: Level 2 data
Lab.Sample.Name	Date	Compound.Name	DTXSID	Lab.Compound.Name	Sample.Type	Dilution.Factor	Calibration	ISTD.Name	ISTD.Conc	Analysis.Method	Analysis.Instrument	Analysis.Parameters	Level0.File	Level0.Sheet	Time	Direction	Test.Compound.Conc	Test.Nominal.Conc	Membrane.Area	Vol.Receiver	Vol.Donor	Biological.Replicates	Technical.Replicates	Verified
Blank 1_TB_A_B_1_____QP1F8_Inj 5	100317	Thiobencarb	DTXSID6024337	BF00175258	Blank	1	1	ISTD Name	1	GC	Agilent.GCMS	Unknown	Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm	Raw Data	2	AtoB	10	10	0.11	0.250	0.075	1	1	Y
Blank 1_TB_A_B_2_____QP1F8_Inj 6	100317	Thiobencarb	DTXSID6024337	BF00175258	Blank	1	1	ISTD Name	1	GC	Agilent.GCMS	Unknown	Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm	Raw Data	2	AtoB	10	10	0.11	0.250	0.075	1	1	Y
Blank 1_TB_B_A_3_____QP1F8_Inj 3	100317	Thiobencarb	DTXSID6024337	BF00175258	Blank	1	1	ISTD Name	1	GC	Agilent.GCMS	Unknown	Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm	Raw Data	2	BtoA	10	10	0.11	0.075	0.250	1	1	Y

Level 3 processing

calc_caco2_point is the Level 3 function used to calculate the P_app point estimate for each test compound using a frequentist framework.

Mathematically, P_app is the rate of permeation normalized by the initial donor concentration and the surface area of the cell monolayer. It is defined as \[P_{app} = \frac{dQ/dt}{c_0*A}\] where \(dQ/dt\) is the rate of permeation, \(c_0\) is the initial concentration in the donor compartment, and \(A\) is the surface area of the cell monolayer.

First, we define the rate of permeation as the amount of compound passing through the monolayer per unit time. It is expressed as \[\frac{dQ}{dt} = \frac{c_{\textrm{receiver}}V_{\textrm{receiver}}}{\Delta t}\] where \(c_{\textrm{receiver}}\) is the concentration of the test compound in the receiver compartment, \(V_{\textrm{receiver}}\) is the volume of the receiver compartment, and \(\Delta t\) is the total elapsed time. This rate is a type of flux and therefore has units of \(\mu \textrm{mol}/\textrm{s}\). The dimensional analysis is described below. \[\left[\frac{dQ}{dt}\right] = \frac{[c_{\textrm{receiver}}][V_{\textrm{receiver}}]}{[\Delta t]} = \frac{[\mu \textrm{mol/L}][\textrm{L}]}{[\textrm{s}]} = \frac{[\mu \textrm{mol}]}{[\textrm{s}]} \]

Next, to estimate \(c_\textrm{receiver}\), we first multiply each R2 response with the corresponding R2 dilution factor and estimate the mean. To account for background noise, we follow a similar procedure with the blank response and dilution factor. We then subtract the two such that \[c_\textrm{receiver} = \textrm{mean}(\textrm{dilution factor}_{R2} * \textrm{Response}_{R2}) - \textrm{mean}(\textrm{dilution factor}_{Blank} * \textrm{Response}_{Blank})\]

The initial concentration in the donor compartment, \(c_0\), is calculated similarly but with the D0 response instead of the R2 response.

The last remaining arguments are estimated with columns in our data: \(V_\textrm{receiver}\) is defined by our Vol.Receiver column, \(\Delta t\) is defined by our Time column, and \(A\) is defined by our Membrane.Area column.
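As a rough check on the formulas above, the Thiobencarb A2B estimates reported in Table 6 can be approximately reproduced from the two R2 rows of Table 3. The sketch below rests on assumptions that are not stated in the text: the mean blank response is taken as 0, Time is taken to be in hours and converted to seconds, and the tabulated Papp is taken to be scaled by \(10^6\). These assumptions happen to reproduce the tabulated values but should not be read as the package's documented behavior.

```r
# R2 responses (Area / ISTD.Area) and dilution factor for Thiobencarb, A to B (Table 3)
resp_R2 <- c(6117 / 265443, 7282 / 255049)
df_R2   <- 4

# ASSUMPTION: mean blank response is 0
blank_mean <- 0
c_receiver <- mean(df_R2 * resp_R2) - blank_mean

# Receiver volume (Vol.Receiver) and elapsed time (Time)
vol_receiver <- 0.25
time_s <- 2 * 3600   # ASSUMPTION: Time = 2 is in hours

# Rate of permeation: dQ/dt = c_receiver * V_receiver / delta t
dQdt <- c_receiver * vol_receiver / time_s

# P_app = (dQ/dt) / (c0 * A), with C0_A2B from Table 6 and Membrane.Area
c0   <- 4.3791963
area <- 0.11
papp <- dQdt / (c0 * area)

# ASSUMPTION: Table 6 reports Papp multiplied by 1e6
c(dQdt = dQdt, Papp_scaled = papp * 1e6)
```

Under these assumptions the results agree with dQdt_A2B and Papp_A2B for Thiobencarb in Table 6 to the displayed precision.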

With this information, we calculate P_app in the donor to receiver direction. The estimate’s direction is annotated according to Figure 1. For example, results annotated with the “A2B” suffix denote the apical side as the donor and the basolateral side as the receiver. Conversely, results annotated with the “B2A” suffix denote the basolateral side as the donor and the apical side as the receiver.

caco2_L3_curated <- calc_caco2_point(FILENAME = "Caco2_vignette",
                                     data.in = caco2_L2_curated, 
                                     # don't export output TSV file 
                                     output.res = FALSE
                                     )
#> [1] "Thiobencarb Refflux = 0.683"
#> [1] "Nitrapyrin Refflux = 1.08"
#> [1] "4-Chloro-2-methylaniline Refflux = 0.777"
#> [1] "Apical to basolateral permeability calculated for 3 chemicals."
#> [1] "Basolateral to apical permeability calculated for 3 chemicals."
#> [1] "Efflux ratio calculated for 3 chemicals."

Table 6: Level 3 data
Compound.Name	DTXSID	Time	Membrane.Area	Calibration	C0_A2B	dQdt_A2B	Papp_A2B	Frec_A2B.vec	Frec_A2B.mean	Recovery_Class_A2B.vec	Recovery_Class_A2B.mean	C0_B2A	dQdt_B2A	Papp_B2A	Frec_B2A.vec	Frec_B2A.mean	Recovery_Class_B2A.vec	Recovery_Class_B2A.mean	Refflux
Thiobencarb	DTXSID6024337	2	0.11	All Data	4.3791963	0.000003583047	7.438157	0.185121268169732|0.247262378175506	0.2161918	Low Recovery|Low Recovery	Low Recovery	9.7077470	0.000005428904	5.083947	0.170999686957327|0.223500162571215	0.1972499	Low Recovery|Low Recovery	Low Recovery	0.6834955
Nitrapyrin	DTXSID0024216	2	0.11	All Data	0.7310252	0.000001094650	13.612891	0.278794672635322|0.269665209932159	0.2742299	Low Recovery|Low Recovery	Low Recovery	0.7179703	0.000001162871	14.724225	0.322876826680762|0.291837993091206	0.3073574	Low Recovery|Low Recovery	Low Recovery	1.0816384
4-Chloro-2-methylaniline	DTXSID1041508	2	0.11	All Data	6.8388783	0.000021289562	28.300177	0.608500272639059|0.581745561840612	0.5951229			6.8960900	0.000016679138	21.987608	0.57300966864841|0.588685286412322	0.5808475			0.7769424

In addition to returning an estimate for Papp and its components (C0, dQdt) in both directions, our Level 3 data contains estimates for the efflux ratio (Refflux), the fraction recovered (Frec) in both directions, and the qualitative category of each fraction recovered value (Recovery_Class).

The efflux ratio is the ratio of the apparent membrane permeability in the B to A direction to that in the A to B direction and is expressed as \[R_{\textrm{efflux}} = \frac{P_{app}^{B2A}}{P_{app}^{A2B}}\]
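The efflux ratios in Table 6 follow directly from the two Papp columns. The base R sketch below recomputes them from the tabulated values; the data frame is typed in by hand for illustration.

```r
# Papp values copied from Table 6
papp <- data.frame(
  Compound.Name = c("Thiobencarb", "Nitrapyrin", "4-Chloro-2-methylaniline"),
  Papp_A2B = c(7.438157, 13.612891, 28.300177),
  Papp_B2A = c(5.083947, 14.724225, 21.987608),
  stringsAsFactors = FALSE
)

# Efflux ratio: B to A permeability divided by A to B permeability
papp$Refflux <- papp$Papp_B2A / papp$Papp_A2B

# Three significant figures, as in the printed messages
signif(papp$Refflux, 3)
```

The rounded values match the Refflux messages printed by calc_caco2_point (0.683, 1.08, and 0.777).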

The fraction recovered is the fraction of the initial donor amount recovered across the donor and receiver compartments at the end of the experiment. It assesses the accuracy of the analytical chemistry method used to determine concentration values. It is expressed as \[ \textrm{Frec} = \frac{V_{D2}*\textrm{DF}_{D2}*(Responses_{D2} - \overline{Response}_{Blank}) + V_{R2}*\textrm{DF}_{R2}*(Responses_{R2} - \overline{Response}_{Blank})}{V_{D0}*\textrm{DF}_{D0}*(Responses_{D0} - \overline{Response}_{Blank})} \] where \(V\) is the corresponding volume, \(\textrm{DF}\) is the corresponding dilution factor, \(Responses\) are the corresponding responses, and \(\overline{Response}_{Blank}\) is the mean blank response.

Fraction recovered values are then classified and provided as the Recovery_Class value. Values \(<0.4\) are classified as “Low Recovery” and values \(>2\) are classified as “High Recovery”. Values within the range, \(0.4 \leq \textrm{Frec} \leq 2\), are considered normal and are not given an explicit classification.
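The classification rule can be written as a small base R function. The name classify_recovery is hypothetical, and returning an empty string for values in the normal range is an assumption made here for illustration; the text only states that such values receive no explicit classification.

```r
# Hypothetical helper, not part of invitroTKstats
classify_recovery <- function(frec) {
  ifelse(frec < 0.4, "Low Recovery",
         ifelse(frec > 2, "High Recovery",
                ""))  # ASSUMPTION: empty string for the normal range
}

# Two mean values from Table 6 plus boundary and out-of-range cases
recovery_class <- classify_recovery(c(0.2161918, 0.5951229, 0.4, 2, 2.5))
recovery_class
```

The boundary values 0.4 and 2 fall in the normal range because the thresholds are strict inequalities.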

If there are multiple samples per sample type per chemical (i.e., biological replicates), a vector of fraction recovered values will be returned with a “|” separating each value. Similarly, a vector of recovery classifications, one for each fraction recovered value, will be returned with a “|” separating each classification.

The following columns involving the fraction recovered in the A to B direction are always returned:

Frec_A2B.vec - vector of fraction recovered values
Recovery_Class_A2B.vec - vector of recovery classifications
Frec_A2B.mean - mean of fraction recovered values
Recovery_Class_A2B.mean - recovery classification of mean fraction recovered value

Note that even if there are no biological replicates, the above four columns are still returned; the .vec and .mean estimates are then simply identical. The same four estimates are also returned in the B to A direction.
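To work with the “|”-separated values numerically, the string can be split and converted in base R. The sketch below uses the Thiobencarb Frec_A2B.vec entry from Table 6; the variable names are chosen here for illustration.

```r
# Frec_A2B.vec for Thiobencarb, copied from Table 6
frec_vec <- "0.185121268169732|0.247262378175506"

# Split on the literal "|" (fixed = TRUE avoids regex interpretation)
frec <- as.numeric(strsplit(frec_vec, "|", fixed = TRUE)[[1]])

# The mean of the replicate values corresponds to Frec_A2B.mean
frec_mean <- mean(frec)
round(frec_mean, 7)
```

The mean agrees with the Frec_A2B.mean value of 0.2161918 reported for Thiobencarb.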

Looking at our results, some of our Frec_A2B.vec and Frec_B2A.vec values are less than \(0.4\) and are therefore classified as “Low Recovery”. Others are within the \([0.4, 2]\) range and are therefore considered normal.

US EPA’s Center for Computational Toxicology and Exposure ccte@epa.gov
