This vignette shows how to estimate apparent membrane permeability (Papp) from mass spectrometry data. Apparent membrane permeability is a chemical-specific parameter that describes the general transport of a chemical through a membrane (Honda et al. 2025; Hubatsch, Ragnarsson, and Artursson 2007).
The mass spectrometry data should be collected from a Caco-2 assay in which the test compound was added to either the apical or basolateral side of a confluent, differentiated Caco-2 cell monolayer, as shown in Figure 1 (Honda et al. 2025; Hubatsch, Ragnarsson, and Artursson 2007).
Figure 1: Caco-2 experimental setup
First, we load the example datasets from `invitroTKstats`.
Five datasets are loaded: `caco2_cheminfo`, `caco2_L0`, `caco2_L1`, `caco2_L2`, and `caco2_L3`. The first, `caco2_cheminfo`, contains the chemical information necessary for identification mapping; it is used to create Level 0 data. The remaining datasets contain Caco-2 data at Levels 0, 1, 2, and 3, respectively. For the purpose of this vignette, we start with `caco2_L0`, the Level 0 data, to demonstrate the complete pipeline.
`caco2_L0` is the output from the `merge_level0` function, which compiles raw lab data from specified Excel files into a single data frame. The data frame contains exactly one row per sample with information obtained from the mass spectrometer. For more details on curating raw lab data into a single Level 0 data frame, see the “Data Guide Creation and Level-0 Data Compilation” vignette.
The following table displays the first three rows of `caco2_L0`, our Level 0 data.
Compound | DTXSID | Lab.Compound.ID | Date | Sample | Type | Compound.Conc | Peak.Area | ISTD.Peak.Area | ISTD.Name | Analysis.Params | Level0.File | Level0.Sheet | Direction | Vol.Receiver | Vol.Donor | Dilution.Factor |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Thiobencarb | DTXSID6024337 | BF00175258 | 100317 | Blank 1_TB_A_B_1_____QP1F8_Inj 5 | Blank | 10 | 0 | 0 | ISTD Name | GC | Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm | Raw Data | AtoB | 0.250 | 0.075 | 1 |
Thiobencarb | DTXSID6024337 | BF00175258 | 100317 | Blank 1_TB_A_B_2_____QP1F8_Inj 6 | Blank | 10 | 0 | 0 | ISTD Name | GC | Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm | Raw Data | AtoB | 0.250 | 0.075 | 1 |
Thiobencarb | DTXSID6024337 | BF00175258 | 100317 | Blank 1_TB_B_A_3_____QP1F8_Inj 3 | Blank | 10 | 0 | 0 | ISTD Name | GC | Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm | Raw Data | BtoA | 0.075 | 0.250 | 1 |
`format_caco2` is the Level 1 function used to create a standardized data frame. This level of processing is necessary because naming conventions and formatting can differ across datasets.
If the Level 0 data already contains a required column, the existing column name can be specified. For example, `caco2_L0` already contains a sample name column called “Sample”, whereas the default column name for the sample name is “Lab.Sample.Name”. Therefore, we specify the correct column with `sample.col = "Sample"`. In general, to point to an existing column whose name differs from the default, use the parameter with the `.col` suffix.
If the Level 0 data does not contain a required column, the entire column can be populated with a single value. For example, `caco2_L0` does not contain a column specifying biological replicates. Therefore, we populate the required column with `biological.replicates = 1`. In general, to specify a single value for an entire column, use the parameter without the `.col` suffix.
Users should be mindful if they choose to specify a single value for all of their samples; they should verify this action is one they wish to take.
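The two conventions can be pictured with a small base R sketch. The helper `resolve_field` below is hypothetical and is not part of `invitroTKstats`; it only illustrates the idea that a `.col` argument points at an existing column, whereas the suffix-free argument fills a whole column with one value.

```r
# Hypothetical helper (NOT an invitroTKstats function): return the values for a
# standardized field, either from an existing column or from a single value.
resolve_field <- function(data, col = NULL, value = NULL) {
  if (!is.null(col) && col %in% names(data)) {
    return(data[[col]])             # analogous to sample.col = "Sample"
  }
  if (!is.null(value)) {
    return(rep(value, nrow(data)))  # analogous to biological.replicates = 1
  }
  stop("Supply either an existing column name or a single value.")
}

# Toy Level 0 data with a sample-name column called "Sample"
toy_L0 <- data.frame(Sample = c("s1", "s2", "s3"), stringsAsFactors = FALSE)

lab_sample_name       <- resolve_field(toy_L0, col = "Sample")
biological_replicates <- resolve_field(toy_L0, value = 1)
```

Because the single-value route assigns the same entry to every row, it is worth confirming that the value really applies to all samples before using it.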
Some columns must be present in the Level 0 data, while others can be filled with a single value. At minimum, the following columns must be present in the Level 0 data, and specification with a single entry is not permitted: `sample.col`, `lab.compound.col`, `dtxsid.col`, `compound.col`, `area.col`, `istd.col`, `type.col`, `direction.col`, `receiver.vol.col`, and `donor.vol.col`.
If there is no additional `note.col` in the Level 0 data, users should use `note.col = NULL` to fill the column with “Note”.
The remaining columns may either be specified from the Level 0 data or filled with a single value: `date.col` or `date`, `membrane.area.col` or `membrane.area`, `test.conc.col` or `test.conc`, `cal.col` or `cal`, `dilution.col` or `dilution`, `time.col` or `time`, `istd.name.col` or `istd.name`, `istd.conc.col` or `istd.conc`, `test.nominal.conc.col` or `test.nominal.conc`, `biological.replicates.col` or `biological.replicates`, `technical.replicates.col` or `technical.replicates`, `analysis.method.col` or `analysis.method`, `analysis.instrument.col` or `analysis.instrument`, `analysis.parameters.col` or `analysis.parameters`, `level0.file.col` or `level0.file`, and `level0.sheet.col` or `level0.sheet`.
Argument | Default | Required in L0? | Corresponding single-entry argument | Description |
---|---|---|---|---|
FILENAME | MYDATA | N/A | | Output and input filename |
data.in | | N/A | | Level 0 data frame |
sample.col | Lab.Sample.Name | Y | | Lab sample name |
lab.compound.col | Lab.Compound.Name | Y | | Lab test compound name (abbr.) |
dtxsid.col | DTXSID | Y | | EPA's DSSTox Structure ID |
date.col | Date | N | date | Lab measurement date |
compound.col | Compound.Name | Y | | Formal test compound name |
area.col | Area | Y | | Target analyte peak area |
istd.col | ISTD.Area | Y | | Internal standard peak area |
type.col | Type | Y | | Sample type (Blank/D0/D2/R2) |
direction.col | Direction | Y | | Experiment direction |
membrane.area.col | Membrane.Area | N | membrane.area | Membrane area |
receiver.vol.col | Vol.Receiver | Y | | Receiver compartment volume |
donor.vol.col | Vol.Donor | Y | | Donor compartment volume |
test.conc.col | Test.Compound.Conc | N | test.conc | Standard test chemical concentration |
cal.col | Cal | N | cal | MS calibration |
dilution.col | Dilution.Factor | N | dilution | Number of times sample was diluted |
time.col | Time | N | time | Time before compartments were measured |
istd.name.col | ISTD.Name | N | istd.name | Internal standard name |
istd.conc.col | ISTD.Conc | N | istd.conc | Internal standard concentration |
test.nominal.conc.col | Test.Target.Conc | N | test.nominal.conc | Expected initial chemical concentration added to donor |
biological.replicates.col | Biological.Replicates | N | biological.replicates | Replicates with the same analyte |
technical.replicates.col | Technical.Replicates | N | technical.replicates | Repeated measurements from one sample |
analysis.method.col | Analysis.Method | N | analysis.method | Analytical chemistry analysis method |
analysis.instrument.col | Analysis.Instrument | N | analysis.instrument | Analytical chemistry analysis instrument |
analysis.parameters.col | Analysis.Parameters | N | analysis.parameters | Analytical chemistry analysis parameters |
note.col | Note | N | | Additional notes |
level0.file.col | Level0.File | N | level0.file | Raw data filename |
level0.sheet.col | Level0.Sheet | N | level0.sheet | Raw data sheet name |
output.res | FALSE | N/A | | Export results (TSV)? |
save.bad.types | FALSE | N/A | | Export bad data (TSV)? |
sig.figs | 5 | N/A | | Number of significant figures |
INPUT.DIR | NULL | N/A | | Input directory of Level 0 file |
OUTPUT.DIR | NULL | N/A | | Export directory to save Level 1 files |
A TSV file containing the Level 1 data can be exported to the user’s per-session temporary directory, whose path can be found with `tempdir()`. For more details, see https://www.collinberke.com/til/posts/2023-10-24-temp-directories/.
To export somewhere other than this temporary directory, an `OUTPUT.DIR` must be specified. Here, we omit the export entirely with `output.res = FALSE` (the default). The option to omit exporting a TSV file is also available at Levels 2 and 3 and will be used from this point forward.
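For readers unfamiliar with the per-session temporary directory, the short base R sketch below shows where it lives and how a file path inside it is built. The file name `example_level1.tsv` is hypothetical and is not the name that `format_caco2` would use.

```r
# Per-session temporary directory; the path changes between R sessions
tmp_dir <- tempdir()

# Hypothetical file name, used only to show how a path inside tempdir() is built
tmp_file <- file.path(tmp_dir, "example_level1.tsv")
write.table(data.frame(x = 1:3), file = tmp_file, sep = "\t", row.names = FALSE)

# Confirm the file was written, then remove the illustrative file
file_was_written <- file.exists(tmp_file)
unlink(tmp_file)
```

Files written here disappear when the R session ends, which is why an explicit `OUTPUT.DIR` is preferable for results that should be kept.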
caco2_L1_curated <- format_caco2(
  FILENAME = "Caco2_vignette",
  data.in = caco2_L0,
  # columns present in L0 data
  sample.col = "Sample",
  lab.compound.col = "Lab.Compound.ID",
  compound.col = "Compound",
  area.col = "Peak.Area",
  istd.col = "ISTD.Peak.Area",
  test.conc.col = "Compound.Conc",
  analysis.method.col = "Analysis.Params",
  # columns not present in L0 data
  biological.replicates = 1,
  technical.replicates = 1,
  membrane.area = 0.11,
  cal = 1,
  time = 2,
  istd.conc = 1,
  test.nominal.conc = 10,
  analysis.instrument = "Agilent.GCMS",
  analysis.parameters = "Unknown",
  note.col = NULL,
  # don't export output TSV file
  output.res = FALSE
)
#> Responses of samples with a "Blank" sample type and a NA response have been reassigned to 0.
#> 48 observations of 3 chemicals based on 3 separate measurements (calibrations).
We receive the message “Responses of samples with a "Blank" sample type and a NA response have been reassigned to 0.” Currently, when the `ISTD.Area` measurement for a sample is 0, the calculated `Response` is NaN (not a number) because of the division by 0. In this particular case, all of our samples that have a NaN `Response` are “Blank” sample types, and these “Blank” samples are needed for point estimation. (Note: NA responses of other sample types are appropriately removed.) Users can verify this with the following check.
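One way to run such a check is sketched below in base R. It uses `caco2_L0` when that object is loaded and otherwise falls back to a small toy data frame so that the sketch runs on its own; the toy rows and the intermediate names (`l0`, `ratio`, `nan_types`) are assumptions made for illustration.

```r
# Use the packaged Level 0 data when it is loaded; otherwise fall back to a
# small toy data frame (illustrative values) so the sketch is self-contained.
l0 <- if (exists("caco2_L0")) {
  caco2_L0
} else {
  data.frame(
    Type = c("Blank", "Blank", "R2", "D0"),
    Peak.Area = c(0, 0, 6117, 318169),
    ISTD.Peak.Area = c(0, 0, 265443, 280232),
    stringsAsFactors = FALSE
  )
}

# In R, 0 / 0 evaluates to NaN, which is how the NaN responses arise
ratio <- l0$Peak.Area / l0$ISTD.Peak.Area

# Sample types of the rows whose area ratio is NaN
nan_types <- unique(as.character(l0$Type[is.nan(ratio)]))
nan_types
```

If the only sample type listed is "Blank", the reassignment to 0 affects blanks alone.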
Therefore, the NaNs are replaced with 0 so that the “Blank” samples are kept in the dataset and our estimates are accurate. Users can verify that all “Blank” samples with `ISTD.Area` = 0 have their `Response` values reassigned to 0 with the following check.
# Verify Blank samples with ISTD.Area = 0 also have Response = 0
resp <- caco2_L1_curated %>%
  dplyr::filter(Sample.Type == "Blank" & ISTD.Area == 0) %>%
  dplyr::select(Response) %>%
  unlist()
all(resp == 0)
#> [1] TRUE
Now, all of our samples are successfully formatted and returned in `caco2_L1_curated`, our Level 1 data produced from `format_caco2`. Each sample has one of the four accepted sample types: Blank, D0, D2, or R2.
If any samples had a different sample type, they would have been removed and reported to the user. To export the removed samples as a TSV, set the parameter `save.bad.types = TRUE`.
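The effect of this filtering can be pictured with a minimal base R sketch on toy data. The data frame, the "QC" type, and the object names are hypothetical; the four accepted types are those listed for `type.col` in the argument table, and the actual filtering inside `format_caco2` may be implemented differently.

```r
# Sample types accepted for Caco-2 data, as listed in the argument table
allowed_types <- c("Blank", "D0", "D2", "R2")

# Toy Level 1-style data containing one unexpected sample type ("QC")
toy <- data.frame(
  Lab.Sample.Name = c("s1", "s2", "s3", "s4"),
  Sample.Type = c("Blank", "D0", "QC", "R2"),
  stringsAsFactors = FALSE
)

keep <- toy$Sample.Type %in% allowed_types
good_rows <- toy[keep, ]   # rows retained for further processing
bad_rows  <- toy[!keep, ]  # rows that would be removed and reported
```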
The following table displays the first three samples of `caco2_L1_curated` with a non-Blank sample type. In addition to the columns specified by the user, there is an additional column called `Response`. This column is the test compound concentration and is calculated as \(\textrm{Response} = \frac{\textrm{Analyte Area}}{\textrm{ISTD Area}}*\textrm{ISTD Conc}\) where \(\textrm{Analyte Area}\) is defined by the `Area` column, \(\textrm{ISTD Area}\) is defined by the `ISTD.Area` column, and \(\textrm{ISTD Conc}\) is defined by the `ISTD.Conc` column.
Lab.Sample.Name | Date | Compound.Name | DTXSID | Lab.Compound.Name | Sample.Type | Dilution.Factor | Calibration | ISTD.Name | ISTD.Conc | ISTD.Area | Area | Analysis.Method | Analysis.Instrument | Analysis.Parameters | Note | Level0.File | Level0.Sheet | Time | Direction | Test.Compound.Conc | Test.Nominal.Conc | Membrane.Area | Vol.Receiver | Vol.Donor | Biological.Replicates | Technical.Replicates | Response |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BF175258_A_B_rec_1_____GP3C1_Inj 7 | 100317 | Thiobencarb | DTXSID6024337 | BF00175258 | R2 | 4 | 1 | ISTD Name | 1 | 265,443 | 6,117 | GC | Agilent.GCMS | Unknown | | Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm | Raw Data | 2 | AtoB | 10 | 10 | 0.11 | 0.25 | 0.075 | 1 | 1 | 0.02304450 |
BF175258_A_B_rec_2_____GP3D1_Inj 8 | 100317 | Thiobencarb | DTXSID6024337 | BF00175258 | R2 | 4 | 1 | ISTD Name | 1 | 255,049 | 7,282 | GC | Agilent.GCMS | Unknown | | Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm | Raw Data | 2 | AtoB | 10 | 10 | 0.11 | 0.25 | 0.075 | 1 | 1 | 0.02855138 |
BF175258_A_B_dos_1_____GP5C1_Inj 63 | 100317 | Thiobencarb | DTXSID6024337 | BF00175258 | D0 | 4 | 1 | ISTD Name | 1 | 280,232 | 318,169 | GC | Agilent.GCMS | Unknown | | Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm | Raw Data | 2 | AtoB | 10 | 10 | 0.11 | 0.25 | 0.075 | 1 | 1 | 1.13537712 |
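As a quick sanity check, the `Response` formula can be applied by hand to the three rows shown above. This is only a sketch in base R with the tabulated values typed in; the variable names are chosen for illustration.

```r
# Area, ISTD.Area, and ISTD.Conc for the three rows displayed above
area      <- c(6117, 7282, 318169)
istd_area <- c(265443, 255049, 280232)
istd_conc <- 1

# Response = Analyte Area / ISTD Area * ISTD Conc
response <- area / istd_area * istd_conc
round(response, 6)
```

The results agree with the tabulated `Response` values to within rounding.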
`sample_verification` is the Level 2 function used to add a verification column. The verification column indicates whether a sample should be included in the point estimation (Level 3) processing. This column allows users to keep all samples in their data but use only the reliable samples for Papp estimation. All of the data in Level 2 is identical to the data in Level 1 with the exception of the additional `Verified` column.
To determine whether a sample should be included, the user should consult the wet-lab scientists who generated the data, or a chemist who can provide a reliable rationale for samples that should not be verified. This level of processing allows the user to receive feedback, exclude erroneous or unreliable samples, and produce new Papp estimates. It thereby keeps an open channel of communication between the user and the wet-lab scientists or chemists.
We will use the already processed Level 2 data frame, `caco2_L2`, to regenerate our exclusion data. Note that all of our samples are verified; we explain how to create an exclusion list for learning purposes. In general, the user would not have access to the exclusion information a priori.
The exclusion data frame must include the following columns: `Variables`, `Values`, and `Message`. The `Variables` column contains the variable names used to filter the excluded rows. Here, we identify the excluded rows with `Lab.Sample.Name` and `DTXSID`, separated by a “|”. The `Values` column contains the values of the variables, as a character, also separated by a “|”. The `Message` column contains the reason for exclusion. Here, we are using the reasons listed in the `Verified` column in `caco2_L2`. The user should refrain from using “|” in any of their descriptions to avoid conflicts with the `sample_verification` function.
# Use verification data from the loaded `caco2_L2` data frame
exclusion <- caco2_L2 %>%
  dplyr::filter(Verified != "Y") %>%
  dplyr::mutate("Variables" = "Lab.Sample.Name|DTXSID") %>%
  dplyr::mutate("Values" = paste(Lab.Sample.Name, DTXSID, sep = "|")) %>%
  dplyr::mutate("Message" = Verified) %>%
  dplyr::select(Variables, Values, Message)
Variables | Values | Message |
---|---|---|
As expected, our exclusion data frame is empty because all of our samples are verified. If all of the user’s samples are verified, they simply do not provide an `exclusion.info` data frame to `sample_verification`.
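For comparison, a non-empty exclusion data frame might look like the sketch below. The sample name and DTXSID are copied from the tables in this vignette, but the exclusion itself and its message are invented for illustration; no sample in the example data is actually excluded.

```r
# HYPOTHETICAL exclusion: the identifiers come from the tables in this
# vignette, but the reason is invented purely for illustration.
exclusion_example <- data.frame(
  Variables = "Lab.Sample.Name|DTXSID",
  Values = paste("BF175258_A_B_rec_1_____GP3C1_Inj 7", "DTXSID6024337", sep = "|"),
  Message = "Illustrative reason for exclusion",
  stringsAsFactors = FALSE
)

# Splitting on the literal "|" recovers the variable/value pairs, which is why
# "|" should not appear inside names, values, or messages.
vars <- strsplit(exclusion_example$Variables, "|", fixed = TRUE)[[1]]
vals <- strsplit(exclusion_example$Values, "|", fixed = TRUE)[[1]]
setNames(vals, vars)
```

A data frame of this shape is what the `exclusion.info` argument mentioned above is meant to receive.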
caco2_L2_curated <- sample_verification(
  FILENAME = "Caco2_vignette",
  data.in = caco2_L1_curated,
  assay = "Caco-2",
  # don't export output TSV file
  output.res = FALSE
)
Our Level 2 data now contains a `Verified` column. If the sample should be included, the column contains a “Y” for yes. If the sample should be excluded, the column contains the reason for exclusion.
The following table displays some rows of the Level 2 data.
Lab.Sample.Name | Date | Compound.Name | DTXSID | Lab.Compound.Name | Sample.Type | Dilution.Factor | Calibration | ISTD.Name | ISTD.Conc | ISTD.Area | Area | Analysis.Method | Analysis.Instrument | Analysis.Parameters | Note | Level0.File | Level0.Sheet | Time | Direction | Test.Compound.Conc | Test.Nominal.Conc | Membrane.Area | Vol.Receiver | Vol.Donor | Biological.Replicates | Technical.Replicates | Response | Verified |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Blank 1_TB_A_B_1_____QP1F8_Inj 5 | 100317 | Thiobencarb | DTXSID6024337 | BF00175258 | Blank | 1 | 1 | ISTD Name | 1 | 0 | 0 | GC | Agilent.GCMS | Unknown | | Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm | Raw Data | 2 | AtoB | 10 | 10 | 0.11 | 0.250 | 0.075 | 1 | 1 | 0 | Y |
Blank 1_TB_A_B_2_____QP1F8_Inj 6 | 100317 | Thiobencarb | DTXSID6024337 | BF00175258 | Blank | 1 | 1 | ISTD Name | 1 | 0 | 0 | GC | Agilent.GCMS | Unknown | | Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm | Raw Data | 2 | AtoB | 10 | 10 | 0.11 | 0.250 | 0.075 | 1 | 1 | 0 | Y |
Blank 1_TB_B_A_3_____QP1F8_Inj 3 | 100317 | Thiobencarb | DTXSID6024337 | BF00175258 | Blank | 1 | 1 | ISTD Name | 1 | 0 | 0 | GC | Agilent.GCMS | Unknown | | Edited_EPA_Task 10_13_Caco-2 Compiled_LCMSGC_10032017_Data Summary_GZ.xlsm | Raw Data | 2 | BtoA | 10 | 10 | 0.11 | 0.075 | 0.250 | 1 | 1 | 0 | Y |
`calc_caco2_point` is the Level 3 function used to calculate the Papp point estimate for each test compound using a frequentist framework.
Mathematically, Papp is the rate of permeation normalized by the initial donor concentration and the membrane surface area, and is defined as \[P_{app} = \frac{dQ/dt}{c_0*A}\] where \(dQ/dt\) is the rate of permeation, \(c_0\) is the initial concentration in the donor compartment, and \(A\) is the surface area of the cell monolayer.
First, we define the rate of permeation as the amount of compound passing through the monolayer per unit time. It is expressed as \[\frac{dQ}{dt} = \frac{c_{\textrm{receiver}}V_{\textrm{receiver}}}{\Delta t}\] where \(c_{\textrm{receiver}}\) is the concentration of the test compound in the receiver compartment, \(V_{\textrm{receiver}}\) is the volume of the receiver compartment, and \(\Delta t\) is the total elapsed time. This rate is a type of flux and therefore has units of \(\mu \textrm{mol}/s\). The dimensional analysis is described below. \[\left[\frac{dQ}{dt}\right] = \frac{[c_{\textrm{receiver}}][V_{\textrm{receiver}}]}{[\Delta t]} = \frac{[\mu \textrm{mol/L}][\textrm{L}]}{[s]} = \frac{[\mu \textrm{mol}]}{[s]} \]
Next, to estimate \(c_\textrm{receiver}\), we first multiply each R2 response with the corresponding R2 dilution factor and estimate the mean. To account for background noise, we follow a similar procedure with the blank response and dilution factor. We then subtract the two such that \[c_\textrm{receiver} = \textrm{mean}(\textrm{dilution factor}_{R2} * \textrm{Response}_{R2}) - \textrm{mean}(\textrm{dilution factor}_{Blank} * \textrm{Response}_{Blank})\]
The initial concentration in the donor compartment, \(c_0\), is calculated similarly but with the D0 response instead of the R2 response.
The remaining terms are taken from columns in our data: \(V_\textrm{receiver}\) is defined by our `Vol.Receiver` column, \(\Delta t\) is defined by our `Time` column, and \(A\) is defined by our `Membrane.Area` column.
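The chain of calculations can be traced by hand for Thiobencarb in the A to B direction, using values printed in this vignette. The sketch below is not the package's implementation. Two numeric factors are assumptions inferred from the printed Level 3 values rather than taken from documentation: the conversion of `Time` from hours to seconds (factor 3600) and the scaling of the reported Papp by \(10^6\). The blank mean is taken as 0 because the blank rows shown for this chemical have a `Response` of 0.

```r
# R2 (receiver) responses and dilution factors, copied from the Level 1 table
response_R2 <- c(0.02304450, 0.02855138)
dilution_R2 <- c(4, 4)

# Blank rows shown in this vignette: Response 0, Dilution.Factor 1
response_blank <- c(0, 0)
dilution_blank <- c(1, 1)

# c_receiver = mean(DF_R2 * Response_R2) - mean(DF_Blank * Response_Blank)
c_receiver <- mean(dilution_R2 * response_R2) -
  mean(dilution_blank * response_blank)

v_receiver <- 0.25       # Vol.Receiver for the A to B direction
delta_t_h  <- 2          # Time column
area       <- 0.11       # Membrane.Area column
c0         <- 4.3791963  # C0_A2B as printed in the Level 3 results

# ASSUMPTION: Time is in hours and is converted to seconds (factor 3600)
dQdt <- c_receiver * v_receiver / (delta_t_h * 3600)

# ASSUMPTION: the reported Papp is scaled by 1e6
papp <- dQdt / (c0 * area) * 1e6

c(dQdt = dQdt, Papp = papp)
```

Under these assumptions the hand calculation reproduces the `dQdt_A2B` and `Papp_A2B` values printed for Thiobencarb in the Level 3 results, up to rounding of the tabulated inputs.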
With this information, we calculate Papp in the donor to receiver direction. The estimate’s direction is annotated according to Figure 1. For example, results annotated with the “A2B” suffix denote the apical side as the donor and the basolateral side as the receiver. Conversely, results annotated with the “B2A” suffix denote the basolateral side as the donor and the apical side as the receiver.
caco2_L3_curated <- calc_caco2_point(
  FILENAME = "Caco2_vignette",
  data.in = caco2_L2_curated,
  # don't export output TSV file
  output.res = FALSE
)
#> [1] "Thiobencarb Refflux = 0.683"
#> [1] "Nitrapyrin Refflux = 1.08"
#> [1] "4-Chloro-2-methylaniline Refflux = 0.777"
#> [1] "Apical to basolateral permeability calculated for 3 chemicals."
#> [1] "Basolateral to apical permeability calculated for 3 chemicals."
#> [1] "Efflux ratio calculated for 3 chemicals."
Compound.Name | DTXSID | Time | Membrane.Area | Calibration | C0_A2B | dQdt_A2B | Papp_A2B | Frec_A2B.vec | Frec_A2B.mean | Recovery_Class_A2B.vec | Recovery_Class_A2B.mean | C0_B2A | dQdt_B2A | Papp_B2A | Frec_B2A.vec | Frec_B2A.mean | Recovery_Class_B2A.vec | Recovery_Class_B2A.mean | Refflux |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Thiobencarb | DTXSID6024337 | 2 | 0.11 | All Data | 4.3791963 | 0.000003583047 | 7.438157 | 0.185121268169732|0.247262378175506 | 0.2161918 | Low Recovery|Low Recovery | Low Recovery | 9.7077470 | 0.000005428904 | 5.083947 | 0.170999686957327|0.223500162571215 | 0.1972499 | Low Recovery|Low Recovery | Low Recovery | 0.6834955 |
Nitrapyrin | DTXSID0024216 | 2 | 0.11 | All Data | 0.7310252 | 0.000001094650 | 13.612891 | 0.278794672635322|0.269665209932159 | 0.2742299 | Low Recovery|Low Recovery | Low Recovery | 0.7179703 | 0.000001162871 | 14.724225 | 0.322876826680762|0.291837993091206 | 0.3073574 | Low Recovery|Low Recovery | Low Recovery | 1.0816384 |
4-Chloro-2-methylaniline | DTXSID1041508 | 2 | 0.11 | All Data | 6.8388783 | 0.000021289562 | 28.300177 | 0.608500272639059|0.581745561840612 | 0.5951229 | | | 6.8960900 | 0.000016679138 | 21.987608 | 0.57300966864841|0.588685286412322 | 0.5808475 | | | 0.7769424 |
In addition to returning an estimate for `Papp` and its components (`C0`, `dQdt`) in both directions, our Level 3 data contains estimates for the efflux ratio (`Refflux`), the fraction recovered (`Frec`) in both directions, and the qualitative category of each fraction recovered value (`Recovery_Class`).
The efflux ratio is the ratio between the apparent membrane permeabilities and is expressed as \[R_{\textrm{efflux}} = \frac{P_{app}^{B2A}}{P_{app}^{A2B}}\]
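The ratio can be recomputed from the `Papp_A2B` and `Papp_B2A` values printed in the Level 3 table. The sketch below simply types those values in; the vector names are chosen for illustration.

```r
# Papp values copied from the Level 3 table (same chemical order in both vectors)
papp_A2B <- c(Thiobencarb = 7.438157,
              Nitrapyrin = 13.612891,
              `4-Chloro-2-methylaniline` = 28.300177)
papp_B2A <- c(Thiobencarb = 5.083947,
              Nitrapyrin = 14.724225,
              `4-Chloro-2-methylaniline` = 21.987608)

# Efflux ratio: B to A permeability divided by A to B permeability
refflux <- papp_B2A / papp_A2B
signif(refflux, 3)
```

Rounded to three significant figures, these ratios match the `Refflux` messages printed by `calc_caco2_point`.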
The fraction recovered is the fraction of the initial donor amount (D0) that is recovered across the donor (D2) and receiver (R2) samples. It assesses the accuracy of the analytical chemistry method used to determine concentration values. It is expressed as \[ \textrm{Frec} = \frac{V_{D2}*\textrm{DF}_{D2}*(Responses_{D2} - \overline{Response}_{Blank}) + V_{R2}*\textrm{DF}_{R2}*(Responses_{R2} - \overline{Response}_{Blank})}{V_{D0}*\textrm{DF}_{D0}*(Responses_{D0} - \overline{Response}_{Blank})} \] where \(V\) is the corresponding volume, \(\textrm{DF}\) is the corresponding dilution factor, \(Responses\) are the corresponding responses, and \(\overline{Response}_{Blank}\) is the mean blank response.
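To make the arithmetic concrete, the sketch below evaluates the expression for a single replicate. All numbers are hypothetical and chosen only to keep the calculation easy to follow; they are not taken from the example data.

```r
# HYPOTHETICAL inputs for one replicate (not from the example data)
blank_mean <- 0
vol  <- c(D0 = 0.075, D2 = 0.075, R2 = 0.25)  # compartment volumes
dil  <- c(D0 = 4, D2 = 4, R2 = 4)             # dilution factors
resp <- c(D0 = 1.10, D2 = 0.20, R2 = 0.02)    # responses

# Blank-corrected amount per sample type: V * DF * (Response - mean blank)
amount <- vol * dil * (resp - blank_mean)

# Fraction recovered: (donor at D2 + receiver at R2) / initial donor at D0
frec <- unname((amount["D2"] + amount["R2"]) / amount["D0"])
frec
```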
Fraction recovered values are then classified and provided as the `Recovery_Class` value. Values \(<0.4\) are classified as “Low Recovery” and values \(>2\) are classified as “High Recovery”. Values within the range \(0.4 \leq \textrm{Frec} \leq 2\) are considered normal and are not given an explicit classification.
If there are multiple samples per sample type per chemical (i.e., biological replicates), a vector of fraction recovered values will be returned with a “|” separating each value. Similarly, a vector of recovery classifications will be returned, one for each fraction recovered value, with a “|” separating each classification.
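The thresholds and the “|”-separated format can be sketched in base R as follows. The helper `classify_recovery` is hypothetical and is not the package's implementation; in particular, representing a normal value as `NA` is an assumption. The two input values are the `Frec_A2B.vec` entries printed for Thiobencarb.

```r
# Hypothetical helper mirroring the thresholds stated above.
# ASSUMPTION: normal values (0.4 <= Frec <= 2) are represented as NA here.
classify_recovery <- function(frec) {
  ifelse(frec < 0.4, "Low Recovery",
         ifelse(frec > 2, "High Recovery", NA_character_))
}

# Frec_A2B.vec entries for Thiobencarb, copied from the Level 3 table
frec_vec <- c(0.185121268169732, 0.247262378175506)

# "|"-separated strings, one entry per replicate
frec_vec_string  <- paste(frec_vec, collapse = "|")
class_vec_string <- paste(classify_recovery(frec_vec), collapse = "|")

# Mean fraction recovered and its classification
frec_mean  <- mean(frec_vec)
class_mean <- classify_recovery(frec_mean)
```

The mean computed here agrees with the printed `Frec_A2B.mean` for Thiobencarb, and both the per-replicate and mean classifications are “Low Recovery”.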
The following columns involving the fraction recovered in the A to B direction are always returned:
- `Frec_A2B.vec` - vector of fraction recovered values
- `Recovery_Class_A2B.vec` - vector of recovery classifications
- `Frec_A2B.mean` - mean of fraction recovered values
- `Recovery_Class_A2B.mean` - recovery classification of the mean fraction recovered value

Note that even if there are no biological replicates, the above four columns are still returned; the `.vec` and `.mean` estimates are then simply identical. Estimates are also returned in the B to A direction.
Looking at our results, some of our `Frec_A2B.vec` and `Frec_B2A.vec` values are less than \(0.4\) and are therefore classified as “Low Recovery”. Others are within the \([0.4, 2]\) range and are therefore considered normal.
Generally, data processing pipelines should include minimal to no manual coding. It is best to keep clean code that is easily reproducible and transferable. The user should aim to have all the required data and meta-data files properly formatted to avoid further modifications throughout the pipeline.