## For performance reasons, this vignette builds from pre-calculated data.
## 
##  To run all calculations, set 'do.calc <- TRUE' in the vignette's first code chunk. 
##  Building the vignette will take a while.The GUTS package provides GUTS-RED variants GUTS-RED-IT, GUTS-RED-SD and GUTS-RED-Proper. For further information see Jager et al. (2011), Ashauer et al. (2016), Jager & Ashauer (2018) and EFSA PPR Panel (2018). The R-package implementation follows Albert et al. (2016), with minor enhancements.
This vignette demonstrates application of the individual tolerance model GUTS-RED-IT and the stochastic death model GUTS-RED-SD. The vignette is designed as verification test of model results. For this purpose, it runs one part of a recently conducted ring test on GUTS-software.
The ring test compared results from several software implementations of GUTS (Jager & Ashauer, 2018, Ch. 7). It focused on GUTS-RED-IT and GUTS-RED-SD models. Here, we follow the protocol of ring-test A, and perform the calibration as well as a forecast analysis.
Load the GUTS-package.
The ring test data was downloaded from http://www.debtox.info/book_guts.html and the data sheet with ring test A data is stored in file “Data_for_GUTS_software_ring_test_A_v05.xlsx” in the GUTS package. We recommend opening the file for data inspection.
This theoretical study assumed a chronic test with abundance measurements taken at exposure times. The synthetic data sets were generated with a GUTS-SD and a GUTS-IT model, using predefined parameters (Jager & Ashauer, 2018, Ch. 7.1.2, tab. 7.1).
Parameter values are:
par_A <- data.frame(symbols = c("hb", "ke", "kk", "mn", "beta"), JAsymbols = c("hb", "kd", "kk", "mw", "beta"), SD = c(0.01, 0.8, 0.6, 3, NA), IT = c(0.02, 0.8, NA, 5, 5.3))
par_A
#>   symbols JAsymbols   SD   IT
#> 1      hb        hb 0.01 0.02
#> 2      ke        kd 0.80 0.80
#> 3      kk        kk 0.60   NA
#> 4      mn        mw 3.00 5.00
#> 5    beta      beta   NA 5.30Symbols refer to parameter symbols used in the GUTS package. JAsymbols denote the symbols used in Jager & Ashauer (2018).
The ring test data set A-SD is read from file “Data_for_GUTS_software_ring_test_A_v05.xlsx”.
The respective data table is in wide format. Here is the display as an R data.frame:
library(xlsx)
read.xlsx(
  file = paste0(JA_data_file_name),
  sheetName = "Data A",
  rowIndex = c(1:11),
  colIndex = seq(which(LETTERS == "A"), which(LETTERS == "G")),
  header = TRUE
  )
#>    Table.1..Synthetic.survival.data.for.calibration.of.GUTS.SIC.SD.
#> 1                                                              <NA>
#> 2                                                              Time
#> 3                                                               Day
#> 4                                                                 0
#> 5                                                                 1
#> 6                                                                 2
#> 7                                                                 3
#> 8                                                                 4
#> 9                                                                 5
#> 10                                                                6
#>                              NA. NA..1 NA..2 NA..3 NA..4 NA..5
#> 1      Number of alive organisms    NA    NA    NA    NA    NA
#> 2  Concentration in micromol / L    NA    NA    NA    NA    NA
#> 3                              0     2     4     6     8    16
#> 4                             20    20    20    20    20    20
#> 5                             20    20    20    20    18     5
#> 6                             20    20    19    11     4     0
#> 7                             20    20    15     2     1     0
#> 8                             19    20    11     0     0     0
#> 9                             19    20     5     0     0     0
#> 10                            18    20     3     0     0     0Extraction of the relevant data:
data_A_SD <- read.xlsx(
  file = paste0(JA_data_file_name),
  sheetName = "Data A",
  rowIndex = c(4:11),
  colIndex = seq(which(LETTERS == "B"), which(LETTERS == "G")),
  header = FALSE
  )
con_A_SD <- as.numeric(data_A_SD[1,])
data_A_SD <- data_A_SD[-1,] 
#the first row contains the concentrations
# all subsequent rows: number of alive organisms at specific day.
day_A_SD <-
  as.numeric(
    t(
      read.xlsx(
        file = paste0(JA_data_file_name),
        sheetName = "Data A",
        rowIndex = c(5:11),
        colIndex = seq(which(LETTERS == "A")),
        header = FALSE
      )
    )
  )
# name the data.frame
names(data_A_SD) <- paste0("c", con_A_SD)
rownames(data_A_SD) <- paste0("d", day_A_SD)c: applied concentration; d: treatment day;
each value indicates the number of alive organisms
Setting up a GUTS-SD-object requires for each replicate of the treatment groups:
C at each concentration measurement time
Cty at each abundance measurement time ytmodel = "SD"For each replicate one GUTS object is created. All GUTS-objects are collected in a list. For the ring-test with one replicate for each of the 6 treatment groups, a list of 6 guts-objects is generated. We explicitly demonstrate how each GUTS-object in the list is constructed:
GUTS_A_SD <- list( 
  C0 = guts_setup(
    C = rep_len(con_A_SD[1], length(day_A_SD)), Ct = day_A_SD,
    y = data_A_SD$c0, yt = day_A_SD,
    model = "SD"
    ),
  C2 = guts_setup(
    C = rep_len(con_A_SD[2], length(day_A_SD)), Ct = day_A_SD,
    y = data_A_SD$c2, yt = day_A_SD,
    model = "SD"
    ),
  C4 = guts_setup(
    C = rep_len(con_A_SD[3], length(day_A_SD)), Ct = day_A_SD,
    y = data_A_SD$c4, yt = day_A_SD,
    model = "SD"
    ),
  C6 = guts_setup(
    C = rep_len(con_A_SD[4], length(day_A_SD)), Ct = day_A_SD,
    y = data_A_SD$c6, yt = day_A_SD,
    model = "SD"
    ),
  C8 = guts_setup(
    C = rep_len(con_A_SD[5], length(day_A_SD)), Ct = day_A_SD,
    y = data_A_SD$c8, yt = day_A_SD,
    model = "SD"
    ),
  C16 = guts_setup(
    C = rep_len(con_A_SD[6], length(day_A_SD)), Ct = day_A_SD,
    y = data_A_SD$c16, yt = day_A_SD,
    model = "SD"
    )
)Parameter estimation is conducted following suggestions Albert et al. (2016) and Supplementary material “S1: GUTS example R script”. For demonstration purposes in this vignette, the calibration procedure has been simplified and adjusted to the ring-test data set.
The list of GUTS-objects is used to calculate the joint likelihood.
The logposterior function is formulated with a function that defines parameter bounds.
Constraints on parameters should consider
is.infinite(p) constrains parameters to finite
valuesp<0 confines rates and thresholds to positive
valuesp["kk"] > 30 confines the killing rate to avoid
divergent Markov chains (see Albert et al., 2016).Parameter values are estimated applying an adaptive Markov Chain Monte Carlo (MCMC) algorithm.
pars_start_SD <- rep_len (0.5, 4)
names(pars_start_SD) <- par_A$JAsymbols[-which(is.na(par_A$SD))]
mcmc_result_SD <- MCMC(p = logposterior, 
  init = pars_start_SD, adapt = 5000, acc.rate = 0.4, n = 150000, 
  guts_objects = GUTS_A_SD, 
  isOutOfBoundsFun = is_out_of_bounds_fun_SD
)
#exclude burnin and thin by 3
mcmc_result_SD$samples <- mcmc_result_SD$samples[seq(50001, 150000, by = 20),]
mcmc_result_SD$log.p <- mcmc_result_SD$log.p[seq(50001, 150000, by = 20)]The top four graphs show mixing and distribution of the parameters. The last row “LL” shows mixing and distribution of the logposterior.
if (all(is.finite(mcmc_result_SD$log.p))) {
  par( mfrow = c(dim(mcmc_result_SD$samples)[2] + 1, 2) , mar = c(5,4,1,0.5))
  plot(as.mcmc(cbind(mcmc_result_SD$samples, LL = mcmc_result_SD$log.p)), auto.layout = FALSE)
  par(op)
} else {
  par( mfrow = c(dim(mcmc_result_SD$samples)[2], 2) , mar = c(5,4,1,0.5))
  plot(as.mcmc(mcmc_result_SD$samples), auto.layout = FALSE)
  par(op)
}This function takes a calibrated MCMC sample and summarizes the posterior distributions of the calibrated parameters. Calculated are for each calibrated parameter the value with highest posterior, as well as the median, the 2.5% and 97.5% quantiles of the posterior distribution. With the optional parameter expectedVal original values from the ring test can be given to compare them with the calibrated values. A graphical summary is plotted, if plot = TRUE.
eval_MCMC <- function(sampMCMC, expectedVal = NULL, plot = TRUE) {
  bestFit <- sampMCMC$samples[which.max(sampMCMC$log.p),]
  qu <- apply(sampMCMC$samples, 2, quantile, probs = c(0.025, 0.5, 0.975))
  if (plot) {
    if(is.null(expectedVal)) expectedVal <- rep(NA, dim(sampMCMC$samples)[2])
    plot(seq(dim(sampMCMC$samples)[2]), expectedVal, pch = 20, col = "darkgrey", cex = 2, ylim = range(qu), 
      xaxt = "n", xlab = "Model parameter", ylab = "Parameter value")
    arrows(x0 = seq(dim(sampMCMC$samples)[2]), y0 = qu[1,], y1 = qu[3,], angle = 90, length = 0.1, code = 3)
    points(x = seq(dim(sampMCMC$samples)[2]), y = bestFit, pch = "-", cex = 4)
    axis(side = 1, at = seq(dim(sampMCMC$samples)[2]), dimnames(sampMCMC$samples)[[2]])
  }
  res <- rbind(bestFit, qu)
  rownames(res)[1] <- "best"
  if (!all(is.na(expectedVal))) {
    res <- rbind(res, expectedVal)
    rownames(res)[dim(res)[1]] <- "expect"
  }
  return(res)
}#>                 hb        kd        kk       mw
#> best   0.007747013 0.6914421 0.6172954 2.831710
#> 2.5%   0.002763125 0.4888775 0.4246350 2.309847
#> 50%    0.011203457 0.6894394 0.6710864 2.927733
#> 97.5%  0.028080870 0.9656203 1.0542871 3.334637
#> expect 0.010000000 0.8000000 0.6000000 3.000000Black lines indicate best estimates and the error bars the 95% credible interval; grey dots are the model parameter values used to generate the data for the ring test.
The ring test data set A-IT is read from file “Data_for_GUTS_software_ring_test_A_v05.xlsx”.
Extraction of the relevant data:
data_A_IT <- read.xlsx(
  file = paste0(JA_data_file_name),
  sheetName = "Data A",
  rowIndex = c(17:24),
  colIndex = seq(which(LETTERS == "B"), which(LETTERS == "G")),
  header = FALSE
  )
con_A_IT <- as.numeric(data_A_IT[1,])
data_A_IT <- data_A_IT[-1,] 
#the first row contains the concentrations
# all subsequent rows: number of alive organisms at specific day.
day_A_IT <-
  as.numeric(
    t(
      read.xlsx(
        file = paste0(JA_data_file_name),
        sheetName = "Data A",
        rowIndex = c(18:24),
        colIndex = seq(which(LETTERS == "A")),
        header = FALSE
      )
    )
  )
# name the data.frame
names(data_A_IT) <- paste0("c", con_A_IT)
rownames(data_A_IT) <- paste0("d", day_A_IT)c: applied concentration; d: treatment day;
each value indicates the number of alive organisms
Setting up a GUTS-IT-object requires for each replicate of the treatment groups:
C at each concentration measurement time
Cty at each abundance measurement time ytmodel = "IT"dist = "loglogistic", as log-logistic was used in the ring
test.For each replicate one GUTS object is created. All GUTS-objects are collected in a list. For the ring-test with one replicate for each of the 6 treatment groups, a list of 6 guts-objects is generated. Here, we exemplarily show an automatized procedure to create the list of GUTS-objects from the data.
The logposterior function is formulated with a function that defines parameter bounds.
Constraints on parameters should consider
is.infinite(p) constrains parameters to finite
valuesp<0 confines rates and thresholds to positive
valuesexp(8/p[4]) * p[3] > 1e200 avoids numerical problems
when approximating the log-logistic function for unrealistically high
median values p[3] and extremely low shape values
p[4] that result in a threshold distribution confined at a
zero threshold.p[4] <= 1 is assumed, to exclude that the mode of
the log-logistic threshold distribution is 0, due to the shape
parameter. Note that the mode can approach 0 if the estimated median
p[3] approaches 0. Therefore, this constraint stabilises
estimation of low threshold values.Parameter values are estimated applying an adaptive Markov Chain Monte Carlo (MCMC) algorithm.
pars_start_IT <- rep_len(0.5, 4)
names(pars_start_IT) <- par_A$JAsymbols[-which(is.na(par_A$IT))]
mcmc_result_IT <- MCMC(p = logposterior, 
  init = pars_start_IT, adapt = 5000, acc.rate = 0.4, n = 150000, 
  guts_objects = GUTS_A_IT, isOutOfBoundsFun = is_out_of_bounds_fun_IT
)
#exclude burnin and thin by 3
mcmc_result_IT$samples <- mcmc_result_IT$samples[seq(50001, 150000, by = 20), ]
mcmc_result_IT$log.p <- mcmc_result_IT$log.p[seq(50001, 150000, by = 20)]The top four graphs show mixing and distribution of the parameters. The last row “LL” shows mixing and distribution of the logposterior.
if (all(is.finite(mcmc_result_IT$log.p))) {
  par( mfrow = c(dim(mcmc_result_IT$samples)[2] + 1, 2) , mar = c(5,4,1,0.5))
  plot(as.mcmc(cbind(mcmc_result_IT$samples, LL = mcmc_result_IT$log.p)), auto.layout = FALSE)
  par(op)
} else {
  par( mfrow = c(dim(mcmc_result_IT$samples)[2], 2) , mar = c(5,4,1,0.5))
  plot(as.mcmc(mcmc_result_IT$samples), auto.layout = FALSE)
  par(op)
}#>                hb        kd       mw     beta
#> best   0.02707007 0.8127029 5.448312 5.168084
#> 2.5%   0.01308660 0.5975469 4.691963 3.819837
#> 50%    0.03115691 0.8481939 5.631914 5.470063
#> 97.5%  0.05698634 1.1869923 6.611356 7.935804
#> expect 0.02000000 0.8000000 5.000000 5.300000Black lines indicate best estimates and the error bars the 95% credible interval; grey dots are the model parameter values used to generate the data for the ring test.
The GUTS-package can be applied to forecast survival, too. We demonstrate this feature for the calculation of a 4d-LC50 value assuming constant exposure.
Values to be specified are
CCty
length(y) corresponds to the number of time steps for
which predictions are required.y[1] specifies the initial number of individuals.y can be chosen arbitrarily and will
not be used.yt: a vector starting with 0To calculate the LC50, we use the GUTS object to simulate the survival during a 4 day period for several constant dose levels. Here, we uses doses 0 to 16 in steps of 2.
We choose 6 observation and application time steps for illustration purposes. In fact, to calculate 4d-LC50, it would be sufficient to specify the initial number of individuals at time step 0, and specify time step 4 days as the observation time. We assume an initial population size of 100 individuals.
conc <- seq(0, 16, by = 2)
guts_obj_forecast <- lapply(conc,
  function(concentration) guts_setup(
    C = rep(concentration, 7),
    Ct = seq(0,12, by = 2),
    y = c(100, rep(0,6)),
    yt = seq(0,12, by = 2),
    model = "IT", dist = "loglogistic", N = 1000
  )
)We forecast survival probabilities for each dose concentration and the parameter sets in the posterior distribution. This procedure takes into account uncertainties in parameter estimates.
According to the protocol for the ring test, for prediction the background mortality “hb” was fixed at 0, in order to isolate the treatment effects in model forecasts.
forec <- lapply(guts_obj_forecast,
  function(gobj, mcmc_res) 
    rbind(
      rep(gobj$C[1], dim(mcmc_forecasts_paras)[1]), 
      apply(mcmc_res, 1,
        function(pars) guts_calc_survivalprobs(gobj = gobj, pars, external_dist = NULL)
      )
    ),
  mcmc_res = mcmc_forecasts_paras
)
forec <- do.call("cbind", forec)
forec <- as.data.frame(t(forec))
names(forec) <- c("conc", paste0("day", guts_obj_forecast[[1]]$yt))The box-whisker-plots show dose-response relationships as forecasted for the specific days.
par(mfrow = c(2,3), mar = c(5,4,3, 0.5))
sapply(tail(names(forec), -2),
             function(day)
                plot(as.factor(forec$conc), forec[, day], 
                         ylim = c(0,1),
                         xlab = "concentration (micromol/l)", ylab = "probability of survival", 
                         main = day)
)For example, the drc-package (Ritz et al. 2015) can be applied to estimate the d4-LD50. A log-logistic dose-response interpolation is assumed. By estimating the LD50 for each sampled parameter set separately, the uncertainty in LC50 estimates can be derived from the parameter uncertainty revealed by the Bayesian analysis.
library("drc")
logLC50s <- sapply(seq_len(dim(mcmc_forecasts_paras)[1]), 
  function(indParaset, forDat, concentrations) {
    dat <- forDat[dim(mcmc_forecasts_paras)[1] * (seq_along(concentrations) - 1) + indParaset,"day4"]
    return(
      coefficients(
        drm(data.frame(dat, concentrations), fct = LL2.3(names = c("Slope", "upper", "logLC50")))
      )[3]
    )
  },
  forDat <- forec,
  concentrations = conc
)The estimated LC50 median of 5.8 (CI = [5; 6.7]) is in accordance with the ring test forecast (Fig. 7.5, Data-A forecast 4d-LC50-IT in Jager & Ashauer, 2018).
Albert, C., Vogel, S., and Ashauer, R. (2016). Computationally efficient implementation of a novel algorithm for the General Unified Threshold Model of Survival (GUTS). PLOS Computational Biology, 12(6), e1004978. doi: 10.1371/journal.pcbi.1004978.
Ashauer, R., Albert, C., Augustine, S., Cedergreen, N., Charles, S., Ducrot, V., Focks, A., Gabsi, F., Gergs, A., Goussen, B., Jager, T., Kramer, N.I., Nyman, A.-M., Poulsen, V., Reichenberger, S., Schäfer, R.B., Van den Brink, P.J., Veltman, K., Vogel, S., Zimmer, E.I., Preuss, T.G. (2016). Modelling survival: exposure pattern, species sensitivity and uncertainty. Scientific Reports, 6, 1, doi: 10.1038/srep29178
Jager, T., Albert, C., Preuss, T., and Ashauer, R. (2011). General Unified Threshold Model of Survival - a toxicokinetic toxicodynamic framework for ecotoxicology. Environmental Science & Technology, 45(7), 2529-2540, doi: 10.1021/es103092a.
Jager, T., Ashauer, R. (2018). Modelling survival under chemical stress. A comprehensive guide to the GUTS framework. Leanpub: https://leanpub.com/guts_book, http://www.debtox.info/book_guts.html.
EFSA PPR Panel (EFSA Panel on Plant Protection Products and their Residues), Ockleford, C., Adriaanse, P., Berny, P., Brock, T., Duquesne, S., Grilli, S., Hernandez-Jerez, A.F., Bennekou, S.H., Klein, M., Kuhl, T., Laskowski, R., Machera, K., Pelkonen, O., Pieper, S., Smith, R.H., Stemmer, M., Sundh, I., Tiktak, A., Topping, C.J., Wolterink, G., Cedergreen, N., Charles, S., Focks, A., Reed, M., Arena, M., Ippolito, A., Byers, H. and Teodorovic, I. (2018). Scientific Opinion on the state of the art of Toxicokinetic/Toxicodynamic (TKTD) effect models for regulatory risk assessment of pesticides for aquatic organisms. EFSA Journal, 16(8):5377, 188 pp., doi: 10.2903/j.efsa.2018.5377.
Ritz, C., Baty, F., Streibig, J. C., Gerhard, D. (2015). Dose-Response Analysis Using R. PLOS ONE, 10(12), e0146021, doi: 10.1371/journal.pone.0146021