The purpose of this vignette is to share the code used to provide a
walkthrough of the BayesECM() function in the
ezECM package, as well as provide code to reproduce a
figure from a related future publication. This document outlines the
skeleton of a Monte Carlo experiment and provides the relevant code.
Some parameter values are changed to reduce the computation time, the
original values are noted in the text. Synthetic data is generated to
use for testing and training different implementations of the Event
Categorization Matrix (ECM) model for comparision. The comparators are
classical ECM (C-ECM), Bayesian ECM (B-ECM) only trained on events where
all discriminants are available, B-ECM where the model is trained on all
available data including partial observations (M-B-ECM), and M-B-ECM
where the loss matrix is changed such that the false negative rate is
reduced. All of these comparators focus on binary categorization, simply
detecting if a new observation belongs to a pre-specified important
category. The last comparator categorizes new events into each of the
K event categories used for training with the B-ECM model
(B-ECM Cat). The current form of C-ECM cannot utilize partial
observations for training, and therefore we hypothesize that the
performance of C-ECM will suffer in comparision to B-ECM models which
can utilize observations with missing data for training.
First, the ezECM package must be loaded.
We will define some functions used to generate the synthetic data,
which are not part of the ezECM package. These functions
randomly generate a mean and covariance for each class. These random
mean and covariance is then used to generate random data sets.
Additionally, functions are used for randomly deleting data to generate
partial observations. Details are provided in the comments, as well as
the publication. The argument p specifies the number of
discriminants, K changes the number of event categories,
Ntest is the size of the testing set, Ntrain
is the size of the training set which does not have any missing data,
Ntrain_missing is the size of the training set where events
have at least one missing discriminant, tst_missing
controls the fraction of missing entries in the testing set, and
accordingly trn_missing controls the fraction of the
training set which is missing. In the following experiment we will only
vary p.
data_gen <- function(p = NULL, K = 3, Ntest = NULL, Ntrain = NULL, 
                     Ntrain_missing = NULL, tst_missing = NULL, trn_missing = NULL){
  
    mu_use <- matrix(rnorm(n = p*K, sd = 0.5), ncol = K, nrow = p)
    Y <- list()
    S <- list()
    
    ## random number in each class
    
    N <- LaplacesDemon::rcat(n = Ntest + Ntrain + Ntrain_missing, 
                             p = rep(1/3, times = K))
    N <- as.vector(table(N))
    
    ## random very important category
    
    vic <- sample(1:K, size = 1)
    
    for(k in 1:K){
      
      ## random number of blocks in kth category
      
      nblocks <- sample(1:2, size = 1)
      
      ## random covariance matrix for kth category
      
      S[[k]] <- LaplacesDemon::rinvwishart(nu = p + 4, S = diag(p))
      
      
      ## If relevant, delete entries to form block covariance matrices of random sizes
      
      if(nblocks == 2){
        nblock1 <- sample(1:(p-1), size = 1)
        block1_members <- sample(1:p, size = nblock1, replace = FALSE)
        block2_members <- (1:p)[-block1_members]
        
        zero_elements <- expand.grid(block1_members, block2_members)
        
        S[[k]][zero_elements$Var1, zero_elements$Var2] <- 0
        S[[k]][zero_elements$Var2, zero_elements$Var1] <- 0
        
      }
      
      ## data for kth class is drawn from a MVN, given the mean and covariance
      Y[[k]] <- as.data.frame(LaplacesDemon::rmvn(n = N[k], mu = mu_use[,k], Sigma= S[[k]]))
      
      ## data is transformed to (0,1) using the logistic function to run a check
      Ytemp<- 1/(1+ exp(-Y[[k]]))
      
      ## if machine precision causes the output of the logistic function to round to 1, the experiment is stopped
      ## this has never happened
      if(max(apply(Ytemp,2,function(X){range(X)})) == 1){
        stop()
      }else{
        Y[[k]]<- 1/(1+ exp(-Y[[k]]))
      }
      
      ## a column for the event class is appended to the data.frame
      Y[[k]] <- cbind(Y[[k]], as.character(k))
      names(Y[[k]]) <- c(paste("p",as.character(1:p),  sep = ""), "event")
      
    }
    
    Y <- do.call("rbind", Y)
    
    ## A random sample of the data is taken to be the testing set
    test_index <- sample(1:nrow(Y), size = Ntest)
    testing <- Y[test_index,]
    
    ## The remainder is slated for training
    training <- Y[-test_index, ]
    
    ## A random sample of the training set is set aside to have missing entries,
    ## the remainder is set aside to be the fully observed training set
    train_full_index <- sample(1:nrow(training), size = Ntrain)
    train_missing <- training[-train_full_index, ]
    train_full <- training[train_full_index, ]
    
    ## If all event categories are not represented in the training set of full observations,
    ## the sampling scheme repeats
    if(any(table(train_full$event) <= 1) | length(table(train_full$event)) <= (K-1)){
      while(any(table(train_full$event) <= 1 | length(table(train_full$event)) <= (K-1))){
        test_index <- sample(1:nrow(Y), size = Ntest)
        
        testing <- Y[test_index,]
        training <- Y[-test_index, ]
        train_full_index <- sample(1:nrow(training), size = Ntrain)
        train_missing <- training[-train_full_index, ]
        train_full <- training[train_full_index, ]
      }
    }
    
    ## the true event category for the testing set is saved as a seperate variable,
    ## and deleted from the data.frame containing the data
    test_truth <- testing$event
    testing$event <- NULL
    
    ## Entries are randomly selected for deletion from the testing set.
    ## This scheme ensures a single discriminant for each observation
    ## not deleted in order to reduce computation time.
    
    abs_present <- sample(size = Ntest, 1:p, replace = TRUE)
    missing_pool <- matrix(1:p, ncol = p, nrow = Ntest, byrow = TRUE)
    
    missing_pool <- t(apply(cbind(missing_pool, abs_present), 1, function(X,pp){
      X[-c(X[pp + 1], pp +1)]
    }, pp = p))
    
    missing_pool_save <- missing_pool
    frac_missing <- (p*tst_missing)/(p-1)
    
    # sample which of the remaining elements will be missing
    missing_sample <- sample(1:(nrow(missing_pool)*ncol(missing_pool)), 
                             size = floor(nrow(missing_pool)*ncol(missing_pool)*(frac_missing)), 
                             replace = FALSE)
    
    missing_pool_save[missing_sample] <- NA
    
    saved_data <- apply(cbind(missing_pool_save, unname(abs_present)), 1, function(X){
      X[-which(is.na(X))]
    })
    
    for(j in 1:nrow(testing)){
      testing[j,-c(saved_data[[j]])] <- NA
    }
    
    ## Entries are randomly selected for deletion from the training set.
    ## This scheme ensures a single discriminant for each observation
    ## not deleted in order to reduce computation time.
    
    abs_present <- sample(size = Ntrain_missing, 1:p, replace = TRUE)
    abs_missing <- matrix(1:p, ncol = p, nrow = Ntrain_missing, byrow = TRUE)
    
    abs_missing <- t(apply(cbind(abs_missing, abs_present), 1, function(X,pp){
      X[-c(X[pp + 1], pp +1)]
    }, pp = p))
    
    abs_missing <- apply(abs_missing, 1, function(X){
      sample(X, size = 1)
    })
    
    missing_pool <- matrix(1:p, ncol = p, nrow = Ntrain_missing, byrow = TRUE)
    
    missing_pool <- t(apply(cbind(missing_pool, abs_present, abs_missing), 1, function(X,pp){
      X[-c(X[pp + 1], X[pp + 2], pp +1, pp + 2)]
    }, pp = p))
    
    missing_pool_save <- missing_pool
    frac_missing <- (p*trn_missing - 1)/(p-2)
    
    # sample which of the remaining elements will be missing
    missing_sample <- sample(1:(nrow(missing_pool)*ncol(missing_pool)), size = floor(nrow(missing_pool)*ncol(missing_pool)*(frac_missing)), replace = FALSE)
    
    
    missing_pool_save[missing_sample] <- NA
    
    saved_data <- apply(cbind(missing_pool_save, unname(abs_present)), 1, function(X){
      X[-which(is.na(X))]
    })
    
    for(j in 1:nrow(train_missing)){
      train_missing[j,-c(saved_data[[j]], p+1)] <- NA
    }
    
    return(list(Y = list(train_full = train_full, train_missing = train_missing, 
                         testing = testing, test_truth = test_truth), 
                params = list(mu = mu_use, Sig = S, vic = vic)))
    
}The function generates data for K categories, each with
a different mean and covariance structure for a single event
observation, drawn from a multivariate normal distribution. The
covariance of the K categories each has a random chance of
being a block-covariance matrix with blocks of random sizes, or a full
covariance matrix. The simulated values are then transformed from real
values to \((0,1)\) using the logistic
function. The random two block covariance is in effort to represent the
fact that some events will have correlated observations from space and
ground modalities, and others will have uncorrelated observations
between the modalities with random sets of discriminants
exhibiting correlations.
The arguments of data_gen are specified for the
experiment.
## Data Parameters
tst_missing <- 0.5
trn_missing <- 0.5
Ntrain <- 25
Ntest <- 100
Ntrain_missing <- 5 * Ntrain
K <- 3P is a vector of the values of p we want to
examine. For the experiment in a forthcoming publication,
P = c(4,6,8,10). In order to reduce the computation time of
this vignette, only values of 4 and 6 will be used.
The generated data will be used to train and test different
implementations of the Event Categorization Matrix (ECM) model. The
comparators are classical (C-ECM), Bayesian ECM (B-ECM) only trained on
events where all discriminants are available, B-ECM where the model is
trained on all available data including partial observations (M-B-ECM),
M-B-ECM where the loss matrix is changed such that the false negative
rate is reduced. All of these comparators focus on binary
categorization, simply detecting if a new observation belongs to a
pre-specified important category. The last comparator categorizes new
events into each of the K event categories used for
training with the B-ECM model (B-ECM Cat).
All methods use typicality indices as part of the decision framework.
For all methods, the significance level alphatilde is set
to 0.05.
We want to specify that for the B-ECM models, the weights of the
components in the mixture model are informed by the data. Alternatively,
one could change mixture_weights <- "equal" for all
components of the mixture model to have equal weight, possibly if the
frequency of the events in the training data is unrelated to what is
expected in practice.
Three separate loss matrices need to be specified for the experiment.
You may be here from the becm_decision() function. The
structure of a full K category loss matrix is
\[ \begin{array}{cc@{}cc} \phantom{XXX} & \phantom{XXX} & \phantom{XXX} & \mathrm{Action}\\ \phantom{XXX} & \phantom{XXX} & \phantom{XXX} & \begin{array}{cccc} a_1 \phantom{X} & a_2 \phantom{X} & \dots & a_K \end{array}\\ C = & \begin{array}{c} \mathrm{True}\\ \mathrm{Category} \end{array} \hspace{-1.5em}& \begin{array}{c} \tilde{z}_1 \\ \tilde{z}_2 \\ \vdots \\ \tilde{z}_K \end{array} \hspace{-2em}& \left[\begin{array}{c|c|c|c} C_{1,1} & C_{1,2} & \dots & C_{1, K}\\ \hline C_{2,1} & C_{2,2} & \dots & C_{2,K}\\ \hline \vdots & \vdots & \ddots & \vdots \\ \hline C_{K,1} & C_{K,2} & \dots & C_{K, K} \end{array}\right] \end{array}. \] Where the losses associated with the action of categorizing a \(\tilde{y}_{\tilde{p}}\) into one of the \(K\) training categories is specified in the columns, and the value of the latent categorization variable \(\tilde{z}_k\) for arbitrary category index \(k\) is specified in the rows. If \(\tilde{Z}^{\top} = [\tilde{z}_1 \dots \tilde{z}_K]\) is known, a \(1 \times K\) row vector of losses associated with each possible categorization action could be calculated as the matrix vector product \(L_{1 \times K} = \tilde{Z}^{\top} C\). However, the elements of \(\tilde{Z}^{\top}\) are unknown and modeled as random variables in the B-ECM framework. We can take the expectation of the loss for each action as \(\mathbb{E}[L]_{1 \times K} = \mathbb{E}[\tilde{Z}^{\top}]C\). The action that provides the minimum expected loss is probably the best bet for categorization.
The above structure for the loss matrix \(C\) is equivalent to what must be later
provided to the becm_decision() function for full \(K\) training categorization. The indices of
the rows and columns of \(C\) are the
same order as the indices of the categories listed as
names(bayes_pred$BayesECMfit$Y) for the
bayes_pred argument provided to
becm_decision(). The structure of \(C\) differs for binary categorization
differs in a slight, but important-to-note, way. For binary
categorization, we want to detect if \(\tilde{y}_{\tilde{p}}\) belongs to a
specific category stipulated using the vic (Very Important
Category) argument, and therefore the indexing of \(C\) for full \(K\) categorization does not port well to
applications for binary categorization. In this case, the first row and
column of \(C\) always correspond to
the category chosen as vic. The structure of \(C\), for vic indexed as \(k\), is therefore as is below.
\[ \begin{array}{cc@{}cc} \phantom{XX} & \phantom{XX} & \phantom{XX} & \mathrm{Action}\\ \phantom{XX} & \phantom{XX} & \phantom{XX} & \begin{array}{cc} \mathrm{a}_k & \mathrm{a}_{k^{-}} \end{array}\\ C = & \begin{array}{c} \mathrm{True}\\ \mathrm{Category} \end{array} \hspace{-0.5em}& \begin{array}{c} \tilde{z}_k \\ \sum_{\substack{i = 1 \\ i \neq k}}^K \tilde{z}_i \end{array} \hspace{-1em}& \left[\begin{array}{c|c} C_{1,1} & C_{1,2}\\ \hline C_{2,1} & C_{2,2} \end{array}\right] \end{array} \]
With the necessary structure of the loss matrices in mind, first we
specify the loss matrices for B-ECM and M-B-ECM, which will utilize 0-1
loss for binary categorization. This loss structure does not reward
correct categorizations, but punnishes mis-categorizations with a loss
of 1. Specifying the loss matrix accordingly as the C01
variable:
To test M-B-ECM with a higher loss for false negatives, the
Cfneg matrix variable is created. For loss matrix \(C\) specified for binary categorization,
the entry \(C_{1,2}\) corresponds to
the loss for choosing to categorize \(\tilde{y}_{\tilde{p}}\) into the group of
categories not specified by vic when the truth is \(\tilde{y}_{\tilde{p}}\)
is in the vic category. Such a
situation is the definition of a false negative, so changing \(C\) to reduce the false negative rate is
straightforward. Similarly, \(C_{2,1}\)
could be increased relative to \(C_{1,2}\) to reduce the false positive
rate, but we will stick with the goal of reducing false negatives for
this experiment. The loss for false negatives is increased by setting
Cfneg[1,2] <- 2.
Because K = 3, a \(3 \times
3\) loss matrix needs to be specified for M-B-ECM Cat. The loss
for any mis-categorization is specified to be equal for each
possibility. With all non-zero values equal, using a value of 1 is
sufficient.
The experiment is a Monte Carlo experiment. Random data sets are
repeatedly generated. We are interested in examining the typical
behavior of the models, as well as the variation in behavior. Each model
is fit to each data set and tested using a seperate testing set. The
accuracy, false negative rate, and false positive rate for all model
implementations are recorded for each data set generated within each
Monte Carlo iteration. To reduce the computation time of this vignette,
only 3 Monte Carlo iterations are generated for each total
discriminant size specified in P. To replicate the figure
and table generated for a forthcoming publication, instead set
iters <- 250.
Each model that can handle missing data utilizes Markov chain Monte
Carlo (MCMC) to impute possible values of the missing entries within the
training data. Then these values are integrated out when evaluating the
expected loss for each action. MCMC occurs multiple times within each
Monte Carlo iteration of the experiment, both concepts are not
intertwined here. Three parameter values need to be set for MCMC. The
two element vector BT first specifies the number of
Burn-in random samples of the missing data values that
are discgarded under the assumption that the Markov chain has not
converged within the first BT[1] draws. BT[2]
is the total number of MCMC iterations. After training models that can
handle missing data, the total number of draws from the distribution of
missing data entries will be BT[1] - BT[2] for each missing
element. To reduce computation time, the values of BT have
been reduced. The values BT < c(500,50500) were used to
compare the models in a forthcoming publication.
The predict.BayesECM() function can use all of the draws
of the missing data values obtained, or thin the samples. Specifying
thinning <- 5 means every fifth sample will be used,
which reduces the computation time for prediction as well as the
autocorrelation between draws. The default, thinning = 1,
utilizes all of the samples.
We will specify a few more variables. If you are running a version of
this experiment that is not fast to compute, we suggest setting
verb <- TRUE. To make this document, we set
To save the performance metrics for each Monte Carlo iteration, a
list exp_out is defined which saves the number of accurate
categorizations, the false positive rate, and the false negative rate at
each iteration for each value of p. General data, important
for making calculations later, is also saved to exp_out.
The structure of exp_out is built as the experiment moves
through the different values in P. If
verb == TRUE the time at the start of the experiment is
saved.
## Data structures for saving progress
cECM_recfp <- cECM_recfn <- bayes_rec <- cECM_rec <- matrix(NA, ncol = length(P), nrow = iters)
Nvic <- rep(0, times = length(P))
exp_out <- list()
method_template <- data.frame(matrix(NA, ncol = 3, nrow = iters))
names(method_template) <- c("accurate", "FN", "FP")
data_template <- data.frame(matrix(NA, ncol = 2, nrow = iters))
names(data_template) <- c("Ntest", "Nvic")
data_template$Ntest <- Ntest
p_template <- list(cECM = method_template, becm = method_template, 
                   mbecm = method_template, mbecm_Cfn = method_template, 
                   mbecm_cat = method_template, data = data_template)
bayes_rec <- cECM_rec <- matrix(NA, ncol = length(P), nrow = iters)
if(verb){
toc <- Sys.time()
}Then, the experiment iterates over the values of P and
then the number of Monte Carlo iterations for each setting of
p. Because the experiment is in a for loop, detailed
descriptions are in the comments.
## Iterates over the differing numbers of total discriminants set for the experiment.
for(p in P){
  
  ## Builds the list for saving the results using a template list
  exp_out[[p]] <- p_template
  
  ## Runs each model for `iters` number of independent data sets.
  for(i in 1:iters){
    ## The i^{th} run for p discriminants
    if(verb){    
      ## set the experimental parameter verb <- TRUE to print progress
      print(paste0("i = ", i, ", p = ", p, ", ", round(Sys.time() - toc, digits = 2), " ", units(Sys.time() - toc), " elapsed"))
    }
    
    ## Generate random data set
    
      Ylist <- data_gen(p = p, K = K, Ntest = Ntest, Ntrain = Ntrain, 
                     Ntrain_missing = Ntrain_missing, tst_missing = tst_missing, 
                    trn_missing = trn_missing)
      
      ## Saves the random data information to the environment
      
      train_full<- Ylist$Y$train_full
      train_missing <- Ylist$Y$train_missing
      testing <- Ylist$Y$testing
      test_truth <- Ylist$Y$test_truth
      ## Which category is the important one this time?
      vic <- Ylist$params$vic
    
    
    ## Save the true total number of `vic` events in the testing data to be used
    ## later for analyzing performance.
    
    exp_out[[p]][["data"]]$Nvic[i] <- sum(test_truth == as.character(vic))
    
    ## Fit the classical ECM model, apply the decision framework with the
    ## `cECM_decision()` function, then save the results
    
    cECM <- cECM(x = train_full, transform = TRUE, newdata = testing)
    cECM_out <-  apply(cECM_decision(pval = cECM, alphatilde = alphatilde,
                                 vic = as.character(vic), 
                                 cat_truth = test_truth)$events[,1:3] ,2, 
                       sum, na.rm = TRUE)
    exp_out[[p]][["cECM"]][i,] <- unname(cECM_out)
    
    ## Fit the B-ECM model, using only full p observations
    
    bayes_fit <- BayesECM(Y = train_full)
    
    ## Run the predict function on the testing set.
    ## If there were multiple testing sets, the same model fit could be used on
    ## each one without having to rerun the `BayesECM()` function.  This
    ## functionality is more important when using training data with missing
    ## entries.
    
    bayes_pred <- predict(bayes_fit, Ytilde = testing, 
                          mixture_weights = mixture_weights)
    
    ## The "becm_desision()" function applies the decision theoretic framework
    ## to the training and testing data.  For one training and one testing set,
    ## where the user wants to try different values of `alphatilde` and `C`, it is
    ## not necessary to rerun the `BayesECM()` function or the `predict()`
    ## function.
    
    becm_out <- becm_decision(bayes_pred = bayes_pred, alphatilde = alphatilde,
                                vic = as.character(vic), cat_truth = test_truth, 
                              pn = TRUE, C = C01)
    
    ## Summarize and save the data.
    
    becm_out <- apply(becm_out$results,2, sum, na.rm = TRUE)
    
    exp_out[[p]][["becm"]][i,] <- unname(becm_out)
    
    
    ## Fit and save the B-ECM model that includes missing data
    
    bayes_fit_missing <- BayesECM(Y = rbind(train_full, train_missing), BT = BT, 
                                  verb = verb)
    
    bayes_pred_missing <- predict(bayes_fit_missing, Ytilde = testing, 
                                  thinning = thinning, 
                                  mixture_weights = mixture_weights)
    
    missing_out <- becm_decision(bayes_pred = bayes_pred_missing, alphatilde = alphatilde,
                             vic = as.character(vic), cat_truth = test_truth, 
                             pn = TRUE, C = C01)
    mbecm_out <- apply(missing_out$results,2, sum, na.rm = TRUE)
    
    
    exp_out[[p]][["mbecm"]][i,] <- unname(mbecm_out)
    
    ## The rest of the B-ECM variants are different through decision theory,
    ## not the model fit.  All use partial observations for training.
    ## Note that the rej argument is supplied to becm_decision to reduce computation time
    
    ## Record the decision when the loss matrix is adjusted to target
    ## false negatives.
    
    Cfn_out <- becm_decision(bayes_pred = bayes_pred_missing, alphatilde = alphatilde,
                         vic = as.character(vic), cat_truth = test_truth, 
                         pn = TRUE, C = Cfneg, rej = missing_out$rej)
    becm_Cfn_out <- apply(Cfn_out$results,2, sum, na.rm = TRUE)
    
    
    exp_out[[p]][["mbecm_Cfn"]][i,] <- unname(becm_Cfn_out)
    
    ## Record the decisions when full class (K = 3) categorization is used
    ## instead of binary categorization
    
    cat_out <-  becm_decision(bayes_pred = bayes_pred_missing, alphatilde = alphatilde,
                          vic = as.character(vic), cat_truth = test_truth, 
                          pn = TRUE, C = Ccat, rej = missing_out$rej)
    becm_cat_out <- apply(cat_out$results,2, sum, na.rm = TRUE)
    
    
    exp_out[[p]][["mbecm_cat"]][i,] <- unname(becm_cat_out)
    
  }
  
}First a function for making the boxplot is defined. The output from
the experiment is the first argument. The user can subset the number of
discriminant compared with the P argument, and models
compared with the models argument. A different color
palette can be supplied if desired using cols. The
legend_text can also be altered, and should be if the user
does not want to plot all of the models compared.
ECM_boxplot <- function(exp_out, P = P, models = c("cECM", "becm", 
                                                   "mbecm", 
                                                   "mbecm_Cfn",
                                                   "mbecm_cat"),
                        cols = NULL,
                        legend_text =  c("C-ECM", "B-ECM", "M-B-ECM", 
                  bquote("M-B-ECM, " * C["1,2"] == 2), "M-B-ECM Cat"),
                  metric = "accurate"){
  
  if(metric == "accurate"){
    divisor <- function(exp_out, p){
      return(exp_out[[p]]$data$Ntest)
    }
    ylab <- "Model Accuracy"
  }else if(metric == "FN"){
    divisor <- function(exp_out,p){
      return(exp_out[[p]]$data$Nvic)
    }
    ylab <- "False Negative Rate"
  }else if(metric == "FP"){
    divisor <- function(exp_out, p){
      return(exp_out[[p]]$data$Ntest - exp_out[[p]]$data$Nvic)
    }
    ylab <- "False Positive Rate"
  }else{
    stop("Argument 'metric' must be one of the following case sensitive character strings: 'accurate', 'FN', or 'FP'.")
  }
   boxplotdf <- do.call("cbind", lapply(exp_out[[P[1]]][models], function(X, m = metric){
  X[[m]]
}))/divisor(exp_out, p = P[1])
  
   
  for(p in P[-1]){
  boxplotdf <- cbind(boxplotdf,do.call("cbind", lapply(exp_out[[p]][models], function(X, m = metric){
  X[[m]]
}))/divisor(exp_out, p = p))
  }
  
   boxplotdf <- boxplotdf
   
   if(max(boxplotdf) > 65){
     ylim <- c(min(boxplotdf), 1.1)
   }else{
    ylim <- range(boxplotdf) * c(0.9,1.2)
   }
   
if(is.null(cols)){
if(length(models) == 5){
pltcols <- hcl.colors(44, palette = "viridis", rev = TRUE)[c(3, 10, 20, 30, 37)]
}else if(length(models) > 10){
  warning("You should consider recoding this function with a different way to select the colors used for the plot.")
  pltcols <- hcl.colors(44, palette = "viridis", rev = TRUE)[seq(from = 2, to = 38, length.out = length(models))]
}else{
  pltcols <- hcl.colors(44, palette = "viridis", rev = TRUE)[seq(from = 2, to = 38, length.out = length(models))]
}
}else{
  if(length(cols) != length(models)){
    stop("If supplying colors, a vector the same length as the 'models' argument must be used.")
  }
}
   
opar <- par(no.readonly = TRUE)
on.exit(expr = suppressWarnings(par(opar)))
par(mar = c(4.25,3.85,1,0.5))
lmodels <- length(models)
graphics::boxplot(boxplotdf,  
                  at = (1:((lmodels + 1)*length(P)))[-seq(from = lmodels + 1, 
                                                          to = length(P)*(lmodels + 1), 
                                                          by = lmodels + 1)],
         xaxt = "n", yaxt = "n", ylim = ylim, 
        col = pltcols, xlab = "Number of Discriminants", ylab = ylab)
py <- pretty(boxplotdf)
graphics::axis(2, py)
graphics::axis(1, at =seq(from = 1, to = length(P)*(lmodels + 1), by = lmodels + 1) + 1, 
               labels =  paste0("p = ", P) )
graphics::legend("topleft", bty ="n", 
       legend = legend_text,  
       fill = pltcols, horiz = FALSE, ncol = 3, y.intersp = 1.25)
}Then, using the function with all of the data on overall accuracy collected.
If it is desirable to instead plot false negatives or false
positives, the argument metric can be set to
"FN" and "FP" respectively.
It is likely that without any changes to the code the plots above can
look quite different. Using the commented out values for the variables
P, iters, and BT will help, but
the computation time for the full experiment is a few days.
Alternatively, we have a hunch the settings iters <- 50,
BT <- c(500, 20500), and thinning <- 2
to provide a good compromise between computation time and Monte Carlo
variance. Additionally, larger values in the vector P have
a longer computation time, so the larger values can be removed or added
as seen fit.