Model benchmarks: speed versus forecast accuracy tradeoffs

Code
library(EpiNow2)
library(scoringutils)
library(data.table)
library(rstan)
library(cmdstanr)
library(ggplot2)
library(dplyr)
library(purrr)
library(lubridate)
library(scales)
library(posterior)

In using {EpiNow2}, users often face two decision points that guide their choice of an appropriate model: (i) the use case (retrospective vs real-time analysis), and (ii) the computing resources available. {EpiNow2} provides a range of customisations of the default model to suit these decision points.

The aim of this vignette is to show how these model customisations affect the speed and accuracy of the model. We will benchmark four {EpiNow2} model options chosen to cover a range of use cases: the default model, the model with a 7-day random walk prior on \(R_t\), the non-mechanistic model, which has no explicit prior on \(R_t\), and the non-residual \(R_t\) model, which assumes a stationary prior on \(R_t\).

We will analyse the estimation and forecasting performance of these models when solved with MCMC sampling. {EpiNow2} also provides approximate sampling methods to solve these models (variational inference, the pathfinder method, and the Laplace approximation). These algorithms are, however, not recommended for use in pipelines, so we will not emphasise their use. We will, however, highlight how the models perform when solved with these approximate sampling methods and provide an overview of when they may be appropriate.

For the benchmarking exercise, we will compare the models based on their runtimes and the accuracy of estimates on complete and partial data, taken at 3 time points of a simulated epidemic: (i) the growth phase, (ii) the peak, and (iii) the decline phase.

The benchmarking data

We will start by setting up the “true” dataset with known trajectories of \(R_t\) and infections using {EpiNow2}’s forecast_infections() function.

forecast_infections() requires a fitted estimates object from epinow() with output set to “fit”, the trajectory of the reproduction number, R, and the number of samples to simulate. So, we will set these up first.

To obtain the estimates object, we will run the epinow() function using real-world observed data and delay distributions to recover realistic parameter values. For the data, we will use the first \(60\) observations of the example_confirmed data set. We will use example_generation_time for the generation time, and example_incubation_period and example_reporting_delay to specify the delays. These come with the package. For the \(R_t\) model, we will use a 14-day random walk prior with a mean of \(2\) and standard deviation of \(0.1\). As we only want to generate estimates, we will turn off forecasting by setting horizon = 0.

Throughout this vignette, several argument values, including the observation model options and the \(R_t\) prior, will be reused, so we will define them here.

Code
# Observation model options
obs <- obs_opts(
  scale = Normal(0.1, 0.025),
  return_likelihood = TRUE
)
# Rt prior
rt_prior_default <- Normal(2, 0.1)
# Number of cores
options(mc.cores = 6)

Now, we can generate the estimates object.

Code
estimates <- epinow(
  data = example_confirmed[1:60],
  generation_time = generation_time_opts(example_generation_time),
  delays = delay_opts(example_incubation_period + example_reporting_delay),
  rt = rt_opts(prior = rt_prior_default, rw = 14),
  gp = NULL,
  obs = obs,
  forecast = forecast_opts(horizon = 0), # no forecasting
  output = "fit"
)
#> WARN [2025-01-29 15:07:06] epinow: The `filter_leading_zeros` argument of `estimate_infections()` is deprecated as of EpiNow2 1.7.0. - 
#> WARN [2025-01-29 15:07:06] epinow: The `zero_threshold` argument of `estimate_infections()` is deprecated as of EpiNow2 1.7.0. - 
#> WARN [2025-01-29 15:07:06] epinow: partial match of 'length' to 'lengths' - $, consecutive, length

For the R data, we will set up an arbitrary trajectory and add some Gaussian noise.

Code
# Arbitrary reproduction number trajectory
R <- c(
  rep(1.50, 20),
  rep(1.25, 10),
  rep(1, 10),
  rep(0.5, 10),
  rep(1, 10),
  1 + 0.04 * 1:20,
  rep(1.4, 5),
  1.4 - 0.02 * 1:20,
  rep(1.4, 10),
  rep(0.8, 50),
  0.8 + 0.02 * 1:20
)
# Add Gaussian noise
R_noisy <- R * rnorm(length(R), 1.1, 0.05)
# Plot
true_rt <- ggplot(data = data.frame(R = R_noisy)) +
  geom_line(aes(x = seq_along(R_noisy), y = R)) +
  labs(x = "Time")

plot(true_rt)
plot of chunk R-data

Let’s proceed to simulate the true infections and \(R_t\) data by drawing \(10\) posterior samples.

Code
# Forecast infections and the trajectory of Rt
forecast <- forecast_infections(
  estimates$estimates,
  R = R_noisy,
  samples = 10
)
#> Warning in estimates$args$h: partial match of 'h' to 'horizon'

Let’s now extract the true data for benchmarking:

- R_true: the median of the simulated \(R_t\) values,
- infections_true: the infections by date of infection, and
- reported_cases_true: the reported cases by date of report.

Code
R_true <- forecast$summarised[variable == "R"]$median

# Get the posterior samples from which to extract the simulated infections and reported cases
posterior_sample <- forecast$samples[sample == 1]

# Extract the simulated infections
infections_true <- posterior_sample[variable == "infections"]$value

# Extract the simulated reported cases and rename the "value" column to "confirm" (to match EpiNow2 requirements)
reported_cases_true <- posterior_sample[
  variable == "reported_cases", .(date, confirm = value)
]

Let’s see what the cases plot looks like.

Code
cases_traj <- ggplot(data = reported_cases_true) +
  geom_line(aes(x = date, y = confirm)) +
  scale_y_continuous(labels = scales::label_comma()) +
  scale_x_date(date_labels = "%b %d", date_breaks = "1 month") +
  labs(y = "Reported cases", x = "Date")
cases_traj
plot of chunk plot-cases

We will now proceed to define and run the different model options.

Models

Descriptions

Below we describe each model.

Model options

| model | model_basename | description |
|----|----|----|
| default_mcmc | default | Default model (non-stationary prior on \(R_t\)); fitting with mcmc |
| default_vb | default | Default model (non-stationary prior on \(R_t\)); fitting with variational bayes |
| default_pathfinder | default | Default model (non-stationary prior on \(R_t\)); fitting with pathfinder algorithm |
| default_laplace | default | Default model (non-stationary prior on \(R_t\)); fitting with laplace approximation |
| non_mechanistic_mcmc | non_mechanistic | No mechanistic prior on \(R_t\); fitting with mcmc |
| non_mechanistic_vb | non_mechanistic | No mechanistic prior on \(R_t\); fitting with variational bayes |
| non_mechanistic_pathfinder | non_mechanistic | No mechanistic prior on \(R_t\); fitting with pathfinder algorithm |
| non_mechanistic_laplace | non_mechanistic | No mechanistic prior on \(R_t\); fitting with laplace approximation |
| rw7_mcmc | rw7 | 7-day random walk prior on \(R_t\); fitting with mcmc |
| rw7_vb | rw7 | 7-day random walk prior on \(R_t\); fitting with variational bayes |
| rw7_pathfinder | rw7 | 7-day random walk prior on \(R_t\); fitting with pathfinder algorithm |
| rw7_laplace | rw7 | 7-day random walk prior on \(R_t\); fitting with laplace approximation |
| non_residual_mcmc | non_residual | Stationary prior on \(R_t\); fitting with mcmc |
| non_residual_vb | non_residual | Stationary prior on \(R_t\); fitting with variational bayes |
| non_residual_pathfinder | non_residual | Stationary prior on \(R_t\); fitting with pathfinder algorithm |
| non_residual_laplace | non_residual | Stationary prior on \(R_t\); fitting with laplace approximation |

These are the components of each model.

Model components

| model | rt_gp_prior | fitting | package |
|----|----|----|----|
| default_mcmc | non_stationary | mcmc | rstan |
| default_vb | non_stationary | variational_bayes | rstan |
| default_pathfinder | non_stationary | pathfinder | cmdstanr |
| default_laplace | non_stationary | laplace | cmdstanr |
| non_mechanistic_mcmc | none | mcmc | rstan |
| non_mechanistic_vb | none | variational_bayes | rstan |
| non_mechanistic_pathfinder | none | pathfinder | cmdstanr |
| non_mechanistic_laplace | none | laplace | cmdstanr |
| rw7_mcmc | non_stationary | mcmc | rstan |
| rw7_vb | non_stationary | variational_bayes | rstan |
| rw7_pathfinder | non_stationary | pathfinder | cmdstanr |
| rw7_laplace | non_stationary | laplace | cmdstanr |
| non_residual_mcmc | stationary | mcmc | rstan |
| non_residual_vb | stationary | variational_bayes | rstan |
| non_residual_pathfinder | stationary | pathfinder | cmdstanr |
| non_residual_laplace | stationary | laplace | cmdstanr |

Configurations

We will now define the {EpiNow2} configurations for each model, which are modifications of the default model.

Code
model_configs <- list(
  # The default model with MCMC fitting
  default_mcmc = list(
    rt = rt_opts()
  ),
  # The default model with variational bayes fitting
  default_vb = list(
    rt = rt_opts(),
    stan = stan_opts(method = "vb", backend = "rstan")
  ),
  # The default model with pathfinder fitting
  default_pathfinder = list(
    rt = rt_opts(),
    stan = stan_opts(method = "pathfinder", backend = "cmdstanr")
  ),
  # The default model with laplace fitting
  default_laplace = list(
    rt = rt_opts(),
    stan = stan_opts(method = "laplace", backend = "cmdstanr")
  ),
  # The non-mechanistic model with MCMC fitting
  non_mechanistic_mcmc = list(
    rt = NULL
  ),
  # The non-mechanistic model with variational bayes fitting
  non_mechanistic_vb = list(
    rt = NULL,
    stan = stan_opts(method = "vb", backend = "rstan")
  ),
  # The non-mechanistic model with pathfinder fitting
  non_mechanistic_pathfinder = list(
    rt = NULL,
    stan = stan_opts(method = "pathfinder", backend = "cmdstanr")
  ),
  # The non-mechanistic model with laplace fitting
  non_mechanistic_laplace = list(
    rt = NULL,
    stan = stan_opts(method = "laplace", backend = "cmdstanr")
  ),
  # The 7-day RW Rt model with MCMC fitting
  rw7_mcmc = list(
    rt = rt_opts(
      prior = rt_prior_default,
      rw = 7
    )
  ),
  # The 7-day RW Rt model with variational bayes fitting
  rw7_vb = list(
    rt = rt_opts(
      prior = rt_prior_default,
      rw = 7
    ),
    stan = stan_opts(method = "vb", backend = "rstan")
  ),
  # The 7-day RW Rt model with pathfinder fitting
  rw7_pathfinder = list(
    rt = rt_opts(
      prior = rt_prior_default,
      rw = 7
    ),
    stan = stan_opts(method = "pathfinder", backend = "cmdstanr")
  ),
  # The 7-day RW Rt model with laplace fitting
  rw7_laplace = list(
    rt = rt_opts(
      prior = rt_prior_default,
      rw = 7
    ),
    stan = stan_opts(method = "laplace", backend = "cmdstanr")
  ),
  # The non_residual model with MCMC fitting
  non_residual_mcmc = list(
    rt = rt_opts(
      prior = rt_prior_default,
      gp_on = "R0"
    )
  ),
  # The non_residual model with variational bayes fitting
  non_residual_vb = list(
    rt = rt_opts(
      prior = rt_prior_default,
      gp_on = "R0"
    ),
    stan = stan_opts(method = "vb", backend = "rstan")
  ),
  # The non_residual model with pathfinder fitting
  non_residual_pathfinder = list(
    rt = rt_opts(
      prior = rt_prior_default,
      gp_on = "R0"
    ),
    stan = stan_opts(method = "pathfinder", backend = "cmdstanr")
  ),
  # The non_residual model with laplace fitting
  non_residual_laplace = list(
    rt = rt_opts(
      prior = rt_prior_default,
      gp_on = "R0"
    ),
    stan = stan_opts(method = "pathfinder", backend = "cmdstanr")
  )
)

Running the models

All the models will share the configuration for the generation time, incubation period, reporting delay, and the forecast horizon, so we will define them once and pass them to the models.

Code
# Generation time
generation_time <- Gamma(
  shape = Normal(1.3, 0.3),
  rate = Normal(0.37, 0.09),
  max = 14
)

# Incubation period
incubation_period <- LogNormal(
  meanlog = Normal(1.6, 0.05),
  sdlog = Normal(0.5, 0.05),
  max = 14
)

# Reporting delay
reporting_delay <- LogNormal(
  meanlog = 0.5,
  sdlog = 0.5,
  max = 10
)

# Combine the incubation period and reporting delay into one delay
delay <- incubation_period + reporting_delay

# 7-day forecast window
horizon <- 7

# Combine the shared model inputs into a list for use across all the models
model_inputs <- list(
  generation_time = generation_time_opts(generation_time),
  delays = delay_opts(delay),
  obs = obs,
  forecast = forecast_opts(horizon = horizon),
  verbose = FALSE
)

Additionally, from the benchmarking data, we choose the following dates to represent the periods of growth, peak, and decline.

Code
snapshot_dates <- as.Date(c("2020-05-15", "2020-06-21", "2020-06-28"))

Using the chosen dates, let’s create the data snapshots for fitting the models.

Code
data_snaps <- lapply(
  snapshot_dates,
  function(snap_date) {
    reported_cases_true[date <= snap_date]
  }
)
names(data_snaps) <- snapshot_dates

Now, we’re ready to run the models. We will use the epinow() function and return useful outputs like the timing of model runs. We fit each model to the data excluding the forecast horizon and then compare the forecasts against the held-out data within the horizon. This is often called an out-of-sample evaluation.

Code
# Create a safe version of epinow() that, like base::try(), captures errors so the
# remaining models still run even if some fail.
safe_epinow <- purrr::safely(epinow)
# Run the models over the different dates
results <- lapply(
  data_snaps, function(data) {
    lapply(
      model_configs,
      function(model) {
        # Use subset of the data
        data_to_fit <- data[1:(.N - model_inputs$forecast$horizon)] # exclude the forecast horizon
        model_inputs <- c(model_inputs, data = list(data_to_fit))
        do.call(
          safe_epinow,
          c(
            model_inputs,
            model
          )
        )
      }
    )
  }
)

Evaluation functions

We’ll begin by setting up the following functions:

Code
# Function to extract the "timing", "R", "infections", and "reports" variables from an
# epinow() run. It expects a model run, x, which contains a "result" or "error" component.
# If all went well, "error" should be NULL.
extract_results <- function(x, variable) {
  stopifnot(
    "variable must be one of c(\"timing\", \"R\", \"infections\", \"reports\")" =
      variable %in% c("timing", "R", "infections", "reports")
  )
  # Return NA if there's an error
  if (!is.null(x$error)) {
    return(NA)
  }

  if (variable == "timing") {
    return(round(as.duration(x$result$timing), 1))
  } else {
    obj <- x$result$estimates$fit
  }

  # Extracting "Rt", "infections", and "reports" is different based on the object's class and
  # other settings
  if (inherits(obj, "stanfit")) {
    # Depending on rt_opts(use_rt = TRUE/FALSE), R shows up as R or gen_R
    if (variable == "R") {
      # The non-mechanistic model returns "gen_R" where as the others sample "R".
      if ("R[1]" %in% names(obj)) {
        return(extract(obj, "R")$R)
      } else {
        return(extract(obj, "gen_R")$gen_R)
      }
    } else {
      return(extract(obj, variable)[[variable]])
    }
  } else {
    obj_mat <- as_draws_matrix(obj)
    # Extracting R depends on the value of rt_opts(use_rt = )
    if (variable == "R") {
      if ("R[1]" %in% variables(obj_mat)) {
          return(subset_draws(obj_mat, "R"))
      } else {
        return(subset_draws(obj_mat, "gen_R"))
      }
    } else {
        return(subset_draws(obj_mat, variable))
      }
    }
}

# Wrapper for extracting a given variable from the nested list of model runs
get_model_results <- function(results_by_snapshot, variable) {
  # Get model results list
  purrr::map_depth(results_by_snapshot, 2, extract_results, variable)
}
# Function to categorise fitting methods into mcmc and approximate
add_fit_type <- function(data){
  data[, fit_type := ifelse(fitting == "mcmc", "mcmc", "approximate")]
  data[]
}

# Function to convert all columns to factors, except those named in `except`
make_cols_factors <- function(data, except){
  # Convert every column not listed in `except` to a factor
  data[
    ,
    (setdiff(names(data), except)) :=
      lapply(.SD, as.factor),
    .SDcols = setdiff(names(data), except)
  ]
  data[]
}

# Function to order the epidemic phase factor levels chronologically
add_epidemic_phase_levels <- function(data){
  # Add epidemic phase as a factor
  data[, epidemic_phase := factor(epidemic_phase, levels = c("growth", "peak", "decline"))]
  data[]
}

# Function to calculate the CRPS of log-transformed values
calc_crps <- function(estimates, truth) {
  # if the object is not a matrix, then it's an NA (failed run)
  if (!inherits(estimates, c("matrix"))) return(rep(NA_real_, length(truth)))
  # Assumes that the estimates object is structured with the samples as rows
  shortest_obs_length <- min(ncol(estimates), length(truth))
  reduced_truth <- tail(truth, shortest_obs_length)
  estimates_transposed <- t(estimates) # transpose to have samples as columns
  reduced_estimates <- tail(estimates_transposed, shortest_obs_length)
  return(
    crps_sample(
      log10(reduced_truth),
      log10(reduced_estimates)
    )
  )
}

# Function to calculate CRPS estimates and flatten the nested list of model values
process_crps <- function(results, variable, truth) {
  # Extract values
  results_by_snapshot <- get_model_results(results, variable = variable)

  # Apply function to calculate CRPS
  crps_by_snapshot <- purrr::map_depth(
    results_by_snapshot,
    2,
    ~ calc_crps(estimates = ., truth = truth)
  )

  # Add dates column based on snapshot length
  crps_with_dates <- purrr::map_depth(
    crps_by_snapshot,
    2,
    ~ data.table(crps = .x)[,
      date := min(reported_cases_true$date) + 0:(.N - 1)
    ]
  )

  # Flatten the results into one dt
  crps_flat <- lapply(
    crps_with_dates,
    function(snapshot_results) {
     rbindlist(snapshot_results, idcol = "model")
    }) |>
    rbindlist(idcol = "snapshot_date")

  # Replace the snapshot dates with their description
  crps_flat[, epidemic_phase := names(snapshot_date_labels)[
    match(snapshot_date, snapshot_date_labels)
  ]]

  return(crps_flat)
}

# Add model details (components and descriptions useful for summarising)
add_model_details <- function(crps_by_model){
  # Add model configurations for facetting
  with_model_components <- merge.data.table(
    crps_by_model,
    model_components,
    by = "model"
  )

  # Add model descriptions
  crps_dt <- merge.data.table(
    with_model_components,
    model_descriptions,
    by = "model"
  )
  crps_dt[]
}

# Define a function to calculate total CRPS stratified by the "by" vector
calculate_total_crps <- function(data, by) {
  data[, .(total_crps = sum(crps, na.rm = TRUE)), by = by]
}

# Define a function to plot total CRPS
plot_total_crps <- function(data, title) {
  plot <- ggplot(data = data) +
    geom_point(
      aes(
        x = epidemic_phase,
        y = total_crps,
        shape = model
      ),
      size = 2.5
    ) +
    labs(
      x = "Epidemic phase",
      y = "Total CRPS",
      title = title,
      caption = "Where a model is not shown, it means it failed to run"
    ) +
    theme(
      plot.title = element_text(size = 18),
      strip.text = element_text(size = 13),
      axis.title = element_text(size = 13),
      axis.text = element_text(size = 11)
    )
  return(plot)
}

# Define a function to plot CRPS over time
plot_crps_over_time <- function(data, title) {
  plot <- ggplot(data = data[!is.na(crps)]) + # remove failed models
    geom_line(
      aes(x = date,
          y = crps,
          color = model
          ),
      linewidth = 1.1
    ) +
    labs(
      x = "Time",
      y = "CRPS",
      title = title,
      caption = "Where a model is not shown, it means it failed to run"
    ) +
    ggplot2::theme_minimal() +
    guides(color = guide_legend(title = "Model type")) +
    theme(legend.position = "bottom",
          plot.title = element_text(size = 18),
          strip.text = element_text(size = 13),
          axis.title = element_text(size = 13),
          axis.text = element_text(size = 11)
    )
  return(plot)
}

Run times (computational resources)

Let’s now see how long each model took to run. Here, we show the run times by fitting method because the approximate methods are significantly faster. Note the difference in the y-axis scales between the panels.

Code
# Extract the run times and reshape to dt
runtimes_by_snapshot <- get_model_results(results, "timing")

# Flatten the results
runtimes_dt <- lapply(runtimes_by_snapshot, function(x) as.data.table(x)) |>
  rbindlist(idcol = "snapshot_date", ignore.attr = TRUE)

# Reshape
runtimes_dt_long <- melt(
  runtimes_dt,
  id.vars = "snapshot_date",    # Column to keep as an identifier
  measure.vars = model_descriptions$model,  # Select the model columns by name
  variable.name = "model",      # Name for the 'model' column
  value.name = "timing"         # Name for the 'timing' column
)
#> Warning in melt.data.table(runtimes_dt, id.vars = "snapshot_date", measure.vars
#> = model_descriptions$model, : 'measure.vars' [default_mcmc, default_vb,
#> default_pathfinder, default_laplace, ...] are not all of the same type. By
#> order of hierarchy, the molten data value column will be of type 'double'. All
#> measure variables not of type 'double' will be coerced too. Check DETAILS in
#> ?melt.data.table for more on coercion.

# Add model configurations
runtimes_dt_detailed <- merge(
  runtimes_dt_long,
  model_components,
  by = "model"
)

# snapshot dates dictionary
snapshot_date_labels <- c(growth = "2020-05-15", peak = "2020-06-21", decline = "2020-06-28")

# Replace snapshot_date based on the dictionary
runtimes_dt_detailed[, epidemic_phase := names(snapshot_date_labels)[match(snapshot_date, snapshot_date_labels)]]

# Add model descriptions
runtimes_dt_detailed <- merge(
  runtimes_dt_detailed,
  model_descriptions,
  by = "model"
)

# Categorise the fitting methods into mcmc and approximate
runtimes_dt_detailed <- add_fit_type(runtimes_dt_detailed)

# Make all columns except timing a factor
runtimes_dt_detailed <- make_cols_factors(runtimes_dt_detailed, except = "timing")

# Order the epidemic_phase factor levels as c("growth", "peak", "decline")
runtimes_dt_detailed <- add_epidemic_phase_levels(runtimes_dt_detailed)

# Plot the timing
timing_plot <- ggplot(data = runtimes_dt_detailed) +
  geom_point(aes(x = epidemic_phase,
                 y = timing,
                 shape = model_basename,
                 color = fitting
                 ),
             position = position_dodge(width = 0.4),
             size = 3,
             stroke = 1.2
  ) +
  labs(x = "Epidemic phase",
       y = "Runtime (secs)",
       shape = "Model",
       fill = "Fitting method",
       title = "Runtimes per model and fitting method",
       caption = "Where a model is not shown, it means it failed to run"
  ) +
  theme_minimal() +
  scale_color_brewer(palette = "Dark2") +
  scale_shape_manual(values = c(1, 0, 6, 8)) +
  facet_wrap(~fit_type, scales = "free_y", nrow = 2, strip.position = "top") +
  theme(plot.title = element_text(size = 18),
        strip.text = element_text(size = 13),
        axis.title = element_text(size = 13),
        axis.text = element_text(size = 11)
        )
timing_plot
#> Warning: Removed 7 rows containing missing values or values outside the scale range
#> (`geom_point()`).
plot of chunk process-runtimes

We can see that, across the board, the non-mechanistic and non-residual models were the fastest, whereas the default model was among the slowest for all data scenarios.

Evaluating model performance

Now, we will evaluate the performance of the models using the continuous ranked probability score (CRPS). The CRPS is a proper scoring rule that measures the accuracy of probabilistic forecasts. When comparing models, the smaller the CRPS, the better.
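
As a toy illustration (with made-up numbers, not this vignette's data), crps_sample() from {scoringutils} scores each observation against a matrix of predictive samples; sharp forecasts centred on the truth score lower than diffuse ones.

Code
set.seed(123)
observed <- c(100, 120, 150)
# 1,000 predictive samples per observation: one sharp and one wide forecast
sharp <- matrix(rnorm(3 * 1000, mean = observed, sd = 5), nrow = 3)
wide <- matrix(rnorm(3 * 1000, mean = observed, sd = 50), nrow = 3)
crps_sample(observed, sharp) # small scores: accurate and precise
crps_sample(observed, wide)  # larger scores: well-centred but uncertain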

We will focus on evaluating the results of fitting with MCMC. We will also show the results for the approximate sampling methods but will not delve deeply into them, as we do not recommend them for full inference due to their currently experimental, and hence unstable, nature. In a later section, we will provide ideas on how the approximate methods can be used in {EpiNow2} workflows.

Code
# Process CRPS for Rt, using the simulated R_true values as the truth
rt_crps <- process_crps(results, "R", R_true)
rt_crps_full <- add_model_details(rt_crps)

# Process CRPS for infections
infections_crps <- process_crps(results, "infections", infections_true)
infections_crps_full <- add_model_details(infections_crps)

As part of post-processing, we will add a “type” column to the output of the estimates to indicate which subsets are based on partial and complete data. We will take the “type” column from the non-mechanistic MCMC model (the values are the same across model outputs) and append it to the rest of the model results.

Code
# Get the date and estimate type from the non-mechanistic MCMC model (the same across model outputs)
results_by_model <- list_transpose(results)

estimate_types <- lapply(
  results_by_model$non_mechanistic_mcmc,
  function(results_by_snapshot) {
    results_by_snapshot$result$estimates$summarised[
      variable == "reported_cases"][
        ,
        c("date", "type")
      ]
  }
) |>
  rbindlist(idcol = "snapshot_date")

# add the "estimate_type" column.
rt_crps_dt <- merge(
  rt_crps_full,
  estimate_types,
  by = c("date", "snapshot_date")
)

# Add the "estimate_type" column
infections_crps_dt <- merge(
  infections_crps_full,
  estimate_types,
  by = c("date", "snapshot_date")
)

# Re-categorise fit_type column and convert to factor

# Rt
rt_crps_dt <- add_fit_type(rt_crps_dt)
rt_crps_dt <- make_cols_factors(rt_crps_dt, except = c("date", "crps"))
rt_crps_dt <- add_epidemic_phase_levels(rt_crps_dt)

# Simplify the model name for downstream use
rt_crps_dt_final <- rt_crps_dt[, model := gsub("_[^_]*$", "", model)]

# Infections
infections_crps_dt <- add_fit_type(infections_crps_dt)
infections_crps_dt <- make_cols_factors(infections_crps_dt, except = c("date", "crps"))
infections_crps_dt <- add_epidemic_phase_levels(infections_crps_dt)

# Simplify the model name for downstream use
infections_crps_dt_final <- infections_crps_dt[, model := gsub("_[^_]*$", "", model)]

Model performance over time

Let’s see how the \(R_t\) and infections CRPS changed over time using the function plot_crps_over_time(). We’ll start with the models fitted with MCMC.

Code
# Plot CRPS over time for Rt
rt_crps_mcmc <- rt_crps_dt_final[fitting == "mcmc"]
rt_crps_mcmc_plot <- plot_crps_over_time(rt_crps_mcmc, "Time-varying model performance in estimating Rt (mcmc)")
rt_crps_mcmc_plot  +
    facet_wrap(~epidemic_phase, ncol = 1)
plot of chunk plot-rt-crps-mcmc
Code
# Plot CRPS over time for infections
infections_crps_mcmc_dt <- infections_crps_dt_final[fitting == "mcmc"]
infections_crps_mcmc_plot <- plot_crps_over_time(infections_crps_mcmc_dt, "Time-varying model performance in estimating infections (mcmc)")
infections_crps_mcmc_plot +
    facet_wrap(~epidemic_phase, ncol = 1)
plot of chunk plot-infections-crps-mcmc

Overall model performance

Let’s calculate the overall/aggregated performance of the models in terms of the total CRPS for \(R_t\) and infections for downstream visualisations.

Code
# Rt
rt_total_crps <- calculate_total_crps(rt_crps_dt_final, by = c("model", "epidemic_phase", "type", "fitting"))

# infections
infections_total_crps <- calculate_total_crps(infections_crps_dt_final, by = c("model", "epidemic_phase", "type", "fitting"))


# We are not interested in the "estimate" type, just "estimate based on partial data" and "forecast"

rt_total_crps <- rt_total_crps[type != "estimate"]
infections_total_crps <- infections_total_crps[type != "estimate"]

Now, let’s visualise the results for MCMC fitting.

Code
# Subset to the MCMC fits
rt_total_crps_mcmc <- rt_total_crps[fitting == "mcmc"]

# Process data and plot
rt_total_crps_mcmc_plot <- plot_total_crps(rt_total_crps_mcmc, "Aggregated model performance in estimating Rt (mcmc)")
rt_total_crps_mcmc_plot  +
  scale_shape_manual(values = c(0, 1, 17, 8)) +
  facet_wrap(~type)
plot of chunk crps-plotting-rt-total
Code
# Subset to the MCMC fits
infections_total_crps_mcmc <- infections_total_crps[fitting == "mcmc"]

# Process and plot
infections_total_crps_plot <- plot_total_crps(infections_total_crps_mcmc, "Aggregated model performance in estimating infections (mcmc)")
infections_total_crps_plot +
  scale_shape_manual(values = c(0, 1, 17, 8)) +
    facet_wrap(~type)
plot of chunk crps-plotting-infections-total

Performance of approximate methods

We’ll now show the performance of the approximate methods. Note that we do not recommend using them in real-world inference and analytics pipelines. We provide alternative use cases in the following sections.

Let’s first look at the time-varying \(R_t\) and infections estimates.

Code
# Plot CRPS over time for Rt
rt_crps_approx <- rt_crps_dt_final[fitting != "mcmc"]

# Process plot data and plot
rt_tv_crps_plot_approx <- plot_crps_over_time(rt_crps_approx, "Time-varying model performance in estimating Rt (approximate methods)")
rt_tv_crps_plot_approx  +
    facet_wrap(fitting~epidemic_phase)
plot of chunk plot-rt-tv-crps-approx
Code
# Plot CRPS over time for infections
infections_crps_approx <- infections_crps_dt_final[fitting != "mcmc"]

# Process plot data and plot
infections_tv_crps_plot_approx <- plot_crps_over_time(infections_crps_approx, "Time-varying model performance in estimating infections (approximate methods)")
infections_tv_crps_plot_approx  +
    facet_wrap(fitting~epidemic_phase, scales = "free_y") +
  labs(caption = "Where a model is not shown, it means it failed to run")
plot of chunk plot-infections-tv-crps-approx

Now, let’s look at the total CRPS for \(R_t\) and infections.

Code
# Subset the total CRPS to the approximate fits
rt_total_crps_approx <- rt_total_crps[fitting != "mcmc"]

# Process plot data and plot
rt_total_crps_approx_plot <- plot_total_crps(rt_total_crps_approx, "Aggregated model performance in estimating Rt (approximate methods)")
rt_total_crps_approx_plot +
  scale_shape_manual(values = c(0, 1, 17, 8)) +
  facet_wrap(~ type + fitting) +
  labs(caption = "Where a model is not shown, it means it failed to run")
plot of chunk plot-rt-total-crps-approx
Code
# Subset the total CRPS to the approximate fits
infections_total_crps_approx <- infections_total_crps[fitting != "mcmc"]

# Process plot data and plot
infections_total_crps_approx_plot <- plot_total_crps(infections_total_crps_approx, "Aggregated model performance in estimating infections (approximate methods)")
infections_total_crps_approx_plot +
  scale_shape_manual(values = c(0, 1, 17, 8)) +
  facet_wrap( ~ type + fitting) +
  labs(caption = "Where a model is not shown, it means it failed to run")
plot of chunk plot-infections-total-crps-approx

Summary of results

Overall, the non-mechanistic model was always the fastest and had the best performance in estimating \(R_t\). The default model was among the slowest in most cases but often had the best time-varying and aggregated performance in estimating infections. The non-residual and 7-day random walk models showed mixed results. This suggests a trade-off between speed and estimation/forecasting performance, here measured with the CRPS. Choosing an appropriate model for a task therefore requires carefully weighing the use case against the acceptable trade-offs. Below are a few considerations.

Considerations for choosing an appropriate model

Model types (Semi-mechanistic vs non-mechanistic)

Estimation in {EpiNow2} using the semi-mechanistic approaches (putting a prior on \(R_t\)) is often much slower than the non-mechanistic approach. The mechanistic model is slower because it models aspects of the processes and mechanisms that drive \(R_t\) estimates using the renewal equation. The non-mechanistic model, on the other hand, runs much faster but does not use the renewal equation to generate infections. Because of this, none of the options defining the behaviour of the reproduction number are available in that case, limiting its flexibility.
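
To make the contrast concrete, here is a minimal sketch of the two approaches, differing only in the rt argument; reported_cases, generation_time, and delay stand in for inputs like those defined earlier in this vignette.

Code
# Semi-mechanistic: renewal equation with a prior on Rt
fit_semi_mechanistic <- epinow(
  data = reported_cases,
  generation_time = generation_time_opts(generation_time),
  delays = delay_opts(delay),
  rt = rt_opts(prior = Normal(2, 0.1))
)
# Non-mechanistic: no renewal equation, so rt_opts() customisations do not apply
fit_non_mechanistic <- epinow(
  data = reported_cases,
  generation_time = generation_time_opts(generation_time),
  delays = delay_opts(delay),
  rt = NULL
)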

It’s worth noting that the non-mechanistic model in {EpiNow2} is equivalent to that used in the {EpiEstim} R package as they both estimate \(R_t\) from case time series and the generation interval distribution.
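
For comparison, a hypothetical {EpiEstim} workflow with the same ingredients might look like the sketch below; incid and the serial interval values are illustrative placeholders, not taken from this vignette.

Code
library(EpiEstim)
# incid: a data frame with columns `dates` and `I` (daily case counts)
epiestim_fit <- estimate_R(
  incid = incid,
  method = "parametric_si",
  config = make_config(list(mean_si = 3.6, std_si = 3.1)) # illustrative values
)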

Fitting methods (Exact vs approximate)

The default sampling method, set through stan_opts(), performs MCMC sampling using {rstan}. The MCMC sampling method is accurate but is often slow. The Laplace, pathfinder, and variational inference methods are faster because they are approximate (See, for example, a detailed explanation for automatic variational inference in Stan).

In {EpiNow2}, you can use variational inference with either an {rstan} or a {cmdstanr} backend, but you must install the latter to access its functionality. Additionally, {EpiNow2} supports the Laplace and pathfinder approximate samplers through the {cmdstanr} R package, but these two methods are currently experimental in {cmdstanr} and have not been well tested. The approximate methods may not be as reliable as the default MCMC sampling method and we do not recommend using them in real-world inference.
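
As a sketch, installing {cmdstanr} and the underlying CmdStan toolchain (per the {cmdstanr} installation instructions; a working C++ toolchain is required) looks like:

Code
# Install the R package from the Stan R-universe repository
install.packages(
  "cmdstanr",
  repos = c("https://stan-dev.r-universe.dev", getOption("repos"))
)
# Download and compile CmdStan itself
cmdstanr::install_cmdstan()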

The approximate methods can be used in various ways. First, you can initialise the MCMC sampling algorithm with the fit object returned by methods such as pathfinder; more details can be found in the original pathfinder paper. This approach speeds up the initialisation phase of the MCMC algorithm. Second, the approximate methods are well suited to prototyping. For example, if you are testing a pipeline setup, it might be more practical to switch to a method like variational bayes and only use MCMC once the pipeline is up and running.
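
For instance, a pipeline could expose the fitting method as a single switch. This is a sketch, with use_mcmc as a hypothetical flag and the other inputs as defined earlier:

Code
use_mcmc <- FALSE # hypothetical flag: FALSE while prototyping, TRUE in production
stan_config <- if (use_mcmc) {
  stan_opts() # default: MCMC sampling with {rstan}
} else {
  stan_opts(method = "vb", backend = "rstan") # fast approximate fit
}
fit <- epinow(
  data = reported_cases,
  generation_time = generation_time_opts(generation_time),
  delays = delay_opts(delay),
  stan = stan_config
)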

Smoothness/granularity of estimates

The random walk option reduces the smoothness of the \(R_t\) estimates compared with the other methods: rather than varying continuously, as under the Gaussian process, the estimates change in discrete steps at the end of each random walk window.
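
As a minimal sketch, both behaviours are controlled through rt_opts(), as used throughout this vignette (the object names here are illustrative):

Code
# Gaussian process on Rt (default): smooth, continuously varying estimates
rt_smooth <- rt_opts(prior = Normal(2, 0.1))
# Weekly random walk on Rt: piecewise-constant estimates that change every 7 days
rt_stepwise <- rt_opts(prior = Normal(2, 0.1), rw = 7)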

Caveats of this exercise

We generated the data using an arbitrary R trajectory, which represents only one of many data scenarios against which the models could be benchmarked. The trajectory used here features abrupt rises and falls and could favour one model type or solver over another.

The run times measured here use a crude method that compares the start and end times of each model run. It measures a single run per model and may not be accurate. For more rigorous measurements, we recommend a dedicated benchmarking package such as {bench} or {microbenchmark}.
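
For example, a more careful timing setup with {bench} might look like the following sketch, where run_default_model() and run_rw7_model() are hypothetical wrappers around the epinow() calls above:

Code
library(bench)
timings <- bench::mark(
  default = run_default_model(), # hypothetical wrapper around epinow()
  rw7 = run_rw7_model(),         # hypothetical wrapper around epinow()
  iterations = 5, # repeat each run to average out noise
  check = FALSE   # model outputs differ, so skip equality checks
)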

Lastly, we used 6 cores for the simulations, so using more or fewer cores might change the run time results. We, however, expect the relative rankings to remain the same or similar. To speed up the model runs, we recommend checking the number of cores available on your machine with parallel::detectCores() and passing a high enough value to mc.cores through options() (see the benchmarking data setup chunk above for an example), as sketched below.
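
As a sketch, one portable way to do this, leaving one core free for the rest of the system, is:

Code
# Use all but one of the locally available cores
options(mc.cores = max(1, parallel::detectCores() - 1))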