This vignette documents the grid_scan()
function
provided in this package which allows high-performance repeated
simulation runs over a grid of parameters and/or initial states. Here,
we take the classical LG circadian clock model and demonstrate the
following:
Workflow shown here shall be generally useful for interpretation of wet-lab experiment results as study of clock mutants typically ends up with hypotheses where certain parameter(s) of the circadian clock is/are affected.
grid_scan()
is generally useful for cases where you have
an Odin model too; this function is decoupled from specifics of
circadian clock models.
library(clockSim)
library(matrixStats)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:matrixStats':
#>
#> count
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
Under wildtype condition the circadian clock is a limit cycle oscillator with ~24hr period. The LG model reproduces such behavior. To see this we run simulation with default setting:
model_gen <- getOdinGen()$continuous_LG
model <- model_gen$new()
sim_hours <- seq(from = 0, to = 2400, by = 1)
res <- model$run(sim_hours) |> as.data.frame()
res$time <- res$t
plot(plot_phase(res, M_T, C_N))
print(compute_period(res$M_T |> tail(n = 240), method = "lomb"))
#> period power snr p.value
#> 2.390000e+01 8.558178e-01 NA 2.142988e-100
As shown above:
While promising, however, the LG model is a complex 10-state system. We next explore different properties of the system:
10-variable makes it impossible to do a extensive grid on initial states. To see how fast our simulation runs:
With each simulation at ~10ms, we should be able to do 1) a very coarse grid on all 10 states with ~3 for each state, or 2) an extensive grid on selected states (say TIM/PER RNA):
\[ 3^{10}*10\mbox{ms}\sim 250^{2}*10\mbox{ms}\sim600\mbox{sec} \]
With the CRAN build time limitation, in this vignette we will not run the ~10min grid above but instead deal with a smaller grid on TIM/PER RNA.
First, we define the grid. We compute summary of all states in the baseline simulation and use the min/mean/max to define the grid. For each state, we scan N multiples of mean+N*spread of the baseline simulation, where spread = (max-min)/2 and N = 2,3, …, Nmax. For the CRAN grid Nmax=5.
# Compute summary
summary <- res |>
select(-t, -time) |>
apply(2, summary)
# Only keep min/mean/max
summary <- summary[c(1,4,6),]
# Add on mean+N*spread, N=2,3,...,N_max
N_max <- 3 # For larger scan increase this. CRAN=3
get_multiples <- function(s, k) {
# Extract components
min <- s[1, ]
mean <- s[2, ]
delta <- s[3, ] - min
# Create new rows using vectorized operations
multiples <- outer(k, delta) |> sweep(2, mean, "+")
attr(multiples, "original") <- s
# Return
multiples
}
summary <- get_multiples(summary, 2:N_max)
summary <- summary[,c("M_T", "M_P")] # Only RNA states
# Create grid
grid <- expand.grid(
summary |> as.data.frame(), KEEP.OUT.ATTRS = FALSE)
# User variables for initial state start with setUserInitial_
names(grid) <- paste0("setUserInitial_",names(grid))
grid_scan()
Now perform grid scan of initial states. grid_scan()
is
designed for efficient repeated model execution on parameter grid. Refer
to its help for more details, and here I summarize its design
concept:
expand.grid()
.parallel
.We want the following statistics:
?compute_rmse
and
?compute_cosine
.These statistics mean we apply the following function to each run data:
default_attractor <- model_gen$new()$run(sim_hours)
default_attractor <-
default_attractor[(length(sim_hours)-240):length(sim_hours),]
stat.fn <- function(raw_run, reference = default_attractor){
# Return code == 2 means successful integration (at least for lsoda)
succ <- attr(raw_run, "istate")[1] == 2
# Subset only the last 240 time points - should be stabilized
raw_run <- raw_run[(nrow(raw_run)-240):nrow(raw_run),]
# Compute normalized RMSE
nrmse <- compute_rmse(raw_run, reference, normalize = "range")
nrmse <- max(nrmse)
# Compute cosine similarity
cos <- compute_cosine(raw_run, reference)
cos <- min(cos)
# Return
c(converged = succ, nrmse = nrmse, cos = cos)
}
Before running, let’s see how this statistics function will slow down simulation. As shown, we add about ~10% time and ~40% memory footprint, which is minimal.
print(bench::mark(stat.fn(model$run(sim_hours))))
#> # A tibble: 1 × 13
#> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
#> 1 stat.fn(mod… 22.9ms 24.7ms 39.7 1.35MB 0 20 0 504ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
print(bench::mark(model$run(sim_hours)))
#> # A tibble: 1 × 13
#> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
#> 1 model$run(s… 20.1ms 22.6ms 44.1 735KB 2.10 21 1 476ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>
Now let’s run the grid scan of initial states. We run
grid_scan
, and then wrangle the results into a data frame.
As below, all converged to the same attractor.
scan <-
grid_scan(model_gen, grid, apply.fn = stat.fn,
n.core = 1, custom.export = "default_attractor",
sim_hours)
process_scan <- function(){
.scanDF <- scan |> unlist(use.names = FALSE) |> matrix(ncol = 3, byrow=TRUE)
colnames(.scanDF) <- names(scan[[1]])
result <- cbind(grid, .scanDF |> as.data.frame())
summary(
result |> select(converged, nrmse, cos)
)
}
process_scan()
#> converged nrmse cos
#> Min. :1 Min. :2.178e-05 Min. :1
#> 1st Qu.:1 1st Qu.:2.674e-05 1st Qu.:1
#> Median :1 Median :3.498e-05 Median :1
#> Mean :1 Mean :4.622e-05 Mean :1
#> 3rd Qu.:1 3rd Qu.:5.446e-05 3rd Qu.:1
#> Max. :1 Max. :9.314e-05 Max. :1
Plot two phase diagrams as an example (note that I rerun
grid_scan()
with default apply_fn = identity
to get the raw data for plotting):
# Rerun scan
scan <-
grid_scan(model_gen, grid, apply.fn = identity,
n.core = 1, custom.export = "default_attractor",
sim_hours)
# Show first and last of grid
first <- grid[1,]
last <- grid[nrow(grid),]
print(first)
#> setUserInitial_M_T setUserInitial_M_P
#> 1 11.47733 6.50043
print(last)
#> setUserInitial_M_T setUserInitial_M_P
#> 4 16.11438 9.314509
plot(plot_phase(scan[[1]] |> as.data.frame(), M_T, C_N))
plot(plot_phase(scan[[nrow(grid)]] |> as.data.frame(), M_T, C_N))
How about modifying one parameter and see if that affects initial state stability with the same grid? Just add that parameter new value as a constant to the grid!
Remember that our goal is to see whether parameter change causes convergence property change - so our reference attractor should be computed with the new parameter too. As below, just as an example, doubling translation rate of TIM mRNA does not affect the behavior - all initial states still converged to the clock attractor.
new_grid <- grid
new_grid$k_sT <- 1.8 # 2X of original, check model$content()
new_model <- model_gen$new()
new_model$set_user(k_sT = 1.8)
default_attractor <- new_model$run(sim_hours)
default_attractor <-
default_attractor[(length(sim_hours)-240):length(sim_hours),]
# Then, repeat the same code above.
scan <-
grid_scan(model_gen, new_grid, apply.fn = stat.fn,
n.core = 1, custom.export = "default_attractor",
sim_hours)
process_scan()
#> converged nrmse cos
#> Min. :1 Min. :9.734e-07 Min. :1
#> 1st Qu.:1 1st Qu.:3.700e-06 1st Qu.:1
#> Median :1 Median :6.884e-06 Median :1
#> Mean :1 Mean :6.884e-06 Mean :1
#> 3rd Qu.:1 3rd Qu.:1.007e-05 3rd Qu.:1
#> Max. :1 Max. :1.279e-05 Max. :1