generate_aft_dgm_flex() — general data-generating
model (DGM) builder. Accepts any survival dataset and fits an
accelerated failure time (AFT) super-population model with
user-specified treatment effect heterogeneity parameters. This is the
recommended starting point when building a DGM based on a dataset other
than GBSG.
simulate_from_dgm() — general simulator for drawing
trial replicates from an aft_dgm_flex DGM. Supersedes
simulate_from_gbsg_dgm() for new code. Column names in the
returned data frame use underscore notation (y_sim,
event_sim, treat_sim,
flag_harm).
run_simulation_analysis() (general version) —
simulation wrapper that calls simulate_from_dgm() and
accepts explicit column-name parameters, making it applicable to any DGM
built with generate_aft_dgm_flex() or
setup_gbsg_dgm(). The GBSG dataset is one application of
this general pipeline rather than a separate code path.
setup_gbsg_dgm() — the recommended entry point for
all GBSG-based simulation work. Encodes the data preparation and
subgroup definition from León et al. (2024) and returns an
aft_dgm_flex-compatible object accepted by
simulate_from_dgm() and
run_simulation_analysis(). Existing scripts using
create_gbsg_dgm() can migrate with a one-line change:
dgm <- setup_gbsg_dgm(model = "alt", k_inter = k, seed = seed).
create_gbsg_dgm() is superseded by
setup_gbsg_dgm(). It remains fully functional and continues
to produce correct results; no existing GBSG simulation scripts need to
change. The distinction is that setup_gbsg_dgm() returns an
object of class c("aft_dgm_flex", "gbsg_dgm") compatible
with the general pipeline, whereas create_gbsg_dgm()
returns only "gbsg_dgm". Calling create_gbsg_dgm() emits a
.Deprecated() warning to encourage migration in new code.
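The class difference can be illustrated with plain S3 objects. This is a minimal sketch using empty stand-in lists: only the class vectors come from the text above, and the `is_flex()` helper is hypothetical (whether the package gates on `inherits()` is an assumption).

```r
# Stand-in objects: real DGM objects carry many more fields.
legacy_dgm  <- structure(list(), class = "gbsg_dgm")
general_dgm <- structure(list(), class = c("aft_dgm_flex", "gbsg_dgm"))

# Hypothetical check a general-pipeline function might perform.
is_flex <- function(dgm) inherits(dgm, "aft_dgm_flex")

is_flex(legacy_dgm)                # FALSE
is_flex(general_dgm)               # TRUE
inherits(general_dgm, "gbsg_dgm")  # TRUE: GBSG-specific methods still apply
```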
simulate_from_gbsg_dgm() is superseded by
simulate_from_dgm() for new code. Column names in the
output change from dot-notation to underscore notation — see the mapping
table below. Pass analysis_time = Inf to match the legacy
max_follow = Inf behaviour.
| Legacy column | General column |
|---|---|
| y.sim | y_sim |
| event.sim | event_sim |
| treat | treat_sim |
| flag.harm | flag_harm |
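Downstream code that still expects one naming scheme can translate between the two with a lookup vector. The sketch below uses only base R; `rename_legacy_cols()` is a hypothetical helper, not a package function, and the toy data frame is invented for illustration.

```r
# Legacy -> general column names, taken from the mapping table.
col_map <- c(
  "y.sim"     = "y_sim",
  "event.sim" = "event_sim",
  "treat"     = "treat_sim",
  "flag.harm" = "flag_harm"
)

# Hypothetical helper: rename any legacy columns that are present and
# leave all other columns untouched.
rename_legacy_cols <- function(df, map = col_map) {
  hit <- names(df) %in% names(map)
  names(df)[hit] <- unname(map[names(df)[hit]])
  df
}

legacy <- data.frame(y.sim = c(1.2, 3.4), event.sim = c(1, 0),
                     treat = c(0, 1), flag.harm = c(0, 1), z = c(5, 6))
general <- rename_legacy_cols(legacy)
names(general)
```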
run_simulation_analysis(max_follow) → use
analysis_time. If supplied, max_follow is
forwarded to analysis_time with a warning.
run_simulation_analysis(muC_adj) → use
cens_adjust. If supplied, muC_adj is forwarded
to cens_adjust with a warning.
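The forwarding behaviour for the two renamed arguments follows a common R pattern, sketched below. `resolve_deprecated_args()` is a hypothetical stand-in: the real run_simulation_analysis() signature, its defaults, and its warning text are not reproduced here, and the defaults shown are assumptions.

```r
# Hypothetical stand-in for the argument handling described above.
# Default values (Inf, 0) are assumptions made for this sketch.
resolve_deprecated_args <- function(analysis_time = Inf, cens_adjust = 0,
                                    max_follow = NULL, muC_adj = NULL) {
  if (!is.null(max_follow)) {
    warning("'max_follow' is deprecated; use 'analysis_time' instead.",
            call. = FALSE)
    analysis_time <- max_follow
  }
  if (!is.null(muC_adj)) {
    warning("'muC_adj' is deprecated; use 'cens_adjust' instead.",
            call. = FALSE)
    cens_adjust <- muC_adj
  }
  list(analysis_time = analysis_time, cens_adjust = cens_adjust)
}

# A legacy-style call: values are forwarded, warnings are raised.
res <- suppressWarnings(resolve_deprecated_args(max_follow = 84, muC_adj = 0.5))
```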
The following bugs were discovered and fixed during the general
pipeline migration. All affected code paths were exercised by the GBSG
variables v1–v7, which are stored as
factor() rather than numeric().
lasso_selection()
(get_FSdata_helpers.R): as.matrix() on a data
frame containing factor columns produced a character matrix that
cv.glmnet() rejected. Factor columns with all-numeric
levels are now coerced via as.integer(as.character(.))
before matrix conversion.
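The failure and the fix can be reproduced in base R. This is a sketch on toy data: the `has_numeric_levels()` check is an assumed way of detecting "all-numeric levels" and may differ from what lasso_selection() does internally.

```r
# Toy data: v1 is a factor with all-numeric levels, as in the GBSG case.
df <- data.frame(v1 = factor(c("0", "1", "1", "0")), age = c(50, 61, 47, 58))

# Before the fix: the factor column forces a character matrix.
before <- as.matrix(df)
is.character(before)  # TRUE

# Assumed check: every level parses as a number.
has_numeric_levels <- function(x) {
  is.factor(x) && !anyNA(suppressWarnings(as.numeric(levels(x))))
}

# Coerce qualifying factors via as.integer(as.character(.)), then convert.
fix <- vapply(df, has_numeric_levels, logical(1))
df[fix] <- lapply(df[fix], function(x) as.integer(as.character(x)))
after <- as.matrix(df)
is.numeric(after)  # TRUE
```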
process_conf_force_expr()
(get_FSdata_helpers.R): mean() applied to a
factor column returned NA. Factor columns are now coerced
to numeric before mean(), median(), and
quantile() calls.
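A short base R illustration of the symptom, on an invented factor. The coercion goes through `as.character()` here so that the level labels are used; whether process_conf_force_expr() coerces in exactly this way is an assumption.

```r
v <- factor(c("1", "3", "3", "5"))

# mean() on a factor warns and returns NA.
m_raw <- suppressWarnings(mean(v))
is.na(m_raw)  # TRUE

# Go through as.character() so the labels, not the internal codes, are used.
v_num <- as.numeric(as.character(v))
mean(v_num)    # 3
median(v_num)  # 3

# Pitfall: as.numeric(v) alone returns the level codes 1, 2, 2, 3.
codes <- as.numeric(v)
```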
evaluate_comparison()
(forestsearch_helpers.R): the <= and
>= operators applied to a factor column triggered an
Ops.factor warning and returned NA. Factor
columns are now coerced to numeric before comparison.
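The comparison problem is easy to reproduce with an unordered factor. The sketch below uses invented values and the same `as.character()` round trip; it illustrates the behaviour rather than reproducing evaluate_comparison().

```r
v <- factor(c("0", "1", "2", "1"))

# Comparison on an unordered factor warns and yields NA for every element.
raw <- suppressWarnings(v <= 1)
all(is.na(raw))  # TRUE

# Coercing to numeric first gives the intended logical vector.
ok <- as.numeric(as.character(v)) <= 1
ok  # TRUE TRUE FALSE TRUE
```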
forestsearch() (forestsearch_main.R):
df[, conf.screen] dropped to a vector when
conf.screen had length 1, causing dummy() to
error on a non-data-frame input. Fixed by adding
drop = FALSE.
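The dimension-dropping behaviour behind this fix can be seen with any data frame. A minimal sketch with invented columns:

```r
df <- data.frame(a = 1:3, b = 4:6)
conf.screen <- "a"  # a single selected column

dropped <- df[, conf.screen]                # collapses to an integer vector
kept    <- df[, conf.screen, drop = FALSE]  # stays a one-column data frame

is.data.frame(dropped)  # FALSE
is.data.frame(kept)     # TRUE
```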
default_grf_params_gen()
(run_simulation_analysis.R): maxdepth was
initialised to 4, exceeding the maximum of 3
accepted by grf.subg.harm.survival(). Corrected to
2 (matching the legacy default).
default_grf_params_gen()
(run_simulation_analysis.R): sg.criterion was
set to "hr", which is not a valid value. Corrected to
"mDiff" (matching the legacy default).
create_gbsg_dgm() and
simulate_from_gbsg_dgm() are now thin public wrappers that
call .create_gbsg_dgm_() and
.simulate_from_gbsg_dgm_() internally. This prevents
warning spam from functions that build or simulate DGMs in loops or binary searches
(calibrate_k_inter(), get_dgm_with_output(),
validate_k_inter_effect()), which can use the internal versions and bypass the wrappers.
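The wrapper pattern can be sketched in a few lines. All names below (`make_dgm()`, `.make_dgm_()`, `setup_dgm`, `search_k()`) are hypothetical and mirror the structure only; they are not the package's code.

```r
# Internal worker: does the work and stays silent.
.make_dgm_ <- function(k) list(k = k)

# Public wrapper: signals deprecation once per call, then delegates.
make_dgm <- function(k) {
  .Deprecated("setup_dgm")
  .make_dgm_(k)
}

# Internal loops call the worker directly, so no warnings accumulate.
search_k <- function(ks) lapply(ks, .make_dgm_)

dgms <- search_k(1:3)
```

User code that still calls the public wrapper sees one warning per call, while the looped internal path stays quiet.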
compute_dgm_cde() now resolves the super-population
data frame from dgm$df_super_rand (GBSG DGMs) or
dgm$df_super (general aft_dgm_flex DGMs),
making it compatible with both class hierarchies.
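A lookup of this kind might be written as below. `get_super_population()` is a hypothetical helper, and the precedence (df_super_rand first) is an assumption. The sketch uses `[[` rather than `$` because `$` on a list partially matches names, so `dgm$df_super` would silently return `df_super_rand` when only the latter exists.

```r
# Hypothetical helper illustrating the lookup described above.
get_super_population <- function(dgm) {
  # `[[` matches names exactly; `$` would partially match on lists.
  if (!is.null(dgm[["df_super_rand"]])) return(dgm[["df_super_rand"]])
  if (!is.null(dgm[["df_super"]])) return(dgm[["df_super"]])
  stop("DGM has neither 'df_super_rand' nor 'df_super'.", call. = FALSE)
}

# Stand-in DGM objects holding only the relevant field.
gbsg_like <- list(df_super_rand = data.frame(id = 1:2))
flex_like <- list(df_super = data.frame(id = 1:3))

nrow(get_super_population(gbsg_like))  # 2
nrow(get_super_population(flex_like))  # 3
```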
globals.R: added "sim_id" to
utils::globalVariables() to suppress a spurious
R CMD check NOTE from
run_simulation_analysis.R.