genproc turns one-off iterative R procedures
(for, lapply(), purrr::pmap(), …)
into production-grade workflows by wrapping them with orthogonal,
composable execution layers: Logged and
Reproducibility are always active;
Parallel, Non-blocking, and
Monitoring compose on top. The goal is to make the
transition from “a script that works on one case” to “a
system that runs reliably on many cases” an architectural step, not
an improvised rewrite.
The two always-active layers:
withCallingHandlers()) and per-case timing.The three optional layers:
future
ecosystem (future.apply::future_lapply()).genproc()
returns immediately with a genproc_result of status
"running" while the run continues in a background future.
Poll with status(), block with await().progressr — opt-in
progress reporting that emits one progression signal per completed case
in sequential and parallel modes.genproc has zero Shiny dependency. A
companion package (genprocShiny) will later build a UI on
top of these functions.
# install.packages("remotes")
remotes::install_github("danielrak/genproc")A toy file-conversion task: read a few CSVs from one directory, save them as RDS into another. One case per file.
library(genproc)
# Synthetic workspace
src_dir <- file.path(tempdir(), "src")
dst_dir <- file.path(tempdir(), "dst")
dir.create(src_dir, showWarnings = FALSE, recursive = TRUE)
dir.create(dst_dir, showWarnings = FALSE, recursive = TRUE)
write.csv(head(iris), file.path(src_dir, "a.csv"), row.names = FALSE)
write.csv(head(mtcars), file.path(src_dir, "b.csv"), row.names = FALSE)
write.csv(head(airquality), file.path(src_dir, "c.csv"), row.names = FALSE)
# Per-case function
convert <- function(src_dir, src_file, dst_dir, dst_file) {
df <- read.csv(file.path(src_dir, src_file))
saveRDS(df, file.path(dst_dir, dst_file))
}
# Iteration mask: one row per conversion
mask <- data.frame(
src_dir = src_dir,
src_file = c("a.csv", "b.csv", "c.csv"),
dst_dir = dst_dir,
dst_file = c("a.rds", "b.rds", "c.rds"),
stringsAsFactors = FALSE
)
result <- genproc(convert, mask)Every run returns a genproc_result — a structured list
with a stable shape across runs:
result
#> genproc result
#> Status : done
#> Started : 2026-05-07 21:05:36 CEST
#> Mode : sequential
#> Cases : 3 ( 3 ok, 0 error )
#> Duration : 0.06 secsThe log data.frame holds one row per case, with
case_id, the mask parameter values (src_dir,
src_file, dst_dir, dst_file
here), then success, error_message,
traceback, and duration_secs. Below we display
a subset of those columns for readability — error_message
and traceback are NA on this happy path:
result$log[, c("case_id", "src_file", "dst_file",
"success", "duration_secs")]
#> case_id src_file dst_file success duration_secs
#> 1 case_0001 a.csv a.rds TRUE 0.02
#> 2 case_0002 b.csv b.rds TRUE 0.00
#> 3 case_0003 c.csv c.rds TRUE 0.00If a case fails, the run continues — the error is captured, not
thrown. Below we point one row of the mask to a file that does not
exist; b.csv itself is left untouched on disk so subsequent
sections of this README can still reference it:
mask_with_missing <- mask
mask_with_missing$src_file[2] <- "does_not_exist.csv"
result2 <- genproc(convert, mask_with_missing)
#> Warning in file(file, "rt"): cannot open file
#> 'C:\Users\rheri\AppData\Local\Temp\RtmpMjCXRi/src/does_not_exist.csv': No such
#> file or directory
errors(result2)[, c("case_id", "src_file", "error_message")]
#> case_id src_file error_message
#> 2 case_0002 does_not_exist.csv cannot open the connection
result2$n_success
#> [1] 2
result2$n_error
#> [1] 1The traceback column holds the real R call stack of the failing case
(filtered for the tryCatch/withCallingHandlers
machinery), so debugging a failed case reads like a normal R error.
Three helpers digest a result without touching
result$log directly:
errors(result2) # data.frame of failed cases only
#> case_id src_dir
#> 2 case_0002 C:\\Users\\rheri\\AppData\\Local\\Temp\\RtmpMjCXRi/src
#> src_file dst_dir
#> 2 does_not_exist.csv C:\\Users\\rheri\\AppData\\Local\\Temp\\RtmpMjCXRi/dst
#> dst_file success error_message
#> 2 b.rds FALSE cannot open the connection
#> traceback
#> 2 1. read.csv(file.path(src_dir, src_file))\n2. read.table(file = file, header = header, sep = sep, quote = quote, dec = dec, fill = fill, comment.char = comment.ch ...\n3. file(file, "rt")
#> duration_secs
#> 2 0
summary(result2) # printable digest: status, success rate,
#> genproc result summary
#> Status : done
#> Cases : 3 (2 ok, 1 error)
#> Success : 67%
#> Total time : 0.02s
#> Per case : mean 0.000s, max 0.000s (slowest: case_0001)
#>
#> Top errors:
#> 1x cannot open the connection
# duration stats, top recurring errorsTwo more close the loop by re-running a targeted subset:
rerun_failed(result2, convert) # only failed cases
rerun_affected(result0, diff, convert) # only cases referenced
# by changed inputs (see
# next section)result$reproducibility is a plain list recording the R
version, OS, timezone, loaded package versions, the exact mask used, the
specs of any optional layer, and a fingerprint of every input file
referenced by the mask. This snapshot lives inside the result — no side
file to keep in sync:
str(result$reproducibility, max.level = 1)
#> List of 11
#> $ timestamp : POSIXct[1:1], format: "2026-05-07 21:05:36"
#> $ r_version : chr "R version 4.5.1 (2025-06-13 ucrt)"
#> $ platform : chr "x86_64-w64-mingw32"
#> $ os : chr "Windows 10 x64"
#> $ locale : chr "LC_COLLATE=French_France.utf8;LC_CTYPE=French_France.utf8;LC_MONETARY=French_France.utf8;LC_NUMERIC=C;LC_TIME=F"| __truncated__
#> $ timezone : chr "Europe/Paris"
#> $ packages : Named chr [1:22] "0.2.0" "4.5.1" "1.2.0" "3.6.5" ...
#> ..- attr(*, "names")= chr [1:22] "genproc" "compiler" "fastmap" "cli" ...
#> $ mask_snapshot:'data.frame': 3 obs. of 4 variables:
#> $ parallel : NULL
#> $ nonblocking : NULL
#> $ inputs :List of 3A taste of what is captured (first few package versions):
head(result$reproducibility$packages, 5)
#> genproc compiler fastmap cli tools
#> "0.2.0" "4.5.1" "1.2.0" "3.6.5" "4.5.1"result$reproducibility$inputs records the size and mtime
of every input file the mask refers to. The intent is to flag the most
common reproducibility failure: re-running the same code on the same
paths after an upstream file has been silently rewritten.
genproc detects input columns automatically: any character column of
the mask whose values are existing files is treated as such. Pass
track_inputs = FALSE to skip the capture, or override the
heuristic with input_cols = c(...) (force) or
skip_input_cols = c(...) (exclude).
mask_paths <- data.frame(
csv_in = file.path(src_dir, c("a.csv", "b.csv", "c.csv")),
stringsAsFactors = FALSE
)
do_one <- function(csv_in) nrow(read.csv(csv_in))
run0 <- genproc(do_one, mask_paths)
run0$reproducibility$inputs$files
#> path size
#> 1 C:/Users/rheri/AppData/Local/Temp/RtmpMjCXRi/src/a.csv 221
#> 2 C:/Users/rheri/AppData/Local/Temp/RtmpMjCXRi/src/b.csv 303
#> 3 C:/Users/rheri/AppData/Local/Temp/RtmpMjCXRi/src/c.csv 161
#> mtime
#> 1 2026-05-07 21:05:36
#> 2 2026-05-07 21:05:36
#> 3 2026-05-07 21:05:36diff_inputs() compares two runs and tells you which
referenced files have changed since the first one:
# Rewrite a.csv with strictly more content (size changes)
write.csv(iris, file.path(src_dir, "a.csv"), row.names = FALSE)
run1 <- genproc(do_one, mask_paths)
diff_inputs(run0, run1)
#> genproc input diff (method: stat)
#> Changed: 1
#> Unchanged: 2
#> Removed: 0
#> Added: 0
#> Cases affected: 1
#>
#> Changed files:
#> C:/Users/rheri/AppData/Local/Temp/RtmpMjCXRi/src/a.csv
#> size: 221 B -> 4.1 KB
#> mtime: 2026-05-07 21:05:36 -> 2026-05-07 21:05:36
#>
#> Cases affected (use rerun_affected() to re-run):
#> case_0001The default method is "stat" (size + mtime). It detects
every legitimate modification at near-zero cost; a content-hash variant
is on the roadmap for stronger guarantees on adversarial workloads.
Dispatch cases across workers by passing a
parallel_spec():
# Four workers, a temporary multisession plan, restored on exit
result <- genproc(
convert, mask,
parallel = parallel_spec(workers = 4)
)The two mandatory layers remain active in parallel mode. Case order
is preserved in the log regardless of the order in which workers return.
Under the hood, parallel_spec() builds a configuration
object consumed by future.apply::future_lapply().
If you manage future::plan() yourself (recommended
across several calls to amortize worker startup), pass
parallel_spec() without a strategy and your current plan is
used unchanged.
Return immediately, keep the console, collect later:
job <- genproc(
convert, mask,
nonblocking = nonblocking_spec()
)
status(job) # "running", "done (not collected)", "done", or "error"
job <- await(job) # blocks until resolution
job$lognonblocking_spec() can be composed with
parallel_spec() — the non-blocking wrapper envelops the
parallel dispatch:
job <- genproc(
convert, mask,
parallel = parallel_spec(workers = 4),
nonblocking = nonblocking_spec()
)Wrap a run in progressr::with_progress() to opt in to
per-case progress reporting:
library(progressr)
with_progress({
result <- genproc(convert, mask)
})Pick a renderer once per session with
progressr::handlers() (default
"txtprogressbar", alternatives include
"progress" and "cli"). Without
with_progress(), or without progressr
installed, the integration is a complete no-op. Live monitoring of
non-blocking runs is on the roadmap.
Lifecycle: experimental. The five execution layers
(logged, reproducibility, parallel, non-blocking, monitoring) and the
genproc_result contract are committed to forward
compatibility across the 0.x series — existing fields are guaranteed not
to be removed or renamed. New fields and new optional layers may be
added.
The 0.2.0 release is the first public submission and is not yet on CRAN. Install from GitHub for now (see above).
See vignette("genproc") for a deep dive: anatomy of the
log and reproducibility snapshot, per-case error handling, composition
patterns, and roadmap.
MIT. See LICENSE.md.