The futurize package allows you to easily turn sequential code
into parallel code by piping it to the futurize() function. Easy!
library(futurize)
plan(multisession)
library(pbapply)
slow_fcn <- function(x) {
message("x = ", x)
Sys.sleep(0.1) # emulate work
sqrt(x)
}
xs <- 1:100
ys <- pblapply(xs, slow_fcn) |> futurize()
This vignette demonstrates how to use this approach to parallelize
pbapply functions such as pblapply(), pbsapply(), and
pbvapply().
The pbapply package provides progress-bar versions of the
base-R *apply() family of functions. It supports parallel
processing via the cl argument, which accepts a PSOCK cluster
object or, when used with futurize, the string "future".
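Based on the description above, a minimal sketch of the cl = "future" form might look as follows. This assumes that loading futurize is what enables pbapply to recognize the "future" value; the rest of this vignette uses the futurize() pipe form instead.

```r
library(pbapply)
library(futurize)   # assumed to enable cl = "future" (see text above)
plan(multisession)

slow_fcn <- function(x) {
  Sys.sleep(0.1)  # emulate work
  sqrt(x)
}

xs <- 1:100
## Delegate parallelization to the current future backend
ys <- pblapply(xs, slow_fcn, cl = "future")
```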
The pblapply() function works like lapply() but displays a
progress bar. For example:
library(pbapply)
slow_fcn <- function(x) {
Sys.sleep(0.1) # emulate work
sqrt(x)
}
## Apply a function to each element with a progress bar
xs <- 1:100
ys <- pblapply(xs, slow_fcn)
Here pblapply() evaluates sequentially, but we can easily make it
evaluate in parallel by piping to futurize():
library(pbapply)
library(futurize)
plan(multisession) ## parallelize on local machine
xs <- 1:100
ys <- pblapply(xs, slow_fcn) |> futurize()
Comment: Output produced by message("x = ", x), as in the
introductory example, is not relayed to the main R session by
design; if it were, it would clutter up the progress bar that
pbapply renders, which is the whole purpose of using pbapply in
the first place.
The built-in multisession backend parallelizes on your local
computer and works on all operating systems. There are other
parallel backends to choose from, including alternatives for
parallelizing on the local machine as well as backends that
distribute work across remote machines, e.g.
plan(future.mirai::mirai_multisession)
and
plan(future.batchtools::batchtools_slurm)
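Switching backends never requires touching the pblapply() call itself; only the plan() statement changes. As a sketch using the standard future API, you can also cap the number of local workers, and revert to sequential processing when done:

```r
library(futurize)
library(pbapply)

slow_fcn <- function(x) {
  Sys.sleep(0.1)  # emulate work
  sqrt(x)
}
xs <- 1:100

## Use at most two background R sessions
plan(multisession, workers = 2)
ys <- pblapply(xs, slow_fcn) |> futurize()

## Revert to sequential processing
plan(sequential)
```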
The pbsapply() function simplifies the result like sapply():
library(futurize)
plan(multisession)
library(pbapply)
xs <- 1:100
ys <- pbsapply(xs, slow_fcn) |> futurize()
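Similarly, pbvapply() mirrors base-R vapply(), which requires a template describing each element of the result. A sketch following the same pattern, assuming pbvapply() forwards vapply()'s FUN.VALUE argument:

```r
library(futurize)
plan(multisession)
library(pbapply)

slow_fcn <- function(x) {
  Sys.sleep(0.1)  # emulate work
  sqrt(x)
}

xs <- 1:100
## Each result must match the numeric(1) template
ys <- pbvapply(xs, slow_fcn, FUN.VALUE = numeric(1)) |> futurize()
```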
The following pbapply functions are supported by futurize():
- pbapply()
- pbby()
- pbeapply()
- pblapply()
- pbreplicate()
- pbsapply()
- pbtapply()
- pbvapply()
- pbwalk()

For comparison, here is what it takes to parallelize pblapply()
using the parallel package directly, without futurize:
library(pbapply)
library(parallel)
## Set up a PSOCK cluster
ncpus <- 4L
cl <- makeCluster(ncpus)
## Run pblapply in parallel
xs <- 1:100
ys <- pblapply(xs, slow_fcn, cl = cl)
## Tear down the cluster
stopCluster(cl)
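If you do manage a cluster manually, the standard way to avoid leaking worker processes is to tie the teardown to function exit with on.exit(), so the cluster is stopped even if an error interrupts the computation. A base-R sketch using parallel::parLapply() (no pbapply progress bar here):

```r
library(parallel)

par_sqrt <- function(xs, ncpus = 2L) {
  cl <- makeCluster(ncpus)
  ## Guarantee teardown, even if an error occurs below
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, xs, sqrt)
}

ys <- par_sqrt(1:10)
```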
This requires you to manually create and manage the cluster
lifecycle. If you forget to call stopCluster(), or if your code
errors out before reaching it, you leak background R processes. You
also have to decide upfront how many CPUs to use and what cluster
type to use. Switching to another parallel backend, e.g. a Slurm
cluster, would require a completely different setup. With
futurize, all of this is handled for you: just pipe to
futurize() and control the backend with plan().
An alternative to using pbapply for progress reporting is to use
the progressr package, which is specially designed to work with
the Futureverse ecosystem and provide progress updates from
parallelized computations in a near-live fashion. See the
vignette("futurize-11-apply", package = "futurize") for more
details.
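For reference, a minimal progressr pattern looks like the sketch below, shown here with a sequential lapply() call; the same code reports progress near-live when the computation is parallelized through the Futureverse. This assumes the progressr package is installed.

```r
library(progressr)

slow_fcn <- function(x) {
  Sys.sleep(0.01)  # emulate work
  sqrt(x)
}

xs <- 1:20
with_progress({
  p <- progressor(along = xs)  # one progress step per element
  ys <- lapply(xs, function(x) {
    p()  # signal a progress update
    slow_fcn(x)
  })
})
```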