Parallelize 'purrr' functions

The futurize package lets you turn sequential code into parallel code by piping the sequential call to the futurize() function. Easy!

TL;DR

library(futurize)
plan(multisession)
library(purrr)

slow_fcn <- function(x) {
  Sys.sleep(0.1)  # emulate work
  x^2
}

xs <- 1:1000
ys <- xs |> map(slow_fcn) |> futurize()

Introduction

This vignette demonstrates how to use this approach to parallelize purrr functions such as map(), map_dbl(), and walk().

The purrr map() function is commonly used to apply a function to the elements of a vector or a list. For example,

library(purrr)
xs <- 1:1000
ys <- map(xs, slow_fcn)

or equivalently using pipe syntax

library(purrr)
xs <- 1:1000
ys <- xs |> map(slow_fcn)
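
The two forms above are the same call: R's native pipe (available since R 4.1.0) rewrites `xs |> map(slow_fcn)` to `map(xs, slow_fcn)` when the code is parsed. A minimal sketch to check this, using a shorter input than above so that it runs quickly. The `map_dbl()` line is added here only to show the difference in return type:

```r
library(purrr)

slow_fcn <- function(x) {
  Sys.sleep(0.1)  # emulate work
  x^2
}

xs <- 1:3  # shorter input than in the text, to keep the run short

# The native pipe is resolved at parse time, so both expressions are the same call
same_call <- identical(quote(xs |> map(slow_fcn)), quote(map(xs, slow_fcn)))

ys_list <- xs |> map(slow_fcn)      # map() always returns a list
ys_dbl  <- xs |> map_dbl(slow_fcn)  # map_dbl() returns a numeric vector
```

Because each call sleeps for 0.1 seconds, the sequential run time grows linearly with the length of `xs`, which is what makes this kind of loop a candidate for parallelization.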

Here map() evaluates sequentially. To make it evaluate in parallel, pipe the call to futurize():

library(futurize)
library(purrr)
xs <- 1:1000
ys <- xs |> map(slow_fcn) |> futurize()

This distributes the calculations across the available parallel workers, provided that we have set up a parallel backend, e.g.

plan(multisession)
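
To confirm what a plan provides, the workers can be inspected. The sketch below assumes that plan() here is the one from the future package and that future is installed. The choice of two workers is an arbitrary assumption for illustration:

```r
library(future)

# How many cores R may use on this machine
n_cores <- availableCores()

# Request two local background R sessions (two is an arbitrary choice)
plan(multisession, workers = 2)
n_workers <- nbrOfWorkers()

# Switch back to sequential processing when done
plan(sequential)
```

Without the `workers` argument, multisession picks the number of workers based on the cores available, so setting it explicitly is mainly useful for limiting resource use.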

The built-in multisession backend parallelizes on your local computer and works on all operating systems. There are other parallel backends to choose from, including alternatives that parallelize locally and ones that distribute work across remote machines, e.g.

plan(future.mirai::mirai_multisession)

and

plan(future.batchtools::batchtools_slurm)

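Calls can also be chained, with each step futurized on its own. In the following example the first step draws random numbers, so seed = TRUE is passed to its futurize() call to request parallel-safe random number generation: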

library(purrr)
library(futurize)
plan(future.mirai::mirai_multisession)

ys <- 1:10 |>
  map(rnorm, n = 10) |> futurize(seed = TRUE) |>
  map_dbl(mean) |> futurize()
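
To see what this pipeline computes, it may help to run the same chain without futurize(). Because n = 10 is passed by name, each element of 1:10 is matched to the next argument of rnorm(), which is mean, so the first step yields ten vectors of ten draws centered on 1, 2, ..., 10. A sequential sketch using only purrr (the set.seed() call and the intermediate name `draws` are added here for illustration):

```r
library(purrr)

set.seed(42)  # added here so that the sequential run is repeatable

# Step 1: for each i in 1:10, call rnorm(i, n = 10), i.e. rnorm(n = 10, mean = i)
draws <- 1:10 |> map(rnorm, n = 10)

# Step 2: reduce each vector of draws to its sample mean
ys <- draws |> map_dbl(mean)
```

Each `ys[i]` should land close to `i`, since it is the mean of ten draws from a normal distribution with mean `i` and standard deviation 1.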

Supported Functions

The futurize() function supports parallelization of the following purrr functions: