This vignette describes the R interface to NeuralEstimators, a Julia package that facilitates a suite of neural methods for simulation-based inference. These methods are likelihood-free and amortised, in the sense that, once the neural networks are trained on simulated data, they enable rapid inference across arbitrarily many observed data sets in a fraction of the time required by conventional approaches. The package accommodates any model for which simulation is feasible by allowing users to define their model implicitly through simulated data.
Comprehensive documentation for the Julia version of the package is available here.
Note on math rendering: When viewing this vignette online, you may encounter issues with calligraphic math symbols rendering as “square boxes”. If this occurs, you can adjust your settings to resolve it by right-clicking on the square box and then selecting Math Settings > Math Renderer > Common HTML.
Here, we provide an overview of the amortised neural inferential methods supported by the package, which include neural Bayes estimators (Section 1.1), neural posterior estimators (Section 1.2), and neural ratio estimators (Section 1.3). For further details on each of these methods and amortised neural inference more broadly, see the review paper by Zammit-Mangion et al. (2025) and the references therein.
Notation: We denote model parameters of interest by \(\boldsymbol{\theta} \equiv (\theta_1, \dots, \theta_d)' \in \Theta\), where \(\Theta \subseteq \mathbb{R}^d\) is the parameter space. We denote data by \(\boldsymbol{Z} \equiv (Z_1, \dots, Z_n)' \in \mathcal{Z}\), where \(\mathcal{Z} \subseteq \mathbb{R}^n\) is the sample space. We denote neural-network parameters by \(\boldsymbol{\gamma}\). For simplicity, we assume that all measures admit densities with respect to the Lebesgue measure. We use \(\pi(\cdot)\) to denote the prior density function of the parameters. The input argument to a generic density function \(p(\cdot)\) serves to specify both the random variable associated with the density and its evaluation point.
The goal of parametric point estimation is to estimate \(\boldsymbol{\theta}\) from data \(\boldsymbol{Z}\) using an estimator, \(\hat{\boldsymbol{\theta}} : \mathcal{Z}\to\Theta\). Estimators can be constructed intuitively within a decision-theoretic framework based on average-risk optimality. Specifically, consider a loss function \(L: \Theta \times \Theta \to [0, \infty)\). Then the Bayes risk of the estimator \(\hat{\boldsymbol{\theta}}(\cdot)\) is \[ \int_\Theta \int_{\mathcal{Z}} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}(\boldsymbol{Z}))p(\boldsymbol{Z} \mid \boldsymbol{\theta}) \pi(\boldsymbol{\theta}) \textrm{d} \boldsymbol{Z} \textrm{d}\boldsymbol{\theta}. \] Any minimiser of the Bayes risk is said to be a Bayes estimator with respect to \(L(\cdot, \cdot)\) and \(\pi(\cdot)\).
Bayes estimators are functionals of the posterior distribution (e.g., the Bayes estimator under quadratic loss is the posterior mean), and are therefore often unavailable in closed form. A way forward is to assume a flexible parametric function for \(\hat{\boldsymbol{\theta}}(\cdot)\), and to optimise the parameters within that function in order to approximate the Bayes estimator. Neural networks are ideal candidates, since they are universal function approximators and fast to evaluate. Let \(\hat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}} : \mathcal{Z}\to\Theta\) denote a neural network parameterised by \(\boldsymbol{\gamma}\). Then a Bayes estimator may be approximated by \(\hat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}^*}(\cdot)\), where \[ \boldsymbol{\gamma}^* \equiv \underset{\boldsymbol{\gamma}}{\mathrm{arg\,min\;}} \frac{1}{K} \sum_{k = 1}^K L(\boldsymbol{\theta}^{(k)}, \hat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}}(\boldsymbol{Z}^{(k)})), \] with \(\boldsymbol{\theta}^{(k)} \sim \pi(\boldsymbol{\theta})\) and, independently for each \(k\), \(\boldsymbol{Z}^{(k)} \sim p(\boldsymbol{Z} \mid \boldsymbol{\theta}^{(k)})\). The process of obtaining \(\boldsymbol{\gamma}^*\) is referred to as “training the network”, and this can be performed efficiently using back-propagation and stochastic gradient descent. The trained neural network \(\hat{\boldsymbol{\theta}}_{\boldsymbol{\gamma}^*}(\cdot)\) approximately minimises the Bayes risk, and is therefore called a neural Bayes estimator (Sainsbury-Dale et al., 2024).
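To make the Monte Carlo approximation of the Bayes risk concrete, the following base-R sketch (a toy illustration, not part of the package workflow) evaluates the approximate Bayes risk of two simple fixed estimators, the sample mean and the sample median, under the model \(\mu \sim \rm{Gau}(0, 1)\) and \(Z_i \mid \mu \sim \rm{Gau}(\mu, 1)\) with absolute-error loss; a neural Bayes estimator simply replaces the fixed estimator with a neural network whose parameters are optimised against this criterion.
# Monte Carlo approximation of the Bayes risk for two fixed (non-neural) estimators
# Toy model: mu ~ Gau(0, 1); Z_1, ..., Z_m | mu iid Gau(mu, 1); absolute-error loss
K <- 1e5                                   # number of (parameter, data) pairs
m <- 10                                    # number of replicates per data set
mu <- rnorm(K)                             # mu^(k) drawn from the prior
Z <- matrix(rnorm(K * m, mean = rep(mu, each = m)), nrow = m)  # Z^(k) | mu^(k), one data set per column
risk_mean   <- mean(abs(colMeans(Z) - mu))           # approximate Bayes risk of the sample mean
risk_median <- mean(abs(apply(Z, 2, median) - mu))   # approximate Bayes risk of the sample median
c(mean = risk_mean, median = risk_median)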
Once trained, a neural Bayes estimator can be applied repeatedly to observed data (whose structure conforms with the chosen neural-network architecture) at a fraction of the computational cost of conventional inferential methods. It is therefore ideal to use a neural Bayes estimator in settings where inference needs to be made repeatedly; in this case, the initial training cost is said to be amortised over time.
Uncertainty quantification with neural Bayes estimators often proceeds through the bootstrap distribution (e.g., Lenzi et al., 2023; Richards et al., 2024; Sainsbury-Dale et al., 2024). Bootstrap-based approaches are particularly attractive when nonparametric bootstrap is possible (e.g., when the data are independent replicates), or when simulation from the fitted model is fast, in which case parametric bootstrap is also computationally efficient. However, these conditions are not always met and, although bootstrap-based approaches are often considered to be fairly accurate and preferable to methods based on asymptotic normality, there are situations where bootstrap procedures are not reliable (see, e.g., Canty et al., 2006, p. 6).
Alternatively, by leveraging ideas from (Bayesian) quantile regression, one may construct a neural Bayes estimator that approximates a set of marginal posterior quantiles (Fisher et al., 2023; Sainsbury-Dale et al., 2025), which can then be used to construct credible intervals for each parameter. Inference then remains fully amortised since, once the estimators are trained, both point estimates and credible intervals can be obtained with virtually zero computational cost. Specifically, posterior quantiles can be targeted by training a neural Bayes estimator under the loss function \[ L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}}; \tau) \equiv \sum_{j=1}^d (\hat{\theta}_j - \theta_j)\{\mathbb{I}(\hat{\theta}_j > \theta_j) - \tau\}, \quad 0 < \tau < 1, \] where \(\mathbb{I}(\cdot)\) denotes the indicator function, since the Bayes estimator under this loss function is the vector of marginal posterior \(\tau\)-quantiles (Sainsbury-Dale et al., 2025, Sec. 2.2.4).
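As a quick sanity check of this property, the following base-R snippet (a toy illustration, independent of the package) minimises the sample average of the loss above for a scalar parameter over draws that stand in for posterior samples, and compares the minimiser to the empirical \(\tau\)-quantile.
# Numerical check: the minimiser of the average quantile loss is the tau-quantile
quantile_loss <- function(t, theta, tau) mean((t - theta) * ((t > theta) - tau))
theta <- rgamma(1e5, shape = 2)   # draws standing in for posterior samples
tau <- 0.95
opt <- optimize(quantile_loss, interval = range(theta), theta = theta, tau = tau)
c(minimiser = opt$minimum, sample_quantile = unname(quantile(theta, tau)))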
We now describe amortised approximate posterior inference through the minimisation of an expected Kullback–Leibler (KL) divergence. Throughout, we let \(q(\boldsymbol{\theta}; \boldsymbol{\kappa})\) denote a parametric approximation to the posterior distribution \(p(\boldsymbol{\theta} \mid \boldsymbol{Z})\), where the approximate-distribution parameters \(\boldsymbol{\kappa}\) belong to a space \(\mathcal{K}\).
We first consider the non-amortised case, where the optimal parameters \(\boldsymbol{\kappa}^*\) for a single data set \(\boldsymbol{Z}\) are found by minimising the KL divergence between \(p(\boldsymbol{\theta} \mid \boldsymbol{Z})\) and \(q(\boldsymbol{\theta}; \boldsymbol{\kappa})\): \[ \boldsymbol{\kappa}^* \equiv \underset{\boldsymbol{\kappa}}{\mathrm{arg\,min\;}} \textrm{KL}\{p(\boldsymbol{\theta} \mid \boldsymbol{Z}) \, \| \, q(\boldsymbol{\theta} ;\boldsymbol{\kappa})\} = \underset{\boldsymbol{\kappa}}{\mathrm{arg\,min\;}} -\int_{\Theta} \log\{ q(\boldsymbol{\theta} ;\boldsymbol{\kappa})\} p(\boldsymbol{\theta} \mid \boldsymbol{Z}) \textrm{d}\boldsymbol{\theta} . \] The resulting approximate posterior \(q(\boldsymbol{\theta}; \boldsymbol{\kappa}^*)\) targets the true posterior in the sense that the KL divergence is zero if and only if \(q(\boldsymbol{\theta}; \boldsymbol{\kappa}^*) = p(\boldsymbol{\theta} \mid \boldsymbol{Z})\) for all \(\boldsymbol{\theta} \in \Theta\). However, solving this optimisation problem is often computationally demanding even for a single data set \(\boldsymbol{Z}\), and solving it for many different data sets can be computationally prohibitive. The optimisation problem can be amortised by treating the parameters \(\boldsymbol{\kappa}\) as a function \(\boldsymbol{\kappa} : \mathcal{Z} \to \mathcal{K}\), and then choosing the function \(\boldsymbol{\kappa}^*(\cdot)\) that minimises an expected KL divergence: \[ \boldsymbol{\kappa}^*(\cdot) \equiv \underset{\boldsymbol{\kappa}(\cdot)}{\mathrm{arg\,min\;}} \mathbb{E}_{\boldsymbol{Z}}[\textrm{KL}\{p(\boldsymbol{\theta} \mid \boldsymbol{Z}) \, \| \, q(\boldsymbol{\theta} ;\boldsymbol{\kappa}(\boldsymbol{Z}))\}]. \] In practice, we approximate \(\boldsymbol{\kappa}^*(\cdot)\) using a neural network, \(\boldsymbol{\kappa}_{\boldsymbol{\gamma}} : \mathcal{Z} \to \mathcal{K}\), which is parameterised by \(\boldsymbol{\gamma}\) and trained by minimising a Monte Carlo approximation of the expected KL divergence above: \[ \boldsymbol{\gamma}^* \equiv \underset{\boldsymbol{\gamma}}{\mathrm{arg\,min}} -\sum_{k=1}^K \log q(\boldsymbol{\theta}^{(k)}; \boldsymbol{\kappa}_{\boldsymbol{\gamma}}(\boldsymbol{Z}^{(k)})), \] with \(\boldsymbol{\theta}^{(k)} \sim \pi(\boldsymbol{\theta})\) and \(\boldsymbol{Z}^{(k)} \sim p(\boldsymbol{Z} \mid \boldsymbol{\theta}^{(k)})\), \(k = 1, \dots, K\), as in Section 1.1. Once trained, the neural network \(\boldsymbol{\kappa}_{\boldsymbol{\gamma}^*}(\cdot)\) may be used to estimate the optimal approximate-distribution parameters \(\boldsymbol{\kappa}^*\) given data \(\boldsymbol{Z}\) at almost no computational cost. The neural network \(\boldsymbol{\kappa}_{\boldsymbol{\gamma}^*}(\cdot)\), together with the corresponding approximate distribution \(q(\cdot; \boldsymbol{\kappa})\), is collectively referred to as a neural posterior estimator.
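The following base-R sketch (a toy illustration, not the package's training routine) evaluates this Monte Carlo objective for a conjugate model in which the exact posterior is known, namely \(\theta \sim \rm{Gau}(0, 1)\) and \(Z \mid \theta \sim \rm{Gau}(\theta, 1)\), so that \(\theta \mid Z \sim \rm{Gau}(Z/2, 1/2)\); the mapping that returns the exact posterior parameters attains a lower objective value than a mismatched mapping.
# Monte Carlo objective for amortised posterior approximation (toy conjugate model)
K <- 1e5
theta <- rnorm(K)                 # theta^(k) ~ prior
Z <- rnorm(K, mean = theta)       # Z^(k) | theta^(k), a single datum per parameter vector
# Gaussian approximate posterior q(theta; kappa), with kappa(Z) returning a mean and standard deviation
objective <- function(kappa) {
  k <- kappa(Z)
  -mean(dnorm(theta, mean = k$mean, sd = k$sd, log = TRUE))
}
kappa_exact <- function(Z) list(mean = Z / 2, sd = sqrt(1 / 2))  # exact posterior mapping
kappa_crude <- function(Z) list(mean = Z, sd = 1)                # mismatched mapping
c(exact = objective(kappa_exact), crude = objective(kappa_crude))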
There are numerous options for the approximate distribution \(q(\cdot; \boldsymbol{\kappa})\). For instance, \(q(\cdot;\boldsymbol{\kappa})\) can be modelled as a Gaussian distribution (e.g., Chan et al., 2018; see the Julia documentation for GaussianDistribution), where the parameters \(\boldsymbol{\kappa} = (\boldsymbol{\mu}', \textrm{vech}(\boldsymbol{L})')'\) consist of a \(d\)-dimensional mean parameter \(\boldsymbol{\mu}\) and the \(d(d+1)/2\) non-zero elements of the lower Cholesky factor \(\boldsymbol{L}\) of a covariance matrix, and the half-vectorisation operator \(\textrm{vech}(\cdot)\) vectorises the lower triangle of its matrix argument. One may also consider Gaussian mixtures (e.g., Papamakarios & Murray, 2016) or trans-Gaussian distributions (e.g., Maceda et al., 2024). However, the most widely adopted approach is to model \(q(\cdot\,; \boldsymbol{\kappa})\) using a normalising flow (e.g., Ardizzone et al., 2019; Radev et al., 2022), excellent reviews for which are given by Kobyzev et al. (2020) and Papamakarios (2021). A particularly popular class of normalising flow is the affine coupling flow (e.g., Dinh et al., 2016; Kingma & Dhariwal, 2018; Ardizzone et al., 2019). Since affine coupling flows are universal density approximators (Teshima et al., 2020), they serve as the default and recommended choice for approximate distributions in this package; for further details, see the Julia documentation for NormalisingFlow.
Finally, we describe amortised inference by approximation of the likelihood-to-evidence ratio, \[ r(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv p(\boldsymbol{Z} \mid \boldsymbol{\theta})/p(\boldsymbol{Z}), \] where \(p(\boldsymbol{Z} \mid \boldsymbol{\theta})\) is the likelihood and \(p(\boldsymbol{Z})\) is the marginal likelihood (also known as the model evidence).
The likelihood-to-evidence ratio is ubiquitous in statistical inference. For example, likelihood ratios of the form \(p(\boldsymbol{Z}\mid \boldsymbol{\theta}_0)/p(\boldsymbol{Z}\mid \boldsymbol{\theta}_1)=r(\boldsymbol{Z}, \boldsymbol{\theta}_0)/r(\boldsymbol{Z}, \boldsymbol{\theta}_1)\) are central to hypothesis testing and model comparison, and naturally appear in the transition probabilities of most standard MCMC algorithms used for Bayesian inference. Further, since the likelihood-to-evidence ratio is a prior-free quantity, its approximation facilitates Bayesian inference in applications where one requires multiple fits of the model under different prior distributions.
Unlike the methods discussed earlier, the likelihood-to-evidence ratio might not immediately seem like a quantity well-suited for approximation by neural networks, which are typically trained by minimising empirical risk functions. However, this ratio emerges naturally as a simple transformation of the optimal solution to a standard binary classification problem, derived through the minimisation of an average risk. Specifically, consider a binary classifier \(c(\boldsymbol{Z}, \boldsymbol{\theta})\) that distinguishes dependent data-parameter pairs \({(\boldsymbol{Z}', \boldsymbol{\theta}')' \sim p(\boldsymbol{Z}, \boldsymbol{\theta})}\) with class labels \(Y=1\) from independent data-parameter pairs \({(\tilde{\boldsymbol{Z}}', \tilde{\boldsymbol{\theta}}')' \sim p(\boldsymbol{Z})p(\boldsymbol{\theta})}\) with class labels \(Y=0\), and where the classes are balanced. Here, \(p(\boldsymbol{\theta})\) denotes an arbitrary “proposal” distribution for \(\boldsymbol{\theta}\) that does not, in general, coincide with the prior distribution (see below). Then, the Bayes classifier under binary cross-entropy loss is defined as \[ \begin{aligned} c^*(\cdot, \cdot) &\equiv \underset{c(\cdot, \cdot)}{\mathrm{arg\,min}} \sum_{y\in\{0, 1\}} \textrm{Pr}(Y = y) \int_\Theta\int_\mathcal{Z}L_{\textrm{BCE}}\{y, c(\boldsymbol{Z}, \boldsymbol{\theta})\}p(\boldsymbol{Z}, \boldsymbol{\theta} \mid Y = y)\textrm{d} \boldsymbol{Z} \textrm{d} \boldsymbol{\theta}\\ &= \underset{c(\cdot, \cdot)}{\mathrm{arg\,min}} - \int_\Theta\int_\mathcal{Z}\Big[\log\{c(\boldsymbol{Z}, \boldsymbol{\theta})\}p(\boldsymbol{Z}, \boldsymbol{\theta}) + \log\{1 - c(\boldsymbol{Z}, \boldsymbol{\theta})\}p(\boldsymbol{Z})p(\boldsymbol{\theta}) \Big]\textrm{d} \boldsymbol{Z} \textrm{d} \boldsymbol{\theta}, \end{aligned} \] where \(L_{\textrm{BCE}}(y, c) \equiv -y\log(c) - (1 - y) \log(1 - c)\). It can be shown (e.g., Hermans et al., 2020, App. B) that the Bayes classifier is given by \[ c^*(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{p(\boldsymbol{Z}, \boldsymbol{\theta})}{p(\boldsymbol{Z}, \boldsymbol{\theta}) + p(\boldsymbol{\theta})p(\boldsymbol{Z})}, \quad \boldsymbol{Z} \in \mathcal{Z}, \boldsymbol{\theta} \in \Theta, \] and, hence, \[ r(\boldsymbol{Z}, \boldsymbol{\theta}) = \frac{c^*(\boldsymbol{Z}, \boldsymbol{\theta})}{1 - c^*(\boldsymbol{Z}, \boldsymbol{\theta})}, \quad \boldsymbol{Z} \in \mathcal{Z}, \boldsymbol{\theta} \in \Theta. \] This connection links the likelihood-to-evidence ratio to the average-risk-optimal solution of a standard binary classification problem, and consequently provides a foundation for approximating the ratio using neural networks. Specifically, let \(c_{\boldsymbol{\gamma}}: \mathcal{Z} \times \Theta \to (0, 1)\) denote a neural network parametrised by \(\boldsymbol{\gamma}\). Then the Bayes classifier may be approximated by \(c_{\boldsymbol{\gamma}^*}(\cdot, \cdot)\), where \[ \boldsymbol{\gamma}^* \equiv \underset{\boldsymbol{\gamma}}{\mathrm{arg\,min}} -\sum_{k=1}^K \Big[\log\{c_{\boldsymbol{\gamma}}(\boldsymbol{Z}^{(k)}, \boldsymbol{\theta}^{(k)})\} + \log\{1 - c_{\boldsymbol{\gamma}}(\boldsymbol{Z}^{(\sigma(k))}, \boldsymbol{\theta}^{(k)})\} \Big], \] with each \(\boldsymbol{\theta}^{(k)}\) sampled independently from a “proposal” distribution \(p(\boldsymbol{\theta})\), \(\boldsymbol{Z}^{(k)} \sim p(\boldsymbol{Z} \mid \boldsymbol{\theta}^{(k)})\), and \(\sigma(\cdot)\) a random permutation of \(\{1, \dots, K\}\). 
The proposal distribution \(p(\boldsymbol{\theta})\) does not necessarily correspond to the prior distribution \(\pi(\boldsymbol{\theta})\), which is specified in the downstream inference algorithm (see below). In theory, any \(p(\boldsymbol{\theta})\) with support over \(\Theta\) can be used. However, with finite training data, the choice of \(p(\boldsymbol{\theta})\) is important, as it determines where the parameters \(\{\boldsymbol{\theta}^{(k)}\}\) are most densely sampled and, hence, where the neural network \(c_{\boldsymbol{\gamma}^*}(\cdot, \cdot)\) best approximates the Bayes classifier. Further, since neural networks are only reliable within the support of their training samples, a \(p(\boldsymbol{\theta})\) lacking full support over \(\Theta\) essentially acts as a “soft prior”.
Once the neural network is trained, \(r_{\boldsymbol{\gamma}^*}(\boldsymbol{Z}, \boldsymbol{\theta}) \equiv c_{\boldsymbol{\gamma}^*}(\boldsymbol{Z}, \boldsymbol{\theta})\{1 - c_{\boldsymbol{\gamma}^*}(\boldsymbol{Z}, \boldsymbol{\theta})\}^{-1}\), \(\boldsymbol{Z} \in \mathcal{Z}, \boldsymbol{\theta} \in \Theta\), may be used to quickly approximate the likelihood-to-evidence ratio, and therefore it is called a neural ratio estimator.
Inference based on a neural ratio estimator may proceed in a frequentist setting via maximum likelihood and likelihood ratios (e.g., Walchessen et al., 2024), and in a Bayesian setting by facilitating the computation of transition probabilities in Hamiltonian Monte Carlo and other MCMC algorithms (e.g., Hermans et al., 2020). Further, an approximate posterior distribution can be obtained via the identity \(p(\boldsymbol{\theta} \mid \boldsymbol{Z}) = \pi(\boldsymbol{\theta}) r(\boldsymbol{Z}, \boldsymbol{\theta})\), and sampled from using standard sampling techniques (e.g., Thomas et al., 2022).
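For intuition, the identity \(p(\boldsymbol{\theta} \mid \boldsymbol{Z}) = \pi(\boldsymbol{\theta}) r(\boldsymbol{Z}, \boldsymbol{\theta})\) can be verified in base R for a conjugate toy model in which the ratio is available in closed form (here the exact ratio stands in for a trained neural ratio estimator): \(\theta \sim \rm{Gau}(0, 1)\) and \(Z \mid \theta \sim \rm{Gau}(\theta, 1)\), so that marginally \(Z \sim \rm{Gau}(0, 2)\) and \(\theta \mid Z \sim \rm{Gau}(Z/2, 1/2)\).
# The identity p(theta | Z) = prior(theta) * r(Z, theta), toy conjugate example
Z <- 1.3                                      # a single "observed" datum
theta_grid <- seq(-3, 3, length.out = 5)      # evaluation points for theta
r <- dnorm(Z, mean = theta_grid, sd = 1) / dnorm(Z, mean = 0, sd = sqrt(2))  # likelihood-to-evidence ratio
cbind(
  prior_times_ratio = dnorm(theta_grid) * r,
  exact_posterior   = dnorm(theta_grid, mean = Z / 2, sd = sqrt(1 / 2))
)                                             # the two columns coincide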
The R interface to NeuralEstimators aims to facilitate the use of neural inferential methods for statisticians familiar with R. This is achieved by leveraging the R package JuliaConnectoR (Lenz et al., 2022), which enables seamless communication between R and Julia. The core functions of NeuralEstimators, such as training and assessing the neural networks, are wrapped in R functions to make them accessible (and user-friendly) within the R environment. The only step that requires users to write Julia code is the construction of the neural-network architecture. This design choice ensures maximum flexibility for defining custom architectures.
The typical workflow when using the R interface to NeuralEstimators is as follows, with the steps requiring Julia code explicitly highlighted:
1. Sample parameters from the prior distribution and simulate data from the statistical model, to construct training, validation, and test sets.
2. (Julia) Define the neural-network architecture as a Flux.jl model. The architecture class (e.g., MLP, CNN, GNN) should align with the multivariate structure of the data (e.g., unstructured, grid, graph), and the specific input and output spaces of the neural network depend on the chosen inferential method.
3. Wrap the neural network in the NeuralEstimator corresponding to the intended inferential method: PointEstimator, PosteriorEstimator (see also the available built-in approximate distributions), or RatioEstimator.
4. Train the NeuralEstimator using train() and the training set, monitoring performance and convergence using the validation set.
5. Assess the trained NeuralEstimator using assess() and the test set.
Once the NeuralEstimator has passed our assessments and is deemed to be well calibrated, it may be used to make inference with observed data.
We now consider examples where the data are univariate (Section 2.1), multivariate and unstructured (Section 2.2), collected over a regular grid (Section 2.3), or collected over arbitrary spatial locations (Section 2.4). These examples focus on neural Bayes estimation using a PointEstimator. For a list of other estimator types available in the package (e.g., for neural posterior or ratio estimation), see Estimators and the examples therein.
Before proceeding, we load the required packages: see the README for instructions on installing Julia and the Julia packages NeuralEstimators and Flux. If you have access to an NVIDIA graphics processing unit (GPU), you can utilise it by simply adding CUDA to the list of Julia packages in the final line below:
library("NeuralEstimators")
library("JuliaConnectoR")
library("ggplot2")
library("ggpubr")
library("latex2exp")
juliaEval('using NeuralEstimators, Flux')
#> Starting Julia ...
Here, we develop a neural Bayes estimator for \(\boldsymbol{\theta} \equiv (\mu, \sigma)'\) from data \(\boldsymbol{Z} \equiv (Z_1, \dots, Z_m)'\), where each \(Z_i \overset{\mathrm{iid}}\sim \mathrm{Gau}(\mu, \sigma^2)\).
First, we sample parameters from the prior to construct parameter sets used for training and validating the estimator. Here, we use the priors \(\mu \sim \rm{Gau}(0, 1)\) and \(\sigma \sim \rm{Gamma}(1, 1)\), and we assume that the parameters are independent a priori. The sampled parameters are stored as \(d \times K\) matrices, with \(d\) the number of parameters in the model and \(K\) the number of sampled parameter vectors.
# Sampling from the prior
# K: number of samples to draw from the prior
sampler <- function(K) {
mu <- rnorm(K)
sigma <- rgamma(K, 1)
Theta <- matrix(c(mu, sigma), byrow = TRUE, ncol = K)
return(Theta)
}
K <- 10000
theta_train <- sampler(K)
theta_val <- sampler(K/10)
Next, we define the statistical model implicitly through data simulation. The simulated data are stored as a list, where each element of the list corresponds to a data set simulated conditionally on one parameter vector. The storage type of each data set depends on the multivariate structure of the data (e.g., a matrix for unstructured multivariate data, a multidimensional array for gridded data). In this example, each replicate \(Z_1, \dots, Z_m\) is univariate, so the data are stored as a matrix with \(n = 1\) row and \(m\) columns:
# Marginal simulation from the statistical model
# Theta: a matrix of parameters drawn from the prior
# m: number of conditionally independent replicates for each parameter vector
simulate <- function(Theta, m) {
apply(Theta, 2, function(theta) t(rnorm(m, theta[1], theta[2])), simplify = FALSE)
}
m <- 30
Z_train <- simulate(theta_train, m)
Z_val <- simulate(theta_val, m)
Let’s visualise a few elements of the training set:
mu <- theta_train[1, 1:4]
sigma <- theta_train[2, 1:4]
Z <- Z_train[1:4]
df <- Map(function(z, m, s) {
data.frame(Z = c(z), mu = m, sigma = s)
}, Z, mu, sigma)
df <- do.call(rbind, df)
df$theta <- paste0("$\\mu$ = ", round(df$mu, 3), ", $\\sigma$ = ", round(df$sigma, 3))
df$theta <- as.factor(df$theta)
levels(df$theta) <- sapply(levels(df$theta), TeX)
ggplot(df) +
geom_histogram(aes(x = Z), bins = 30, fill = "blue", alpha = 0.5, color = "black") +
facet_grid(~theta, labeller = label_parsed) +
theme_bw()
We now design our neural network.
As we are constructing a neural Bayes estimator, the neural network is a mapping \(\mathcal{Z}\to\Theta\), and the dimensionality of the neural-network output is therefore \(d \equiv \textrm{dim}(\Theta) = 2\).
Since our data are replicated, we adopt the DeepSets framework, implemented in NeuralEstimators with the type DeepSet. DeepSets consist of two neural networks: an inner network and an outer network. The inner network extracts summary statistics from the data, and its architecture depends on the multivariate structure of the data. For unstructured data (i.e., data without spatial or temporal correlation within each replicate), we use a multilayer perceptron (MLP). The input dimension matches the dimensionality of each data replicate, while the output dimension corresponds to the number of summary statistics appropriate for the model (a common choice is \(d\)). The outer network maps the learned summary statistics to the output space (here, the parameter space, \(\Theta\)). The outer network is always an MLP.
Below is an example of a DeepSets architecture for neural Bayes estimation in this example. Note that many models have parameter constraints (e.g., variance and range parameters that must be strictly positive). These constraints can be incorporated in the final layer of the neural network by choosing appropriate activation functions for each parameter. Here, we enforce the constraint \(\sigma > 0\) by applying the softplus activation function in the final layer of the outer network, ensuring that all parameter estimates are valid. For some additional ways to constrain parameter estimates, see Output layers.
# Initialise the estimator
estimator <- juliaEval('
n = 1 # dimension of each data replicate (univariate)
d = 2 # dimension of the parameter vector θ
w = 128 # width of each hidden layer
# Final layer has output dimension d and enforces parameter constraints
final_layer = Parallel(
vcat,
Dense(w, 1, identity), # mean parameter: no constraints
Dense(w, 1, softplus) # standard-deviation parameter: strictly positive
)
# Construct inner and outer networks and combine into DeepSet
psi = Chain(Dense(n, w, relu), Dense(w, w, relu))
phi = Chain(Dense(w, w, relu), final_layer)
network = DeepSet(psi, phi)
# Wrap the neural network as a PointEstimator
estimator = PointEstimator(network)
')
Next, we train the estimator using train(), here using the default mean-absolute-error loss, so that the resulting neural Bayes estimator approximates the marginal posterior medians. We use the sets of parameters and simulated data constructed earlier; one may alternatively use the arguments sampler and simulator to pass functions for sampling from the prior and simulating from the model, respectively, which facilitates the technique known as “simulation on-the-fly”. These functions can be defined in R (as we have done in this example) or in Julia using the package JuliaConnectoR (see the help file for train()).
# Train the estimator
estimator <- train(
estimator,
theta_train = theta_train,
theta_val = theta_val,
Z_train = Z_train,
Z_val = Z_val
)
#> Computing the initial validation risk... Initial validation risk = 0.7422407
#> Computing the initial training risk... Initial training risk = 0.76490337
#> Epoch: 1 Training risk: 0.245 Validation risk: 0.177 Run time of epoch: 9.313 seconds
#> Epoch: 2 Training risk: 0.174 Validation risk: 0.15 Run time of epoch: 1.203 seconds
#> Epoch: 3 Training risk: 0.158 Validation risk: 0.155 Run time of epoch: 1.019 seconds
#> Epoch: 4 Training risk: 0.154 Validation risk: 0.145 Run time of epoch: 1.25 seconds
#> Epoch: 5 Training risk: 0.149 Validation risk: 0.139 Run time of epoch: 1.242 seconds
#> Epoch: 6 Training risk: 0.145 Validation risk: 0.132 Run time of epoch: 1.101 seconds
#> Epoch: 7 Training risk: 0.143 Validation risk: 0.13 Run time of epoch: 1.114 seconds
#> Epoch: 8 Training risk: 0.143 Validation risk: 0.131 Run time of epoch: 1.04 seconds
#> Epoch: 9 Training risk: 0.143 Validation risk: 0.134 Run time of epoch: 1.078 seconds
#> Epoch: 10 Training risk: 0.141 Validation risk: 0.13 Run time of epoch: 1.035 seconds
#> Epoch: 11 Training risk: 0.139 Validation risk: 0.131 Run time of epoch: 0.926 seconds
#> Epoch: 12 Training risk: 0.136 Validation risk: 0.131 Run time of epoch: 1.248 seconds
#> Epoch: 13 Training risk: 0.139 Validation risk: 0.125 Run time of epoch: 0.79 seconds
#> Epoch: 14 Training risk: 0.136 Validation risk: 0.127 Run time of epoch: 0.959 seconds
#> Epoch: 15 Training risk: 0.136 Validation risk: 0.13 Run time of epoch: 1.173 seconds
#> Epoch: 16 Training risk: 0.136 Validation risk: 0.131 Run time of epoch: 1.014 seconds
#> Epoch: 17 Training risk: 0.135 Validation risk: 0.128 Run time of epoch: 1.005 seconds
#> Epoch: 18 Training risk: 0.135 Validation risk: 0.132 Run time of epoch: 0.874 seconds
#> Epoch: 19 Training risk: 0.135 Validation risk: 0.128 Run time of epoch: 1.153 seconds
#> Stopping early since the validation loss has not improved in 5 epochs
If the argument savepath in train() is specified, the neural-network state (e.g., the weights and biases) will be saved during training as bson files, and the best state (as measured by validation risk) will be saved as best_network.bson. To load these saved states into a neural network in later R sessions, one may use the function loadstate(). Note that one may also manually save a trained estimator using savestate().
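As a rough sketch of how this might look (the exact argument names and signatures of savepath, savestate(), and loadstate() should be checked against the corresponding help files):
# Sketch only: saving and restoring the neural-network state (see help files for exact signatures)
estimator <- train(
  estimator,
  theta_train = theta_train, theta_val = theta_val,
  Z_train = Z_train, Z_val = Z_val,
  savepath = "checkpoints"                    # assumed usage: states saved here during training
)
savestate(estimator, "estimator.bson")        # assumed usage: manually save the trained estimator
estimator <- loadstate(estimator, "checkpoints/best_network.bson")  # assumed usage: restore the best state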
Once a neural Bayes estimator has been trained, its performance should be assessed. Simulation-based (empirical) methods for evaluating the performance of a neural Bayes estimator are ideal, since simulation is already required for constructing the estimator, and because the estimator can be applied to thousands of simulated data sets at almost no computational cost.
To assess the accuracy of the resulting neural Bayes estimator, one may use the function assess(). The resulting object can then be passed to the functions risk(), bias(), and rmse() to compute a Monte Carlo approximation of the Bayes risk, the bias, and the RMSE, or passed to the function plotestimates() to visualise the estimates against the corresponding true values:
# Assess the estimator
theta_test <- sampler(1000)
Z_test <- simulate(theta_test, m)
assessment <- assess(estimator, theta_test, Z_test, estimator_names = "NBE")
#> Running NBE...
plotestimates(assessment, parameter_labels = c("θ1" = expression(mu), "θ2" = expression(sigma)))
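As noted above, the same assessment object can also be passed to risk(), bias(), and rmse() for numerical summaries:
# Numerical summaries of the assessment (Monte Carlo approximations over the test set)
risk(assessment)   # Bayes risk
bias(assessment)   # bias
rmse(assessment)   # root-mean-squared error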
Once the neural Bayes estimator has been trained and assessed, it can be applied to observed data using the function estimate(), and non-parametric bootstrap estimates can be computed using the function bootstrap(). Below, we use simulated data as a surrogate for observed data:
theta <- as.matrix(c(0, 0.5)) # true parameters
Z <- simulate(theta, m) # pretend that this is observed data
estimate(estimator, Z) # point estimate from the "observed data"
#> [,1]
#> [1,] -0.1159449
#> [2,] 0.5819227
bootstrap(estimator, Z)[, 1:6] # non-parametric bootstrap estimates
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] -0.2536258 -0.1886561 -0.1325511 0.0453563 -0.1121531 5.151611e-05
#> [2,] 0.7270321 0.6252608 0.4906906 0.6401244 0.4488662 6.788481e-01
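The bootstrap() output is a matrix with one row per parameter and one column per bootstrap sample, so, for example, pointwise 95% percentile intervals for each parameter can be computed directly from it:
# 95% percentile bootstrap intervals for each parameter
B <- bootstrap(estimator, Z)                     # non-parametric bootstrap estimates
apply(B, 1, quantile, probs = c(0.025, 0.975))   # columns correspond to mu and sigma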
Suppose now that our data consist of \(m\) replicates of an \(n\)-dimensional multivariate distribution.
For unstructured \(n\)-dimensional data, the estimator is based on a multilayer perceptron (MLP), and each data set is stored as a two-dimensional array (a matrix), with the second dimension storing the independent replicates. That is, in this case, we store the data as a list of \(n \times m\) matrices, and the inner network of the DeepSets representation has \(n\) neurons in the input layer.
For example, consider the task of estimating \(\boldsymbol{\theta} \equiv (\mu_1, \mu_2, \sigma, \rho)'\) from data \(\boldsymbol{Z}_1, \dots, \boldsymbol{Z}_m\) that are independent and identically distributed according to the bivariate Gaussian distribution, \[ \rm{Gau}\left(\begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix}, \sigma^2\begin{bmatrix} 1 & \rho \\ \rho & 1\end{bmatrix}\right). \] Then, to construct a neural Bayes estimator for this model, one may use the following code for defining a prior, the data simulator, and the neural-network architecture:
# Sampling from the prior
# K: number of samples to draw from the prior
sampler <- function(K) {
mu1 <- rnorm(K)
mu2 <- rnorm(K)
sigma <- rgamma(K, 1)
rho <- runif(K, -1, 1)
theta <- matrix(c(mu1, mu2, sigma, rho), byrow = TRUE, ncol = K)
return(theta)
}
# Marginal simulation from the statistical model
# Theta: a matrix of parameters drawn from the prior
# m: number of conditionally independent replicates for each parameter vector
simulate <- function(Theta, m) {
apply(Theta, 2, function(theta) {
mu <- c(theta[1], theta[2])
sigma <- theta[3]
rho <- theta[4]
Sigma <- sigma^2 * matrix(c(1, rho, rho, 1), 2, 2)
Z <- MASS::mvrnorm(m, mu, Sigma)
t(Z)
}, simplify = FALSE)
}
# Initialise the estimator
estimator <- juliaEval('
n = 2 # dimension of each data replicate
w = 32 # number of neurons in each hidden layer
# Layer to ensure valid estimates
final_layer = Parallel(
vcat,
Dense(w, 2, identity), # mean parameters
Dense(w, 1, softplus), # variance parameter
Dense(w, 1, tanh) # correlation parameter
)
psi = Chain(Dense(n, w, relu), Dense(w, w, relu), Dense(w, w, relu))
phi = Chain(Dense(w, w, relu), Dense(w, w, relu), final_layer)
deepset = DeepSet(psi, phi)
estimator = PointEstimator(deepset)
')
Let’s visualise a few realisations from the current model:
theta <- sampler(3)
Z <- simulate(theta, m = 250)
mu1 <- theta[1, 1:3]
mu2 <- theta[2, 1:3]
sigma <- theta[3, 1:3]
rho <- theta[4, 1:3]
df <- Map(function(z, m1, m2, s, r) {
data.frame(Z1 = z[1, ], Z2 = z[2, ], mu1 = m1, mu2 = m2, sigma = s, rho = r)
}, Z, mu1, mu2, sigma, rho)
df <- do.call(rbind, df)
df$theta <- paste0(
"$\\mu_1$ = ", round(df$mu1, 2), ", ",
"$\\mu_2$ = ", round(df$mu2, 2), ", ",
"$\\sigma$ = ", round(df$sigma, 2), ", ",
"$\\rho$ = ", round(df$rho, 2)
)
df$theta <- as.factor(df$theta)
levels(df$theta) <- sapply(levels(df$theta), TeX)
ggplot(df) +
geom_point(aes(x = Z1, y = Z2), alpha = 0.5) +
facet_grid(~theta, labeller = label_parsed) +
theme_bw()
The training, assessment, and application of the neural Bayes estimator then remains as in the example above:
# Construct training and validation data sets
K <- 5000
m <- 250
theta_train <- sampler(K)
theta_val <- sampler(K/10)
Z_train <- simulate(theta_train, m)
Z_val <- simulate(theta_val, m)
# Train the estimator
estimator <- train(
estimator,
theta_train = theta_train,
theta_val = theta_val,
Z_train = Z_train,
Z_val = Z_val,
epochs = 20
)
#> Computing the initial validation risk... Initial validation risk = 0.7245567
#> Computing the initial training risk... Initial training risk = 0.6967557
#> Epoch: 1 Training risk: 0.396 Validation risk: 0.224 Run time of epoch: 3.916 seconds
#> Epoch: 2 Training risk: 0.197 Validation risk: 0.183 Run time of epoch: 1.719 seconds
#> Epoch: 3 Training risk: 0.173 Validation risk: 0.167 Run time of epoch: 1.773 seconds
#> Epoch: 4 Training risk: 0.161 Validation risk: 0.161 Run time of epoch: 1.73 seconds
#> Epoch: 5 Training risk: 0.151 Validation risk: 0.153 Run time of epoch: 1.619 seconds
#> Epoch: 6 Training risk: 0.146 Validation risk: 0.145 Run time of epoch: 1.735 seconds
#> Epoch: 7 Training risk: 0.141 Validation risk: 0.143 Run time of epoch: 1.744 seconds
#> Epoch: 8 Training risk: 0.136 Validation risk: 0.139 Run time of epoch: 1.686 seconds
#> Epoch: 9 Training risk: 0.136 Validation risk: 0.139 Run time of epoch: 1.546 seconds
#> Epoch: 10 Training risk: 0.132 Validation risk: 0.137 Run time of epoch: 1.715 seconds
#> Epoch: 11 Training risk: 0.131 Validation risk: 0.129 Run time of epoch: 1.678 seconds
#> Epoch: 12 Training risk: 0.127 Validation risk: 0.128 Run time of epoch: 1.669 seconds
#> Epoch: 13 Training risk: 0.126 Validation risk: 0.129 Run time of epoch: 1.719 seconds
#> Epoch: 14 Training risk: 0.126 Validation risk: 0.122 Run time of epoch: 1.917 seconds
#> Epoch: 15 Training risk: 0.123 Validation risk: 0.124 Run time of epoch: 1.885 seconds
#> Epoch: 16 Training risk: 0.123 Validation risk: 0.124 Run time of epoch: 1.839 seconds
#> Epoch: 17 Training risk: 0.119 Validation risk: 0.121 Run time of epoch: 2.005 seconds
#> Epoch: 18 Training risk: 0.122 Validation risk: 0.126 Run time of epoch: 1.882 seconds
#> Epoch: 19 Training risk: 0.117 Validation risk: 0.126 Run time of epoch: 1.796 seconds
#> Epoch: 20 Training risk: 0.118 Validation risk: 0.117 Run time of epoch: 1.791 seconds
# Assess the estimator
theta_test <- sampler(1000)
Z_test <- simulate(theta_test, m)
assessment <- assess(estimator, theta_test, Z_test,
estimator_names = "NBE",
parameter_names = c("mu1", "mu2", "sigma", "rho"))
#> Running NBE...
plotestimates(assessment)
For data collected over a regular grid, neural Bayes estimators are based on a convolutional neural network (CNN). We give specific attention to this case in a separate vignette. There, we also outline how to perform neural Bayes estimation with incomplete/missing data based on methods described by Wang et al. (2024) and Sainsbury-Dale et al. (2025b).
To cater for spatial data collected over arbitrary spatial locations, one may construct a neural estimator with a graph neural network (GNN; see Sainsbury-Dale, Zammit-Mangion, Richards, and Huser, 2025). The overall workflow remains as given in previous examples, with two key additional considerations.
First, if inference is to be made from a single spatial data set collected before constructing the estimator, training data can be simulated using the observed spatial locations, which can be treated as fixed and known. However, if the estimator is intended for application to multiple spatial data sets with varying spatial configurations, it should be trained on a diverse set of spatial configurations. These configurations can be sampled during training, possibly using a spatial point process.
Second, the spatial data should be stored as a graph; a graph can be constructed from the spatial configurations and associated data using the function spatialgraph(), as done in the simulation code below.
For illustration, we develop a neural Bayes estimator for a spatial Gaussian process model, where \(\boldsymbol{Z} \equiv (Z_{1}, \dots, Z_{n})'\) are data collected at locations \(\{\boldsymbol{s}_{1}, \dots, \boldsymbol{s}_{n}\}\) in a spatial domain that is a subset of \(\mathbb{R}^2\). The data are modelled as spatially-correlated mean-zero Gaussian random variables with exponential covariance function, given by \[ \textrm{cov}(Z_i, Z_j) = \textrm{exp}(-\|\boldsymbol{s}_i - \boldsymbol{s}_j\|/\theta), \quad i, j = 1, \dots, n, \] with unknown range parameter \(\theta > 0\). Here, we take the spatial domain to be the unit square, we sample spatial configurations from a Matern cluster process, and we adopt a uniform prior \(\theta \sim \rm{Unif}(0, 0.4)\).
To begin, we define a function for sampling from the prior, and a function for sampling spatial configurations and simulating from the statistical model conditional on the sampled spatial configurations and parameter vectors.
# Sampling from the prior
# K: number of samples to draw from the prior
sampler <- function(K) {
theta <- runif(K, max = 0.4) # draw from the prior
theta <- t(theta) # reshape to matrix
return(theta)
}
# Marginal simulation from the statistical model, split into simulation of the
# spatial configurations S and the spatial data Z
# theta: a matrix of parameters drawn from the prior
# m: number of conditionally independent replicates for each parameter vector
library("parallel") # mclapply()
library("spatstat") # rMatClust()
simulateS <- function(K) {
# Simulate spatial configurations over the unit square, with each configuration
# drawn from a Matern cluster process with different, randomly sampled hyperparameters
n <- sample(100:300, K, replace = TRUE) # approximate sample size
lambda <- runif(K, 10, 50) # intensity of parent Poisson point process
mu <- n/lambda # mean number of daughter points
r <- runif(K, 0.05, 0.2) # cluster radius
S <- lapply(1:K, function(k) {
pts <- rMatClust(lambda[k], r = r[k], mu = mu[k])
cbind(x = pts$x, y = pts$y)
})
return(S)
}
simulateZ <- function(theta, S, m = 1) {
# Simulate conditionally independent replicates for each pair of
# parameters and spatial configurations
Z <- mclapply(1:ncol(theta), function(k) {
D <- as.matrix(dist(S[[k]])) # distance matrix
n <- nrow(D) # sample size
Sigma <- exp(-D/theta[k]) # covariance matrix
L <- t(chol(Sigma)) # Cholesky factor of the covariance matrix
e <- matrix(rnorm(n*m), nrow = n, ncol = m) # standard normal variates
Z <- L %*% e # conditionally independent replicates from the model
Z
})
return(Z)
}
simulate <- function(theta, ...) {
K <- ncol(theta)
S <- simulateS(K)
Z <- simulateZ(theta, S, ...)
G <- spatialgraph(S, Z) # graph from the spatial configurations and associated spatial data
return(G)
}
Let’s visualise a few realisations from our spatial Gaussian process model:
theta <- matrix(c(0.05, 0.1, 0.2, 0.4), nrow = 1)
K <- ncol(theta)
S <- simulateS(K)
Z <- simulateZ(theta, S)
df <- Map(function(z, th, s) {
data.frame(Z = c(z), theta = th, s1 = s[, 1], s2 = s[, 2])
}, Z, theta, S)
df <- do.call(rbind, df)
df$theta <- paste0("$\\theta$ = ", round(df$theta, 2))
df$theta <- as.factor(df$theta)
levels(df$theta) <- sapply(levels(df$theta), TeX)
ggplot(df) +
geom_point(aes(x = s1, y = s2, colour = Z)) +
facet_grid(~theta, labeller = label_parsed) +
scale_colour_viridis_c(option = "magma") +
labs(x = expression(s[1]), y = expression(s[2])) +
scale_x_continuous(breaks = c(0.2, 0.5, 0.8)) +
scale_y_continuous(breaks = c(0.2, 0.5, 0.8)) +
coord_fixed() +
theme_bw()
Next, we define our GNN architecture and initialise the estimator. Here, we use an architecture tailored to isotropic spatial dependence models; for further details, see Section 2.2.1 of Sainsbury-Dale et al. (2025). In this example, our goal is to construct a point estimator; however, other kinds of estimators can be constructed by substituting the appropriate estimator class in the final line below:
# Initialise the estimator
estimator <- juliaEval('
# Load Julia packages for GNN functionality
using GraphNeuralNetworks
using Statistics: mean
# Spatial weight functions: continuous surrogates for 0-1 basis functions
h_max = 0.15 # maximum distance to consider
q = 10 # output dimension of the spatial weights
w = KernelWeights(h_max, q)
# Propagation module
propagation = GNNChain(
SpatialGraphConv(1 => q, relu, w = w, w_out = q),
SpatialGraphConv(q => q, relu, w = w, w_out = q)
)
# Readout module
readout = GlobalPool(mean)
# Inner network
psi = GNNSummary(propagation, readout)
# Expert summary statistics, the empirical variogram
V = NeighbourhoodVariogram(h_max, q)
# Outer network
phi = Chain(
Dense(2q => 128, relu),
Dense(128 => 128, relu),
Dense(128 => 1, softplus) # softplus activation to ensure positive estimates
)
# DeepSet object
deepset = DeepSet(psi, phi; S = V)
# Point estimator
estimator = PointEstimator(deepset)
')
Next, we train the estimator:
# Construct training and validation data sets
K <- 5000
theta_train <- sampler(K)
theta_val <- sampler(K/10)
Z_train <- simulate(theta_train)
Z_val <- simulate(theta_val)
# Train the estimator
estimator <- train(
estimator,
theta_train = theta_train,
theta_val = theta_val,
Z_train = Z_train,
Z_val = Z_val,
epochs = 20
)
#> Computing the initial validation risk... Initial validation risk = 0.43207824
#> Computing the initial training risk... Initial training risk = 0.42689615
#> Epoch: 1 Training risk: 0.046 Validation risk: 0.026 Run time of epoch: 42.692 seconds
#> Epoch: 2 Training risk: 0.027 Validation risk: 0.027 Run time of epoch: 27.13 seconds
#> Epoch: 3 Training risk: 0.026 Validation risk: 0.025 Run time of epoch: 27.295 seconds
#> Epoch: 4 Training risk: 0.024 Validation risk: 0.025 Run time of epoch: 27.866 seconds
#> Epoch: 5 Training risk: 0.024 Validation risk: 0.024 Run time of epoch: 26.801 seconds
#> Epoch: 6 Training risk: 0.023 Validation risk: 0.025 Run time of epoch: 28.124 seconds
#> Epoch: 7 Training risk: 0.023 Validation risk: 0.024 Run time of epoch: 27.827 seconds
#> Epoch: 8 Training risk: 0.022 Validation risk: 0.024 Run time of epoch: 28.377 seconds
#> Epoch: 9 Training risk: 0.023 Validation risk: 0.023 Run time of epoch: 28.272 seconds
#> Epoch: 10 Training risk: 0.022 Validation risk: 0.024 Run time of epoch: 27.355 seconds
#> Epoch: 11 Training risk: 0.022 Validation risk: 0.023 Run time of epoch: 27.683 seconds
#> Epoch: 12 Training risk: 0.022 Validation risk: 0.025 Run time of epoch: 27.456 seconds
#> Epoch: 13 Training risk: 0.022 Validation risk: 0.023 Run time of epoch: 27.191 seconds
#> Epoch: 14 Training risk: 0.022 Validation risk: 0.023 Run time of epoch: 26.82 seconds
#> Epoch: 15 Training risk: 0.022 Validation risk: 0.023 Run time of epoch: 27.02 seconds
#> Stopping early since the validation loss has not improved in 5 epochs
Note that the computations in GNNs are performed in parallel, making them particularly well-suited for GPUs, which typically contain thousands of cores. If you have access to an NVIDIA GPU, you can utilise it by simply loading the Julia package CUDA, for example, by running juliaEval('using CUDA').
Next, we assess the trained estimator:
# Assess the estimator
theta_test <- sampler(1000)
Z_test <- simulate(theta_test)
assessment <- assess(estimator, theta_test, Z_test, estimator_names = "GNN NBE")
#> Running GNN NBE...
plotestimates(assessment)
Finally, once the estimator has been assessed, it may be applied to observed data: