Before starting, it should be mentioned that the sole purpose of this vignette is to provide intuitive and easily replicable instructions on how to use the FVDDPpkg package on . For this reason, the underlying theory will not be developed except in the minimum necessary terms; for more information, please refer to the bibliography cited here.
First of all, import the package.
As shown in the work of Papaspiliopoulos, Ruggiero, and Spanò (2016) Fleming-Viot Dependent Dirichlet Processes (FVDDP), conditioned on observed data \(Y\), take the general form of finite mixtures of Dirichlet Processes: in fact \[X_t \ | \ Y \sim \sum_{\mathbf{m} \in M} w_\mathbf{m}\Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}\] where
The derivation of this model, which stems from the study of population genetics, is done by exploiting the concept of duality for Markov processes (Papaspiliopoulos and Ruggiero (2014)), applying it to the results of Ethier and Griffiths (1993), among the others, on the seminal work of Fleming and Viot (1979).
In order to understand how to recover the previous expression, start
noting that, unconditional on observed data \[X_{t_0} \sim \Pi_{\alpha},\] where the
time is arbitrarily set \(t=t_0\). This
means that, while no data is included within the model, FVDDP can be
fully characterized by \(\theta\) and
\(P_0\). The creation of a process is
carried out using the function initialize
The user has to
specify the positive real number theta
and two functions,
the first to sample from \(P_0\)
(sampling.f
) and the second to evaluate its p.d.f. or
p.m.f. (density.f
), depending whether it is atomic or not;
this is specified by the last argument, atomic
. The
function returns an object of class fvddp
.
FVDDP = initialize(theta = 1.28, sampling.f = function(x) rpois(x, 5),
density.f = function(x) dpois(x, 5), TRUE)
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#>
#> is.atomic: TRUE
#>
#> Empty Process
In this chunk of code, for example, \(\theta = 1.28\) and \(P_0 \sim \mathrm{Po(5)}\). Note that when printing the process, it is explicitly stated that no data has been included within the model.
Updating the process with data collected at time \(t_0\) stored in the vector \(Y_0\), the form of the latent signal becomes \[ X_{t_0} \ | \ Y_0 \sim \Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}. \] In some sense, this already is the mixture expressed in the general formula, under the specification that the vector \(\mathbf{y}^\ast\) collects the unique values observed at time \(t_0\), \(\mathbf{m}\) stores the multiplicities of \(Y_0\) with respect to \(\mathbf{y}^\ast\), and \(M = \{ \mathbf{m}\}\): this implies that \(w_\mathbf{m}= 1\).
The update is performed by means of the update()
command, whose arguments are an fvddp
object and a numeric
vector. The returned object will include the information provided by
\(Y_0\). In particular:
y.star
.M
of size \(|M| \times K\), where \(|M|\) denotes the cardinality of \(M\) and \(K\) is the length of \(\mathbf{y}^\ast\). Each vector \(\mathbf{m}\) is stored as a row of
M
.w
of size \(|M|\). Its
\(j\)-th element represents the weight
associated to the \(j\)-th row of
M
.FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 4 7 9
#>
#> Multiplicities (M):
#> 4 7 9
#> [1,] 1 2 1
#>
#> Weights (w):
#> [1] 1
As one could expect, updating the signal with the vector \((4,7,7,9)\) (the order of the input does not matter) leads to a degenerate mixture with \(\mathbf{y}^\ast = (4,7,9)\) and \(M = \{ (1,2,1)\}\).
Updating a non-empty process will have a different effect: suppose to know the law of \[X_{t_n} \ | \ Y_0, \dots, Y_{n-1} \sim \sum_{\mathbf{m} \in M} w_\mathbf{m}\Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}\] where \(Y_j\) denotes a vector of values observed at time \(t_j\), then \[X_{t_n} | Y_0 \dots, Y_n \sim \sum_{\mathbf{m} \in (M + \mathbf{n})} w_\mathbf{m}^\phi \Pi_{\alpha + \sum_{j=1}^K m_j \delta_{y_j^\ast}}\] where \(\mathbf{n}= (n_1, \dots, n_K)\) is the vector of multiplicities of \(Y_n\) according to the unique values collected up to the same time, the new weights are such that \[w_{\mathbf{m}}^\phi \propto w_\mathbf{m}\mathrm{PU}(\mathbf{n}|\mathbf{m})\] where \(\mathrm{PU}(\mathbf{m}|\mathbf{n})\) denotes the probability of drawing a vector of multiplicities \(\mathbf{n}\) starting from \(\mathbf{m}\) via Polya urn sampling scheme under \(\theta\) and \(P_0\) specified by the model, and \[M + \mathbf{n}= \{ \mathbf{m}+ \mathbf{n}: \mathbf{m}\in M\}.\]. Hence, the following changes will be applied to the input process of the function:
y.star
.M
.For the details of the role of Polya urn scheme on the update of mixtures of Dirichlet Processes, see Antoniak (1974) and Blackwell and MacQueen (1973).
The propagation of the signal, also known as prediction, aims to estimate the state of the process at a time after the data is collected: in other words, in the future. If updating the signal one can get \(X_{t_n} \ | \ Y_0, \dots, Y_n\), the use of the propagation leads to \[X_{t_n + t}\ |\ Y_0, \dots, Y_n \sim \sum_{\mathbf{n} \in L(M)} w_\mathbf{n}^\psi \Pi_{\alpha + \sum_{j=1}^K n_j \delta_{y_j^\ast}}.\] This means that the probability mass is shifted to a set \[L(M) = \{ \mathbf{n}\in M : \exists \ \mathbf{m}\in M : \mathbf{n}\leq \mathbf{m}\}\] where the notation \(\mathbf{n}\leq \mathbf{m}\) implies that \(n_j \leq m_j \ \forall j \in \{1, \dots, K\}\). The new weights are such that \[w_\mathbf{n}^\phi = \sum_{\mathbf{m}\in M : \mathbf{n}\leq \mathbf{m}} w_\mathbf{m}p_{\mathbf{m}, \mathbf{n}}(t)\] and \(p_{\mathbf{m}, \mathbf{n}}(t)\) represents the probability of reaching \(\mathbf{n}\) starting from \(\mathbf{m}\) in a time \(t\) for a \(K\)-dimensional death process, as shown in Tavaré (1984); however, the exact value of such probability, stated in Papaspiliopoulos, Ruggiero, and Spanò (2016) is \[p_{\mathbf{m}, \mathbf{n}}(t)= \begin{cases} e^{-\lambda_{|\mathbf{m}|}t} \quad &\text{if} \ \mathbf{n}= \mathbf{m}\\ C_{|\mathbf{m}|, |\mathbf{n}|}(t) \mathrm{MVH} (\mathbf{n}; |\mathbf{n}|, \mathbf{m}) \quad &\text{if} \ \mathbf{n}< \mathbf{m}\\ \end{cases}\] where \(\lambda_{|\mathbf{m}|} = \frac{|\mathbf{m}|(\theta + |\mathbf{m}| -1)}{2}\) and \[C_{|\mathbf{m}|, |\mathbf{n}|}(t) = \big(\prod_{h=|\mathbf{n}| + 1}^{|\mathbf{m}|} \lambda_{h} \big) (-1)^{|\mathbf{m}|-|\mathbf{n}|} \sum_{k=|\mathbf{n}|}^{|\mathbf{m}|} \frac{e^{-\lambda_{k} t}}{\prod_{|\mathbf{n}| \leq h \leq |\mathbf{m}|, h \neq k }(\lambda_{k} - \lambda_{h})}\] and \(|\mathbf{m}|\) represents the \(L^1\) norm (i.e. the sum) of the vector \(\mathbf{m}\).
The propagate()
function can be exploited to propagate
the signal. Its arguments are an fvddp
object and a
(positive) time. The result is the propagated process, whose matrix
M
will be larger and whose weights will be as described in
the formulae above.
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 4 7 9
#>
#> Multiplicities (M):
#> 4 7 9
#> [1,] 0 0 0
#> [2,] 1 0 0
#> [3,] 0 1 0
#> [4,] 1 1 0
#> [5,] 0 2 0
#> [6,] 1 2 0
#> [7,] 0 0 1
#> [8,] 1 0 1
#> [9,] 0 1 1
#> [10,] 1 1 1
#> [11,] 0 2 1
#> [12,] 1 2 1
#>
#> Weights (w):
#> [1] 0.060274058 0.099035520 0.198071041 0.142898157 0.071449079 0.027252056
#> [7] 0.099035520 0.071449079 0.142898157 0.054504111 0.027252056 0.005881167
The example shows the propagation of the signal introduced in the
previous section, with \(t = 0.6\).
Note that y.star
does not vary. Also, note that in examples
like this, it is sufficient a time \(t \simeq
2\) to shift almost all the mass to the component of the mixture
characterized by the zero vector.
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#> <bytecode: 0x12b7a6180>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 4 5 7 9 10
#>
#> Multiplicities (M):
#> 4 5 7 9 10
#> [1,] 1 1 2 0 2
#> [2,] 2 1 2 0 2
#> [3,] 1 1 3 0 2
#> [4,] 2 1 3 0 2
#> [5,] 1 1 4 0 2
#> [6,] 2 1 4 0 2
#> [7,] 1 1 2 1 2
#> [8,] 2 1 2 1 2
#> [9,] 1 1 3 1 2
#> [10,] 2 1 3 1 2
#> [11,] 1 1 4 1 2
#> [12,] 2 1 4 1 2
#>
#> Weights (w):
#> [1] 0.032822343 0.051700651 0.302672433 0.327846322 0.083102652 0.061084451
#> [7] 0.009482191 0.010270845 0.060128867 0.044197613 0.011203233 0.005488398
The latter chunk shows an application of update()
on a
larger process. A larger vector y.new
may induce large
variations in the weights. Being it of size \(3\), the example does not cause an
immediately recognizable effect.
In the theory of Hidden Markov Models, the smoothing operator is used to infer the state of the signal given observations from the past, the present and the future. In other words, one can estimate \(X_t\) when \(t \leq t_n\), exploiting all collected data.
To do so, it has been shown by Ascolani, Lijoi, and Ruggiero (2023) that it is required to create two processes. The first has to be propagated forward from \(t_0\) to \(t-{i-1}\) as in the previous sections; the second one has to be run backward using the same strategy: initialize and update it at \(t_n\) and propagate it towards \(t_{n-1}\) (with a positive time \(t_n - t_{n-1}\) in the function), and sequentially update and propagate until \(t_{i+1}\) is reached.
Doing this, one will get that \[X_{t_{i-1}} \ |\ Y_0, \dots, Y_{i-1} \sim \sum_{\mathbf{n}_{i-1} \in M_{i-1}} u_{\mathbf{n}_{i-1}} \Pi_{\alpha + \sum_{j=1}^K n_{i-1,j}\delta_{y_j^\ast}}\] and \[X_{t_{i+1}} \ |\ Y_{i+1}, \dots, Y_{n}= \sum_{\mathbf{n}_{i+1}\in M_{i+1}} v_{\mathbf{n}_{i+1}} \Pi_{\alpha + \sum_{j=1}^K n_{i+1,j}\delta_{y_j^\ast}}\] where the subscript \(i-1\) and \(i+1\) are necessary to specify elements from the past or the future mixture, \(v\) stands for the weights and for example \(n_{i-1, j}\) is the \(j\)-th component of the vector \(\mathbf{n}_{i-1}\) (same for \(\mathbf{n}_{i+1}\)).
Provided this description based on available data from past and future, call \(\mathbf{n}_i\) the multiplicities generated by the vector \(Y_i\). Then
\[X_{t_i} \ |\ X_{t_{i-1}}, X_{t_{i+1}}, Y_i \sim \sum_{\substack{\mathbf{n}_{i-1}\\ \in M_{i-1}}}\sum_{\substack{\mathbf{n}_{i+1}\\ \in M_{i+1}}} u_{\mathbf{n}_{i-1}} v_{\mathbf{n}_{i+1}} \sum_{\substack{(\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) \\ \in D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}}} w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \Pi_{\alpha + \sum_{j=1}^K (\mathbf{k}_{i-1, j} + \mathbf{n}_{i,j} + \mathbf{k}_{i+1, j} )\delta_{y_j^\ast}}\] where:
if \(P_0\) is non-atomic, define the sets \[ D_{i-1} := \{ j \in \{ 1, \dots, K\} : n_{i-1, j} > 0 \ \text{and either} \ n_{i,j}>0 \ \text{or} \ n_{i+1,j}>0 \},\] \[D_{i+1} := \{ j \in \{ 1, \dots, K\} : n_{i+1, j} > 0 \ \text{and either} \ n_{i,j}>0 \ \text{or} \ n_{i-1,j}>0 \}\] and \[ S := D_{i-1} \cup D_{i+1}\] to express the indices of shared values among different times. Then \[\begin{align*} D^{\mathbf{n}_{i-1},\mathbf{n}_{i+1}} = \{ (\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) : &\mathbf{k}_{i-1}\leq \mathbf{n}_{i-1}\ \text{and} \ k_{i-1, j} > 0 \ \forall \ j \in D_{i-1},\\ &\mathbf{k}_{i+1}\leq \mathbf{n}_{i+1}\ \text{and} \ k_{i+1, j} > 0 \ \forall \ j \in D_{i+1} \} \end{align*}\] and the weights are such that:
if \(P_0\) is atomic, let \[D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}:=\{ (\mathbf{k}_{i-1}, \mathbf{k}_{i+1}) : \mathbf{k}_{i-1}\leq \mathbf{n}_{i-1}, \mathbf{k}_{i+1}\leq \mathbf{n}_{i+1}\}\] and the weights can be expressed as \[ w_{\mathbf{k}_{i-1}, \mathbf{n}_i, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\ \propto \tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} \frac{m(\mathbf{k}_{i-1}+ \mathbf{n}_i +\mathbf{k}_{i+1})}{m(\mathbf{k}_{i-1})m(\mathbf{n}_i)m(\mathbf{k}_{i+1})}\]
where \[\tilde{p}_{\mathbf{k}_{i-1}, \mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}} = p_{\mathbf{n}_{i-1}, \mathbf{k}_{i-1}}(t_i - t_{i-1})p_{\mathbf{n}_{i+1}, \mathbf{k}_{i+1}}(t_{i+1} - t_i)\] is the joint transition probability from \(\mathbf{n}_{i-1}\) to \(\mathbf{k}_{i-1}\) in time \(t_i - t_{i-1}\) and from \(\mathbf{n}_{i+1}\) to \(\mathbf{k}_{i+1}\) in time \(t_{i+1} - t_i\) and \(m(\cdot)\) is the marginal likelihood function of multiplicities in the atomic case.
This peculiar structure, developed in Ascolani, Lijoi, and Ruggiero (2023), can be
better understood with some examples. They can be shown in this
implementation with smooth()
. The arguments are two latent
signals (fvddp.past
and fvddp.future
), the
positive times \(t_i - t_{i-1}\)
(t.past
) and \(t_{i+1} -
t_i\) (t.future
) and the data collected at time
\(t_i\) (y.new
).
FVDDP_NONATOMIC = initialize(theta = 0.7, sampling.f = function(x) rbeta(x, 4, 7),
density.f = function(x) dbeta(x, 4, 7), atomic = FALSE)
FVDDP_PAST_NONATOMIC = update(fvddp = FVDDP_NONATOMIC, y.new = c(0.210, 0.635, .541))
FVDDP_FUTURE_NONATOMIC = update(fvddp = FVDDP_NONATOMIC, y.new = c(0.210))
FVDDP_FUTURE_NONATOMIC = propagate(fvddp = FVDDP_FUTURE_NONATOMIC, delta.t = 0.4)
FVDDP_FUTURE_NONATOMIC = update(fvddp = FVDDP_FUTURE_NONATOMIC, y.new = c(.635))
In the example above, two process were created with \(\theta = 0.7\) and \(P_0 \sim \mathrm{Beta}(4, 7)\). The signal was updated once in the past, and twice in the future (with a propagation between the two updates).
FVDDP_SMOOTH_NONATOMIC = smooth(fvddp.past = FVDDP_PAST_NONATOMIC, fvddp.future = FVDDP_FUTURE_NONATOMIC,
t.past = 0.75, t.future = 0.3, y.new = c(0.210, 0.635, 0.479))
FVDDP_SMOOTH_NONATOMIC
#> theta: 0.7
#>
#> P0.sample: function(x) rbeta(x, 4, 7)
#>
#> P0.density/mass: function(x) dbeta(x, 4, 7)
#>
#> is.atomic: FALSE
#>
#> Unique values (y.star):
#> [1] 0.210 0.479 0.541 0.635
#>
#> Multiplicities (M):
#> 0.21 0.479 0.541 0.635
#> [1,] 2 1 0 3
#> [2,] 2 1 1 3
#> [3,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.23344663 0.03392615 0.73262722
Using the function on the two processes, it is possible to see that the structure described above for the nonatomic case causes a shrinkage of the mixture. Indeed, the set \(M\) only contains three vectors.
In order to make a comparison, one can try to do something similar taking \(P_0 \sim \mathrm{Binom}(10, 0.6)\).
FVDDP_ATOMIC = initialize(theta = 0.7, sampling.f = function(x) rbeta(x, 10, 0.6),
density.f = function(x) dbinom(x, 10, 0.6), atomic = TRUE)
FVDDP_PAST_ATOMIC = update(fvddp = FVDDP_ATOMIC, y.new = c(2, 6, 5))
FVDDP_FUTURE_ATOMIC = update(fvddp = FVDDP_ATOMIC, y.new = c(2))
FVDDP_FUTURE_ATOMIC = propagate(fvddp = FVDDP_FUTURE_ATOMIC, delta.t = 0.4)
FVDDP_FUTURE_ATOMIC = update(fvddp = FVDDP_FUTURE_ATOMIC, y.new = c(6))
As before, the mixture referred to past observations is updated once, the one referred to future observations is updated twice.
FVDDP_SMOOTH_ATOMIC = smooth(fvddp.past = FVDDP_PAST_ATOMIC, fvddp.future = FVDDP_FUTURE_ATOMIC,
t.past = 0.75, t.future = 0.3, y.new = c(2, 6, 4))
FVDDP_SMOOTH_ATOMIC
#> theta: 0.7
#>
#> P0.sample: function(x) rbeta(x, 10, 0.6)
#>
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 2 4 5 6
#>
#> Multiplicities (M):
#> 2 4 5 6
#> [1,] 1 1 0 1
#> [2,] 1 1 0 2
#> [3,] 1 1 0 3
#> [4,] 1 1 1 1
#> [5,] 1 1 1 2
#> [6,] 1 1 1 3
#> [7,] 1 1 2 1
#> [8,] 1 1 2 2
#> [9,] 1 1 2 3
#> [10,] 2 1 0 1
#> [11,] 2 1 0 2
#> [12,] 2 1 0 3
#> [13,] 2 1 1 1
#> [14,] 2 1 1 2
#> [15,] 2 1 1 3
#> [16,] 2 1 2 1
#> [17,] 2 1 2 2
#> [18,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.0001721823 0.0025036150 0.0094628016 0.0002320301 0.0024384330
#> [6] 0.0068287189 0.0004682311 0.0037328174 0.0075867038 0.0117827759
#> [11] 0.1266737735 0.3103950448 0.0112750878 0.0913256329 0.1717719912
#> [16] 0.0153587293 0.0979424891 0.1300489424
In this case, the mixture is clearly bigger. The reason is that when \(P_0\) is atomic, the set \(D^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\) does not put constraints based on the appearance of shared values across different times.
Past examples should also provide an insight on the main issue
related to propagate()
and smooth()
: the size
of the matrix \(M\) grows polynomially
with respect to the amount of collected data; moreover, it can be seen
that as the number of weights increases, many of them become almost
negligible.
In order to avoid long computations, it is possible to use
approx.propagate()
: it reproduces the propagation of the
signal by means of Monte Carlo method. The idea, proposed by Ascolani, Lijoi, and Ruggiero (2021), is to
mimic the evolution of the \(K\)-dimensional death process using a
one-dimensional one, and then extracting a multidimensional vector with
a multivariate hypergeometic distributions.
FVDDP =initialize(theta = 3, sampling.f= function(x) rnorm(x, -1, 3),
density.f = function(x) dnorm(x, -1, 3), atomic = FALSE)
FVDDP = update(fvddp = FVDDP, y.new = c(-1.145, 0.553, 0.553, 0.553))
In the previous chunk, a process with hyperparameters \(\theta = 3\) and \(P_0 \sim \mathcal{N}(-1, 3)\) was created
and updated. The syntax of the approximating functions is just the same
as in propagate()
, with the exceptions that one must
specify the number of samples N
to be drawn.
FVDDP_APPR_PROP
#> theta: 3
#>
#> P0.sample: function(x) rnorm(x, -1, 3)
#>
#> P0.density/mass: function(x) dnorm(x, -1, 3)
#>
#> is.atomic: FALSE
#>
#> Unique values (y.star):
#> [1] -1.145 0.553
#>
#> Multiplicities (M):
#> -1.145 0.553
#> [1,] 0 0
#> [2,] 0 1
#> [3,] 0 2
#> [4,] 0 3
#> [5,] 1 0
#> [6,] 1 1
#> [7,] 1 2
#> [8,] 1 3
#>
#> Weights (w):
#> [1] 0.13530 0.33260 0.17440 0.02000 0.10310 0.17210 0.05785 0.00465
The results is again an fvddp
object. In order to
measure the accuracy of such approximation, one has to compute the exact
output of the propagation, again with time \(t=0.45\).
Then one can measure the difference in the weights with
error.estimate()
. The arguments are
fvddp.exact
and fvddp.approx
, and the output
is a vector containing the difference among the weights, in absolute
value. The option remove.unmatched
allows to choose
whenever a vector is in the exact propagation but not in the
approximate: if TRUE
, the missing weight is assumed to be
\(0\), if FALSE
, this
comparison will not be reported in the output (which will result to be
shorter).
error.estimate(FVDDP_EXACT_PROP, FVDDP_APPR_PROP)
#> [1] 0.0058286503 0.0068326918 0.0028019247 0.0014386014 0.0008613986
#> [6] 0.0015530747 0.0001989751 0.0001334191
Something similar can be done for the smoothing via
approximate.smooth()
; in this case the Monte Carlo method
is necessary to support importance sampling, where the importances are
given by the right hand side of the formulae for \(w_{\mathbf{k}_{i-1}, \mathbf{n}_i,
\mathbf{k}_{i+1}}^{\mathbf{n}_{i-1}, \mathbf{n}_{i+1}}\). For
this reason, the result of the simulation may be less stable than in the
case of the propagation seen above, and a larger amount of samples will
be required to achieve a good accuracy.
In the following example, one can see how to copy wht was done in the exact smoothing.
FVDDP_SMOOTH_APPR = approx.smooth(fvddp.past = FVDDP_PAST_ATOMIC, fvddp.future = FVDDP_FUTURE_ATOMIC,
t.past = 0.75, t.future = 0.3, y.new = c(2, 6, 4), N = 50000)
FVDDP_SMOOTH_APPR
#> theta: 0.7
#>
#> P0.sample: function(x) rbeta(x, 10, 0.6)
#>
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 2 4 5 6
#>
#> Multiplicities (M):
#> 2 4 5 6
#> [1,] 1 1 0 1
#> [2,] 1 1 0 2
#> [3,] 1 1 0 3
#> [4,] 1 1 1 1
#> [5,] 1 1 1 2
#> [6,] 1 1 1 3
#> [7,] 1 1 2 1
#> [8,] 1 1 2 2
#> [9,] 1 1 2 3
#> [10,] 2 1 0 1
#> [11,] 2 1 0 2
#> [12,] 2 1 0 3
#> [13,] 2 1 1 1
#> [14,] 2 1 1 2
#> [15,] 2 1 1 3
#> [16,] 2 1 2 1
#> [17,] 2 1 2 2
#> [18,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.0002318445 0.0032076427 0.0117206721 0.0002089701 0.0021740870
#> [6] 0.0057754325 0.0003174570 0.0025240938 0.0051318324 0.0161197332
#> [11] 0.1568137178 0.3993952863 0.0088537112 0.0776609913 0.1427394035
#> [16] 0.0102407155 0.0676997460 0.0891846631
error.estimate(FVDDP_SMOOTH_ATOMIC, FVDDP_SMOOTH_APPR)
#> [1] 5.966216e-05 7.040276e-04 2.257871e-03 2.305999e-05 2.643460e-04
#> [6] 1.053286e-03 1.507740e-04 1.208724e-03 2.454871e-03 4.336957e-03
#> [11] 3.013994e-02 8.900024e-02 2.421377e-03 1.366464e-02 2.903259e-02
#> [16] 5.118014e-03 3.024274e-02 4.086428e-02
Another tool to cut the computational cost of predictive of smoothing
inference is given by the prune()
function. It allows to
remove from the mixture all vectors \(\mathbf{m}\) whose weight \(w_\mathbf{m}\) is under some treshold \(\varepsilon\). Such eights are then
normalized such that their sum is \(1\).
In the example, the function will be applied to one of the processes prevously calculated, fixing \(\varepsilon = 10^{-2}\).
PRUNED
#> theta: 0.7
#>
#> P0.sample: function(x) rbeta(x, 10, 0.6)
#>
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 2 4 5 6
#>
#> Multiplicities (M):
#> 2 4 5 6
#> [1,] 2 1 0 1
#> [2,] 2 1 0 2
#> [3,] 2 1 0 3
#> [4,] 2 1 1 1
#> [5,] 2 1 1 2
#> [6,] 2 1 1 3
#> [7,] 2 1 2 1
#> [8,] 2 1 2 2
#> [9,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.01219024 0.13105433 0.32112895 0.01166500 0.09448380 0.17771211 0.01588986
#> [8] 0.10132948 0.13454622
In this context, the treshold is insanely high; this is done for the unique purpose of showing how the function works; in the practical use of the package, a reasonable \(\varepsilon\) is between \(10^{-9}\) and the machine epsilon of the computer in use.
The last task that it can be performed is sampling values from Fleming-Viot dependent Dirichlet Processes. This can be done by simply choosing a vector \(w_\mathbf{m}\) and drawing a value from \(P_0\) with probability \(\frac{\theta}{\theta + |\mathbf{m}|}\), or choosing \(y_m^\ast\) with probability \(\frac{m_j}{\theta + |\mathbf{m}|}\). To get a sample of size \(N\), it is sufficient to replicate this mechanism \(n\) times.
The implementation is named posterior.sample()
. Its
arguments are the signal and the number N
of values to
draw.
table(round(y, 3))
#>
#> -8.67 -7.232 -7.197 -6.912 -6.764 -5.968 -5.659 -5.249 -5.19 -5.114 -5.105
#> 1 1 1 1 1 1 1 1 1 1 1
#> -4.826 -4.375 -4.326 -4.21 -4.073 -3.836 -3.728 -3.562 -3.428 -2.793 -2.746
#> 1 1 1 1 1 1 1 1 1 1 1
#> -2.304 -2.162 -2.04 -1.687 -1.624 -1.555 -1.447 -1.286 -1.21 -1.145 -1.07
#> 1 1 1 1 1 1 1 1 1 3 1
#> -0.965 -0.706 -0.656 -0.561 -0.465 -0.462 -0.298 -0.256 -0.134 -0.059 -0.032
#> 1 1 1 1 1 1 1 1 1 1 1
#> 0.128 0.208 0.369 0.384 0.444 0.553 0.652 0.817 0.95 1.014 1.036
#> 1 1 1 1 1 26 1 1 1 1 1
#> 1.092 1.367 1.442 1.535 1.603 2.016 2.233 2.29 2.341 2.447 2.766
#> 1 1 1 1 1 1 1 1 1 1 1
#> 2.959 3.393 3.893 4.828 4.854 5.744 5.833
#> 1 1 1 1 1 1 1
The command table()
was used here to display more
efficiently how many times each value has been sampled.
In the Bayesian Nonparametric framework, scientists prefer to use the predictive structure of the Dirichlet process when they want to picture how future observations will be like. This choice is strongly related to the exchangeability assumption underlying the model (more in Ghosal and Vaart (2017)); in this context, however, it is sufficient to say that predictive structure is nothing but the sequential use of posterior sampling and update. In fact, a value is repeatedly drawn from the mixture and it is incorporated within each vector \(\mathbf{m}\in M\) via an update. A full description of this mechanism was developed by Ascolani, Lijoi, and Ruggiero (2021).
This is implemented efficiently via predictive.struct()
;
the arguments are the same as in posterior.sample()
.