---
title: "Mathematical Definitions of Sufficient Statistics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Mathematical Definitions of Sufficient Statistics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

The **redeem** package models the intensities of interaction formation and
dissolution using a log-linear formulation. The intensity (rate) for a dyad
$(i,j)$ at time $t$ is:

$$\lambda_{i,j}(t) = \exp(s_{i,j}(\mathscr{H}_t)^\top \beta + \alpha_i + \alpha_j + f(t, \gamma))$$

where:

- $s_{i,j}(\mathscr{H}_t)$ is a vector of **sufficient statistics** derived from the
  history of interactions $\mathscr{H}_t$ up to time $t$.
- $\alpha_i, \alpha_j$ are actor popularity parameters.
- $f(t, \gamma) = \sum_{q=1}^Q \gamma_q \mathbb{I}(c_{q-1} \le t < c_q)$ is the baseline
  step-function representing temporal variation over change points
  $0 = c_0 < c_1 < \dots < c_Q$ (with $\gamma_1 = 0$).

This vignette provides precise mathematical definitions for all sufficient
network statistics implemented in the package, as well as a guide for adding
custom statistics.

The complete list of available formula terms is documented in the
`redeem_terms` reference manual page. For model estimation, see
the help pages for the models via `dem` and `rem`.

---

## Statistic Transformations

Each sufficient statistic $s_{i,j}(\mathscr{H}_t)$ is defined as a transformed count
$f(N(t))$, where $N(t)$ is a raw network statistic. The package supports five
standard transformations $f(\cdot)$:

| Transformation | Mathematical Definition |
| :--- | :--- |
| **`identity`** | $f(s) = s$ |
| **`log`** | $f(s) = \log(1 + s)$ |
| **`recip`** | $f(s) = 1/(1+s)$ |
| **`bin`** | $f(s) = \mathbb{I}(s > 0)$ |
| **`sig`** | $f(s) = \frac{s}{s + K}$ |


---

## Endogenous Network Statistics

These statistics capture structural dependencies within the network evolution.

### 1. Intercept (`Intercept` / `intercept`)
A constant term representing the baseline log-intensity.
$$s_{i,j}(\mathscr{H}_t) = 1$$

### 2. Inertia (`inertia` / `number_interaction`)
Counts how many times the dyad $(i,j)$ has initiated an interaction in the past.
$$s_{i,j}(\mathscr{H}_t) = f(N_{i,j}(t))$$
where $N_{i,j}(t) = \sum_{k: t_k < t} \mathbb{I}(i_k = i, j_k = j)$ (or the windowed
version $N_{i,j}^w(t) = \sum_{k: t-w < t_k < t} \mathbb{I}(i_k = i, j_k = j)$).

### 3. Reciprocity (`reciprocity`)
Models the tendency to reciprocate past interactions (directed only).
$$s_{i,j}(\mathscr{H}_t) = f(N_{j,i}(t))$$

### 4. Duration (`duration` / `current_interaction`)
Measures the dependency on the time since the current interaction started (DEM only).
$$s_{i,j}(\mathscr{H}_t) = \begin{cases} f(t - t_{\text{start, } i,j}) & \text{if dyad } (i,j) \text{ is interacting at } t \\ 0 & \text{otherwise} \end{cases}$$
where $t_{\text{start, } i,j}$ is the timestamp of the last formation event.

### 5. Participation Shifts (P-shifts)
P-shifts capture the sequential dependencies between consecutive events (REM only).
Let the preceding event in the stream be $A \to B$. For a candidate event
$i \to j$ at time $t$:

- **`psABBA`** (Reciprocation): $s_{i,j}(t) = \mathbb{I}(i = B, j = A)$
- **`psABBY`** (Receiver turn-continuing): $s_{i,j}(t) = \mathbb{I}(i = B, j \ne A, j \ne B)$
- **`psABAY`** (Sender turn-continuing): $s_{i,j}(t) = \mathbb{I}(i = A, j \ne A, j \ne B)$
- **`psABXA`** (Usurpation to sender): $s_{i,j}(t) = \mathbb{I}(i \ne A, i \ne B, j = A)$
- **`psABXB`** (Usurpation to receiver): $s_{i,j}(t) = \mathbb{I}(i \ne A, i \ne B, j = B)$
- **`psABXY`** (Completely new dyad): $s_{i,j}(t) = \mathbb{I}(i \ne A, i \ne B, j \ne A, j \ne B)$

---

## Triadic Closure and Shared Partners

Triadic statistics capture structural closure. They can be calculated over
active edges (designated as **`current_`** in DEM) or historical event existence
(designated as **`general_`** in REM/DEM). Let $\mathcal{A}_t$ represent the set of
interacting dyads (for `current_`) or previously interacted dyads (for
`general_`) at time $t$.

### Common Partners (`general_common_partners` / `current_common_partners`)
Counts the number of third-party actors $k$ sharing a connection of a specified
type with both $i$ and $j$:

- **`OSP`** (Outgoing Shared Partner): $s_{i,j}(t) = f(|\{k : (i,k) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)$
- **`ISP`** (Incoming Shared Partner): $s_{i,j}(t) = f(|\{k : (k,i) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)$
- **`OTP`** (Outgoing Two-Path): $s_{i,j}(t) = f(|\{k : (i,k) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)$
- **`ITP`** (Incoming Two-Path): $s_{i,j}(t) = f(|\{k : (k,i) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)$

### Triangles (`general_triangle` / `current_triangle`)
Similar to common partners, but only non-zero if the focal dyad itself is active (directed only):

- **`OSP`** Triangle: $s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (i,k) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)$
- **`ISP`** Triangle: $s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (k,i) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)$
- **`OTP`** Triangle: $s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (i,k) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)$
- **`ITP`** Triangle: $s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (k,i) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)$

---

## Degree and Centrality Statistics

These statistics capture actor-level activity or popularity. Let $\mathcal{D}_t$ be the
network state at time $t$, and $d_{i, \text{out}}(t)$, $d_{i, \text{in}}(t)$ represent the
out-degree and in-degree of $i$ in $\mathcal{D}_t$. Let $c_{i, \text{out}}(t)$,
$c_{i, \text{in}}(t)$ be the total out-events and in-events involving $i$.

### Degree Statistics (`general_degree_out_sender`, etc.)

- **Out-Degree Sender**: $s_{i,j}(t) = f(d_{i, \text{out}}(t))$
- **Out-Degree Receiver**: $s_{i,j}(t) = f(d_{j, \text{out}}(t))$
- **In-Degree Sender**: $s_{i,j}(t) = f(d_{i, \text{in}}(t))$
- **In-Degree Receiver**: $s_{i,j}(t) = f(d_{j, \text{in}}(t))$
- **Degree Sum** (undirected only): $s_{i,j}(t) = f(d_{i}(t) + d_{j}(t))$
- **Degree Absolute Difference** (undirected only): $s_{i,j}(t) = f(\|d_{i}(t) - d_{j}(t)\|)$

### Count Statistics (`general_count_out_sender`, etc.)

Count statistics are identical to degree statistics but use total interaction
counts ($c$) rather than binary degrees ($d$):

- **Out-Count Sender**: $s_{i,j}(t) = f(c_{i, \text{out}}(t))$
- **Out-Count Receiver**: $s_{i,j}(t) = f(c_{j, \text{out}}(t))$
- **In-Count Sender**: $s_{i,j}(t) = f(c_{i, \text{in}}(t))$
- **In-Count Receiver**: $s_{i,j}(t) = f(c_{j, \text{in}}(t))$
- **Count Sum** (undirected only): $s_{i,j}(t) = f(c_{i}(t) + c_{j}(t))$
- **Count Absolute Difference** (undirected only): $s_{i,j}(t) = f(\|c_{i}(t) - c_{j}(t)\|)$

All of these degree and count statistics can be conveniently specified in model
formulas using the `degree()` (or `degrees()`) and `count()` helper functions:

- **Out-Degree / Out-Count Sender**: `degree(type = "out_sender")` or
  `count(type = "out_sender")`
- **Out-Degree / Out-Count Receiver**: `degree(type = "out_receiver")` or
  `count(type = "out_receiver")`
- **In-Degree / In-Count Sender**: `degree(type = "in_sender")` or
  `count(type = "in_sender")`
- **In-Degree / In-Count Receiver**: `degree(type = "in_receiver")` or
  `count(type = "in_receiver")`
- **Degree / Count Sum**: `degree(type = "sum")` or `count(type = "sum")`
- **Degree / Count Absolute Difference**: `degree(type = "absdiff")` or
  `count(type = "absdiff")`

---

## Exogenous Statistics

### 1. Dyadic Covariate (`dyadic_cov`)
$$s_{i,j}(\mathscr{H}_t) = f(X_{i,j}(t))$$
where $X(t)$ is an $N \times N$ matrix.

### 2. Monadic Covariate (`monadic_cov`)
$$s_{i,j}(\mathscr{H}_t) = g(x_i(t), x_j(t))$$
where $x$ is a monadic vector and $g(\cdot)$ is a user-defined function.

---

## Developer Guide: Adding Custom Sufficient Statistics

To add a new sufficient statistic (e.g., `my_stat`) to the package, follow this
two-step process:

### Step 1: Implement the Update Logic in C++
Define a C++ function in the `src/` directory (e.g., in a new file or in
`src/sufficient_statistics.cpp`) with the signature `ValidateFunction`.
This function computes the change to the statistic state matrix when an
event occurs:

```cpp
#include "redeem/sufficient_statistics.h"
#include "redeem/extension_api.hpp"

arma::uvec stat_my_stat(
    Data_DEM &object, 
    arma::mat &data, 
    unsigned int &from, 
    unsigned int &to, 
    unsigned int &number_event, 
    unsigned int col_number, 
    std::string transformation, 
    unsigned int type
) {
  if (from == 0 || to == 0) return arma::uvec();
  
  // 1. Calculate which dyad indices are affected and the change value (val)
  double val = (type == 1) ? 1.0 : -1.0; 
  arma::uvec affected_indices = { object.current_stats.find_from_to(from, to) };
  
  // 2. Apply the update using the helper (handles transformations internally)
  apply_update(object, affected_indices, col_number, val, transformation, data);
  
  // 3. Return the indices of modified dyads
  return affected_indices;
}

// 4. Register the term in the global C++ Registry
TERM_REGISTER("my_stat", stat_my_stat);
```

### Step 2: Write the R Initializer Function
In `R/init_terms.R`, create an initialization function named
`InitRedeemTerm.my_stat`. This function validates user-provided arguments and
sets up initial values:

```r
#' @keywords internal
InitRedeemTerm.my_stat <- function(arglist, n_nodes, model_type, directed, ...) {
  # Validate arguments against expected types and models
  arglist <- check.RedeemTerm(arglist,
    model_type = model_type, directed = directed,
    expected = list(transformation = c("identity", "log", "recip", "bin", "sig"), K = "numeric"),
    defaults = list(transformation = "identity", K = 1)
  )
  
  # Define initial statistic values at time 0 (optional)
  eval_at_zero <- matrix(0, n_nodes, n_nodes)
  
  # Return the configuration list
  list(
    base_name = "my_stat",
    transformation = arglist$transformation,
    eval_at_zero = eval_at_zero,
    event_stream = arglist$event_stream
  )
}
```

Once these steps are completed, run `devtools::document()` to update namespaces
and documentation, and build the package. The new term is now ready for use
in `dem()` and `rem()` by specifying it directly in the formula
(e.g., `~ my_stat(transformation = "log")`).
