Mathematical Definitions of Sufficient Statistics

Overview

The redeem package models the intensities of interaction formation and dissolution using a log-linear formulation. The intensity (rate) for a dyad \((i,j)\) at time \(t\) is:

\[\lambda_{i,j}(t) = \exp(s_{i,j}(\mathscr{H}_t)^\top \beta + \alpha_i + \alpha_j + f(t, \gamma))\]

where:

\(s_{i,j}(\mathscr{H}_t)\) is a vector of sufficient statistics derived from the history of interactions \(\mathscr{H}_t\) up to time \(t\).
\(\alpha_i, \alpha_j\) are actor popularity parameters.
\(f(t, \gamma) = \sum_{q=1}^Q \gamma_q \mathbb{I}(c_{q-1} \le t < c_q)\) is the baseline step-function representing temporal variation over change points \(0 = c_0 < c_1 < \dots < c_Q\) (with \(\gamma_1 = 0\)).
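
To make the formula concrete, the following is a minimal C++ sketch of how the linear predictor and the baseline step function combine. The function names `baseline` and `intensity` are hypothetical and are not part of the package API. Returning 0 for times outside \([c_0, c_Q)\) is an assumption of this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Baseline step function f(t, gamma): returns gamma_q for the interval
// [c_{q-1}, c_q) that contains t. `cuts` holds c_0 < c_1 < ... < c_Q and
// `gamma` holds gamma_1, ..., gamma_Q (gamma_1 = 0 by convention).
double baseline(double t, const std::vector<double>& cuts,
                const std::vector<double>& gamma) {
  for (std::size_t q = 1; q < cuts.size(); ++q) {
    if (cuts[q - 1] <= t && t < cuts[q]) return gamma[q - 1];
  }
  return 0.0;  // assumption: no baseline shift outside [c_0, c_Q)
}

// Intensity lambda_{i,j}(t) = exp(s' beta + alpha_i + alpha_j + f(t, gamma)).
// `f_t` is the already evaluated baseline value f(t, gamma).
double intensity(const std::vector<double>& s, const std::vector<double>& beta,
                 double alpha_i, double alpha_j, double f_t) {
  double eta = alpha_i + alpha_j + f_t;
  for (std::size_t p = 0; p < s.size(); ++p) eta += s[p] * beta[p];
  return std::exp(eta);
}
```

For instance, with \(s = (1, 2)\), \(\beta = (0.5, -0.25)\), and all other terms equal to zero, the linear predictor is 0 and the intensity is 1.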

This vignette provides precise mathematical definitions for all sufficient network statistics implemented in the package, as well as a guide for adding custom statistics.

The complete list of available formula terms is documented on the redeem_terms reference manual page. For model estimation, see the help pages for dem and rem.

Statistic Transformations

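Each sufficient statistic \(s_{i,j}(\mathscr{H}_t)\) is defined as a transformed count \(f(N(t))\), where \(N(t)\) is a raw network statistic and \(f(\cdot)\) is a transformation (not to be confused with the baseline function \(f(t, \gamma)\) in the Overview). The package supports five standard transformations \(f(\cdot)\):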

Transformation	Mathematical Definition
`identity`	\(f(s) = s\)
`log`	\(f(s) = \log(1 + s)\)
`recip`	\(f(s) = 1/(1+s)\)
`bin`	\(f(s) = \mathbb{I}(s > 0)\)
`sig`	\(f(s) = \frac{s}{s + K}\), where \(K\) is the numeric constant supplied through the `K` argument
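
The table can be read as a single dispatch on the transformation name. The sketch below is illustrative only: `transform_stat` is a hypothetical helper, not the package's internal implementation, and the default `K = 1` simply mirrors the default used in the R initializer shown in the developer guide.

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Apply one of the five transformations to a raw statistic s.
// K is only used by the "sig" transformation.
double transform_stat(const std::string& name, double s, double K = 1.0) {
  if (name == "identity") return s;
  if (name == "log") return std::log1p(s);        // log(1 + s)
  if (name == "recip") return 1.0 / (1.0 + s);
  if (name == "bin") return s > 0.0 ? 1.0 : 0.0;  // indicator I(s > 0)
  if (name == "sig") return s / (s + K);
  throw std::invalid_argument("unknown transformation: " + name);
}
```

For a raw count \(s = 3\) and \(K = 1\), this gives 3 (`identity`), \(\log 4 \approx 1.386\) (`log`), 0.25 (`recip`), 1 (`bin`), and 0.75 (`sig`).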

Endogenous Network Statistics

These statistics capture structural dependencies within the network evolution.

1. Intercept (`Intercept` / `intercept`)

A constant term representing the baseline log-intensity. \[s_{i,j}(\mathscr{H}_t) = 1\]

2. Inertia (`inertia` / `number_interaction`)

Counts how many times \(i\) has initiated an interaction with \(j\) in the past. \[s_{i,j}(\mathscr{H}_t) = f(N_{i,j}(t))\] where \(N_{i,j}(t) = \sum_{k: t_k < t} \mathbb{I}(i_k = i, j_k = j)\) (or, for a window of length \(w\), the windowed version \(N_{i,j}^w(t) = \sum_{k: t-w < t_k < t} \mathbb{I}(i_k = i, j_k = j)\)).

3. Reciprocity (`reciprocity`)

Models the tendency to reciprocate past interactions (directed only). \[s_{i,j}(\mathscr{H}_t) = f(N_{j,i}(t))\]
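
Inertia and reciprocity share the same raw count and differ only in the order of the actors. The sketch below illustrates this with a hypothetical `Event` struct and `count_events` helper (neither is part of the package). Treating `window <= 0` as "no window" is an assumption made here for brevity.

```cpp
#include <vector>

// One directed event i_k -> j_k at time t_k.
struct Event {
  double time;
  int sender;
  int receiver;
};

// Raw count N_{i,j}(t): events i -> j strictly before t.
// If window > 0, only events with t - window < t_k < t are counted.
int count_events(const std::vector<Event>& history, int i, int j, double t,
                 double window = 0.0) {
  int n = 0;
  for (const Event& e : history) {
    const bool in_past = e.time < t;
    const bool in_window = (window <= 0.0) || (e.time > t - window);
    if (in_past && in_window && e.sender == i && e.receiver == j) ++n;
  }
  return n;
}
```

For dyad \((i,j)\), the raw inertia count is `count_events(history, i, j, t)` and the raw reciprocity count is `count_events(history, j, i, t)`. The chosen transformation \(f\) is applied afterwards.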

4. Duration (`duration` / `current_interaction`)

Measures the dependency on the time since the current interaction started (DEM only). \[s_{i,j}(\mathscr{H}_t) = \begin{cases} f(t - t_{\text{start, } i,j}) & \text{if dyad } (i,j) \text{ is interacting at } t \\ 0 & \text{otherwise} \end{cases}\] where \(t_{\text{start, } i,j}\) is the timestamp of the last formation event.
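
As a worked illustration with hypothetical numbers, suppose dyad \((i,j)\) last formed an interaction at \(t_{\text{start, } i,j} = 2\) and is still interacting at \(t = 5\). With the `log` transformation: \[s_{i,j}(\mathscr{H}_5) = \log(1 + (5 - 2)) = \log 4 \approx 1.386\] Once the interaction dissolves, the statistic is 0 until the next formation event for this dyad.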

5. Participation Shifts (P-shifts)

P-shifts capture the sequential dependencies between consecutive events (REM only). Let the preceding event in the stream be \(A \to B\). For a candidate event \(i \to j\) at time \(t\):

psABBA (Reciprocation): \(s_{i,j}(t) = \mathbb{I}(i = B, j = A)\)
psABBY (Receiver turn-continuing): \(s_{i,j}(t) = \mathbb{I}(i = B, j \ne A, j \ne B)\)
psABAY (Sender turn-continuing): \(s_{i,j}(t) = \mathbb{I}(i = A, j \ne A, j \ne B)\)
psABXA (Usurpation to sender): \(s_{i,j}(t) = \mathbb{I}(i \ne A, i \ne B, j = A)\)
psABXB (Usurpation to receiver): \(s_{i,j}(t) = \mathbb{I}(i \ne A, i \ne B, j = B)\)
psABXY (Completely new dyad): \(s_{i,j}(t) = \mathbb{I}(i \ne A, i \ne B, j \ne A, j \ne B)\)
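
The six indicators are mutually exclusive, so at most one of them equals 1 for a given candidate event. A minimal sketch of the classification follows. The function `pshift` is a hypothetical name, and the sketch assumes candidate events have \(i \ne j\).

```cpp
#include <string>

// Classify a candidate event i -> j relative to the preceding event A -> B.
// Returns the name of the P-shift whose indicator equals 1, or "none".
std::string pshift(int A, int B, int i, int j) {
  const bool i_new = (i != A && i != B);  // sender not in the previous event
  const bool j_new = (j != A && j != B);  // receiver not in the previous event
  if (i == B && j == A) return "psABBA";
  if (i == B && j_new) return "psABBY";
  if (i == A && j_new) return "psABAY";
  if (i_new && j == A) return "psABXA";
  if (i_new && j == B) return "psABXB";
  if (i_new && j_new) return "psABXY";
  return "none";  // e.g. the repeat A -> B, which is not among the six terms
}
```

With a preceding event \(1 \to 2\), the candidate \(2 \to 1\) is classified as psABBA and the candidate \(3 \to 4\) as psABXY.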

Triadic Closure and Shared Partners

Triadic statistics capture structural closure. They can be calculated over active edges (designated as current_ in DEM) or historical event existence (designated as general_ in REM/DEM). Let \(\mathcal{A}_t\) represent the set of interacting dyads (for current_) or previously interacted dyads (for general_) at time \(t\).

Common Partners (`general_common_partners` / `current_common_partners`)

Counts the number of third-party actors \(k\) sharing a connection of a specified type with both \(i\) and \(j\):

OSP (Outgoing Shared Partner): \(s_{i,j}(t) = f(|\{k : (i,k) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)\)
ISP (Incoming Shared Partner): \(s_{i,j}(t) = f(|\{k : (k,i) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)\)
OTP (Outgoing Two-Path): \(s_{i,j}(t) = f(|\{k : (i,k) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)\)
ITP (Incoming Two-Path): \(s_{i,j}(t) = f(|\{k : (k,i) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)\)
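
The four variants differ only in the direction of the two edges involving the third party \(k\). The following sketch computes the raw (untransformed) set sizes from an explicit edge set. `EdgeSet` and `shared_partners` are hypothetical names, actors are assumed to be labelled \(1, \dots, n\), and an unrecognised `type` simply yields 0 in this sketch.

```cpp
#include <set>
#include <string>
#include <utility>

// Directed edge set A_t, stored as (sender, receiver) pairs.
using EdgeSet = std::set<std::pair<int, int>>;

// Raw shared-partner count for dyad (i, j) over actors 1..n.
int shared_partners(const EdgeSet& A, int n, int i, int j,
                    const std::string& type) {
  auto has = [&A](int a, int b) { return A.count(std::make_pair(a, b)) > 0; };
  int count = 0;
  for (int k = 1; k <= n; ++k) {
    if (k == i || k == j) continue;  // k must be a third party
    bool hit = false;
    if (type == "OSP") hit = has(i, k) && has(j, k);
    else if (type == "ISP") hit = has(k, i) && has(k, j);
    else if (type == "OTP") hit = has(i, k) && has(k, j);
    else if (type == "ITP") hit = has(k, i) && has(j, k);
    if (hit) ++count;
  }
  return count;
}
```

For example, with \(\mathcal{A}_t = \{(1,3), (2,3), (3,2), (4,1), (4,2)\}\) and the dyad \((1,2)\), the OSP, ISP, and OTP counts are each 1 (via \(k = 3\), \(k = 4\), and \(k = 3\), respectively), while the ITP count is 0.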

Triangles (`general_triangle` / `current_triangle`)

Defined like common partners, but non-zero only if the focal dyad \((i,j)\) is itself in \(\mathcal{A}_t\) (directed only):

OSP Triangle: \(s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (i,k) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)\)
ISP Triangle: \(s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (k,i) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)\)
OTP Triangle: \(s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (i,k) \in \mathcal{A}_t \land (k,j) \in \mathcal{A}_t\}|)\)
ITP Triangle: \(s_{i,j}(t) = \mathbb{I}((i,j) \in \mathcal{A}_t) \times f(|\{k : (k,i) \in \mathcal{A}_t \land (j,k) \in \mathcal{A}_t\}|)\)
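
As a worked illustration with a hypothetical edge set \(\mathcal{A}_t = \{(1,2), (1,3), (2,3)\}\), consider the OSP triangle statistic. For the dyad \((1,2)\), the focal edge is in \(\mathcal{A}_t\) and actor 3 is the only outgoing shared partner: \[s_{1,2}(t) = \mathbb{I}((1,2) \in \mathcal{A}_t) \times f(|\{3\}|) = 1 \times f(1)\] For the reversed dyad \((2,1)\), the focal edge is not in \(\mathcal{A}_t\), so \(s_{2,1}(t) = 0\) even though actors 1 and 2 still share actor 3 as an outgoing partner.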

Degree and Centrality Statistics

These statistics capture actor-level activity or popularity. Let \(\mathcal{D}_t\) be the network state at time \(t\), and let \(d_{i, \text{out}}(t)\) and \(d_{i, \text{in}}(t)\) be the out-degree and in-degree of \(i\) in \(\mathcal{D}_t\). Let \(c_{i, \text{out}}(t)\) and \(c_{i, \text{in}}(t)\) be the total numbers of events sent and received by \(i\). For undirected networks, \(d_i(t)\) and \(c_i(t)\) denote the degree and total event count of \(i\).

Degree Statistics (`general_degree_out_sender`, etc.)

Out-Degree Sender: \(s_{i,j}(t) = f(d_{i, \text{out}}(t))\)
Out-Degree Receiver: \(s_{i,j}(t) = f(d_{j, \text{out}}(t))\)
In-Degree Sender: \(s_{i,j}(t) = f(d_{i, \text{in}}(t))\)
In-Degree Receiver: \(s_{i,j}(t) = f(d_{j, \text{in}}(t))\)
Degree Sum (undirected only): \(s_{i,j}(t) = f(d_{i}(t) + d_{j}(t))\)
Degree Absolute Difference (undirected only): \(s_{i,j}(t) = f(|d_{i}(t) - d_{j}(t)|)\)

Count Statistics (`general_count_out_sender`, etc.)

Count statistics are identical to degree statistics but use total interaction counts (\(c\)) rather than binary degrees (\(d\)):

Out-Count Sender: \(s_{i,j}(t) = f(c_{i, \text{out}}(t))\)
Out-Count Receiver: \(s_{i,j}(t) = f(c_{j, \text{out}}(t))\)
In-Count Sender: \(s_{i,j}(t) = f(c_{i, \text{in}}(t))\)
In-Count Receiver: \(s_{i,j}(t) = f(c_{j, \text{in}}(t))\)
Count Sum (undirected only): \(s_{i,j}(t) = f(c_{i}(t) + c_{j}(t))\)
Count Absolute Difference (undirected only): \(s_{i,j}(t) = f(|c_{i}(t) - c_{j}(t)|)\)
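
The difference between degrees and counts shows up as soon as a dyad interacts more than once. The sketch below contrasts the two for the outgoing direction. It assumes the `general_` reading, in which the degree counts distinct past partners. The names `Interaction`, `out_degree`, and `out_count` are hypothetical.

```cpp
#include <set>
#include <vector>

// One directed event sender -> receiver (timestamps omitted for brevity).
struct Interaction {
  int sender;
  int receiver;
};

// d_{i,out}: number of distinct actors that i has sent an event to.
int out_degree(const std::vector<Interaction>& history, int i) {
  std::set<int> partners;
  for (const Interaction& e : history) {
    if (e.sender == i) partners.insert(e.receiver);
  }
  return static_cast<int>(partners.size());
}

// c_{i,out}: total number of events sent by i, repeats included.
int out_count(const std::vector<Interaction>& history, int i) {
  int n = 0;
  for (const Interaction& e : history) {
    if (e.sender == i) ++n;
  }
  return n;
}
```

With the history \(1 \to 2\), \(1 \to 2\), \(1 \to 3\), \(2 \to 1\), actor 1 has out-degree 2 but out-count 3, because the repeated event \(1 \to 2\) adds to the count only.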

All of these degree and count statistics can be conveniently specified in model formulas using the degree() (or degrees()) and count() helper functions:

Out-Degree / Out-Count Sender: degree(type = "out_sender") or count(type = "out_sender")
Out-Degree / Out-Count Receiver: degree(type = "out_receiver") or count(type = "out_receiver")
In-Degree / In-Count Sender: degree(type = "in_sender") or count(type = "in_sender")
In-Degree / In-Count Receiver: degree(type = "in_receiver") or count(type = "in_receiver")
Degree / Count Sum: degree(type = "sum") or count(type = "sum")
Degree / Count Absolute Difference: degree(type = "absdiff") or count(type = "absdiff")

Exogenous Statistics

1. Dyadic Covariate (`dyadic_cov`)

\[s_{i,j}(\mathscr{H}_t) = f(X_{i,j}(t))\] where \(X(t)\) is an \(N \times N\) matrix of dyad-level covariate values, with \(N\) here denoting the number of actors.

2. Monadic Covariate (`monadic_cov`)

\[s_{i,j}(\mathscr{H}_t) = g(x_i(t), x_j(t))\] where \(x\) is a monadic vector and \(g(\cdot)\) is a user-defined function.
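
As one illustrative choice of \(g\) (an assumption for this example, not a package default), the absolute difference \(g(x_i, x_j) = |x_i - x_j|\) measures dissimilarity on an actor attribute. With hypothetical values \(x_i(t) = 30\) and \(x_j(t) = 42\): \[s_{i,j}(\mathscr{H}_t) = |30 - 42| = 12\] A negative coefficient on this statistic would then correspond to a higher intensity between actors with similar attribute values.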

Developer Guide: Adding Custom Sufficient Statistics

To add a new sufficient statistic (e.g., my_stat) to the package, follow this two-step process:

Step 1: Implement the Update Logic in C++

Define a C++ function in the src/ directory (e.g., in a new file or in src/sufficient_statistics.cpp) with the signature ValidateFunction. This function computes the change to the statistic state matrix when an event occurs:

#include "redeem/sufficient_statistics.h"
#include "redeem/extension_api.hpp"

arma::uvec stat_my_stat(
    Data_DEM &object, 
    arma::mat &data, 
    unsigned int &from, 
    unsigned int &to, 
    unsigned int &number_event, 
    unsigned int col_number, 
    std::string transformation, 
    unsigned int type
) {
  if (from == 0 || to == 0) return arma::uvec();
  
  // 1. Calculate which dyad indices are affected and the change value (val)
  double val = (type == 1) ? 1.0 : -1.0; 
  arma::uvec affected_indices = { object.current_stats.find_from_to(from, to) };
  
  // 2. Apply the update using the helper (handles transformations internally)
  apply_update(object, affected_indices, col_number, val, transformation, data);
  
  // 3. Return the indices of modified dyads
  return affected_indices;
}

// 4. Register the term in the global C++ Registry
TERM_REGISTER("my_stat", stat_my_stat);

Step 2: Write the R Initializer Function

In R/init_terms.R, create an initialization function named InitRedeemTerm.my_stat. This function validates user-provided arguments and sets up initial values:

#' @keywords internal
InitRedeemTerm.my_stat <- function(arglist, n_nodes, model_type, directed, ...) {
  # Validate arguments against expected types and models
  arglist <- check.RedeemTerm(arglist,
    model_type = model_type, directed = directed,
    expected = list(transformation = c("identity", "log", "recip", "bin", "sig"), K = "numeric"),
    defaults = list(transformation = "identity", K = 1)
  )
  
  # Define initial statistic values at time 0 (optional)
  eval_at_zero <- matrix(0, n_nodes, n_nodes)
  
  # Return the configuration list
  list(
    base_name = "my_stat",
    transformation = arglist$transformation,
    eval_at_zero = eval_at_zero,
    event_stream = arglist$event_stream
  )
}

Once these steps are completed, run devtools::document() to update namespaces and documentation, and build the package. The new term is now ready for use in dem() and rem() by specifying it directly in the formula (e.g., ~ my_stat(transformation = "log")).
