FBMS: Flexible Bayesian Model Selection and Model Averaging

This vignette introduces the FBMS package and shows how to perform Flexible Bayesian Model Selection (BMS) and Bayesian Model Averaging (BMA) for linear, generalized, nonlinear, fractional-polynomial, mixed-effect, and logic-regression models. More details are provided in the paper “FBMS: An R Package for Flexible Bayesian Model Selection and Model Averaging”, available on arXiv. More explicit examples accompanying the paper can be found on GitHub: https://github.com/jonlachmann/GMJMCMC/tree/data-inputs/tests_current


Setup

Load the technical markdown settings and a custom helper, printn(), which prints only the output that is needed.
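A minimal sketch of such a setup (assuming printn() simply forwards to print(); the helper used when the vignette is built may differ):

library(FBMS)

# Stand-in for the vignette's printn() helper: print only the requested object
printn <- function(x) {
  print(x)
  invisible(x)
}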


Bayesian Model Selection and Averaging

  1. Consider a class of models: \(\Omega: m_1(Y|X,\theta_1), \dots, m_k(Y|X,\theta_k)\).
  2. Assign priors to models \(P(m_1), \dots, P(m_k)\) and parameters \(P(\theta_j|m_j)\).
  3. Obtain joint posterior \(P(m_j,\theta_j|D)\).
  4. Make inference on a quantity of interest \(\Delta\).

\[ P(\Delta|D) = \sum_{m\in\Omega} P(m|D)\, \int_{\Theta_m} P(\Delta|m,\theta,D)\, P(\theta|m,D)\, d\theta. \]

Bayesian Generalized Nonlinear Model (BGNLM)

Reference: [@hubin2020flexible]


\[ \begin{aligned} Y_i \mid \mu_i, \phi &\sim \mathfrak{f}(y \mid \mu_i,\phi), \quad i = 1,\dots,n,\\ \mathsf{h}(\mu_i) &= \beta_0 + \sum_{j=1}^{q} \gamma_j \beta_j F_j(\mathbf{x}_i, \boldsymbol{\alpha}_j) + \sum_{k=1}^{r} \delta_{ik}. \end{aligned} \]

Feature space

Depending on allowed nonlinear functions \(\mathbb{G}\): neural nets (sigmoid), decision trees (thresholds), MARS (hinges), fractional polynomials, logic regression, etc.
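For instance, restricting \(\mathbb{G}\) to different subsets of the transforms listed below recovers these special cases; an illustrative sketch (all names are transforms used elsewhere in this vignette):

# Illustrative transform dictionaries for different model classes
transforms_nn    <- c("sigmoid")                          # neural-network-like features
transforms_trees <- c("hs", "nhs")                        # threshold (decision-tree-like) features
transforms_mars  <- c("relu", "nrelu")                    # hinge (MARS-like) features
transforms_fp    <- c("p0", "p05", "pm05", "pm1", "pm2")  # fractional-polynomial powers
transforms_logic <- c("not")                              # logic regression (negation; conjunctions via interactions)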

Transformations available in FBMS

  • sigmoid: \(1 / (1 + \exp(-x))\)
  • relu: \(\max(x, 0)\)
  • nrelu: \(\max(-x, 0)\)
  • hs: \(x > 0\)
  • nhs: \(x < 0\)
  • gelu: \(x \Phi(x)\)
  • ngelu: \(-x \Phi(-x)\)
  • sqroot: \(|x|^{1/2}\)
  • troot: \(|x|^{1/3}\)
  • sin_deg: \(\sin(x/180 \cdot \pi)\)
  • cos_deg: \(\cos(x/180 \cdot \pi)\)
  • exp_dbl: \(\exp(-|x|)\)
  • gauss: \(\exp(-x^2)\)
  • erf: \(2 \Phi(\sqrt{2}x) - 1\)
  • pm2: \(x^{-2}\)
  • pm1: \(\text{sign}(x) |x|^{-1}\)
  • pm05: \(|x|^{-0.5}\)
  • p05: \(|x|^{0.5}\)
  • p2: \(x^2\)
  • p3: \(x^3\)
  • p0pm2: \(p0(x) \cdot x^{-2}\)
  • p0pm05: \(p0(x) \cdot |x|^{-0.5}\)
  • p0p0: \(p0(x)^2\)
  • p0p05: \(p0(x) \cdot |x|^{0.5}\)
  • p0p1: \(p0(x) \cdot x\)
  • p0p2: \(p0(x) \cdot x^2\)
  • p0p3: \(p0(x) \cdot x^3\)

Custom functions can be added to \(\mathbb{G}\).
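For example, Example 6 below defines a cubic transform in plain R and passes its name through the transforms argument; a minimal sketch:

# Define a custom transform and include it in the feature dictionary
to3 <- function(x) x^3
transforms <- c("sigmoid", "troot", "to3")   # mix of built-in and custom functions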


Priors

Model priors

Let \(M = (\gamma_1, \dots, \gamma_q)\) (for linear models \(q=p\)).

Penalizing complexity priors


\[ P(M) \propto \mathbb{I}(|\boldsymbol\gamma_{1:q}| \le Q) \prod_{j=1}^q r^{\gamma_j c(F_j(\mathbf{x}, \boldsymbol{\alpha}_j))}, \quad 0 < r < 1, \]

  • \(c(F_j(\cdot))\): complexity measure (linear models: \(c(x)=1\); BGNLM counts algebraic operators).

The following parameter will be used to change the prior penalization.

# model prior complexity penalty 
model_prior = list(r = 0.005) #default is 1/n

Parameter priors

Mixtures of g-priors


\[ \begin{aligned} P(\beta_0, \phi \mid M) &\propto \phi^{-1}, \\ P(\boldsymbol{\beta} \mid g) &\sim \mathbb{N}_{|M|}\!\left(\mathbf{0},\, g \cdot \phi\, J_n(\boldsymbol{\beta})^{-1}\right), \\ \frac{1}{1+g} &\sim \text{tCCH}\!\left(\frac{a}{2}, \frac{b}{2}, \rho, \frac{s}{2}, v, \kappa\right). \end{aligned} \]

  • Robust-g (convenient default): \(a=1, b=2, \rho=1.5, s=0, v=\frac{n+1}{|M|+1}, \kappa=1\).

Jeffreys prior


\[ P(\phi \mid M) \propto \phi^{-1}, \quad P(\beta_0, \boldsymbol{\beta} \mid M) \propto |J_n(\beta_0, \boldsymbol{\beta})|^{1/2}. \]

All priors from the table below are available; the default is the g-prior.

Parameter priors available (and where to tune)

Each prior (alias) is listed with its tCCH hyperparameters \((a, b, \rho, s, v, k)\), where applicable, and the families it supports.

Default:
  • g-prior: hyperparameter \(g\) (default: \(\max(n, p^2)\)); GLM

tCCH-Related Priors:
  • CH: \((a, b, 0, s, 1, 1)\); GLM
  • uniform: \((2, 2, 0, 0, 1, 1)\); GLM
  • Jeffreys: \((0, 2, 0, 0, 1, 1)\); GLM
  • beta.prime: \((1/2,\ n - p_M - 1.5,\ 0, 0, 1, 1)\); GLM
  • benchmark: \((0.02,\ 0.02 \max(n, p^2),\ 0, 0, 1, 1)\); GLM
  • TG: \((2a, 2, 0, 2s, 1, 1)\); GLM
  • ZS-adapted: \((1, 2, 0, n + 3, 1, 1)\); GLM
  • robust: \((1, 2, 1.5, 0, (n+1)/(p_M+1), 1)\); GLM
  • hyper-g-n: \((1, 2, 1.5, 0, 1, 1/n)\); GLM
  • intrinsic: \((1, 1, 1, 0, (n + p_M + 1)/(p_M + 1), (n + p_M + 1)/n)\); GLM
  • tCCH: \((a, b, \rho, s, v, k)\); GLM

Other Priors:
  • EB-local: hyperparameter \(a\); GLM
  • EB-global: hyperparameter \(a\); G
  • JZS: hyperparameter \(a\); G
  • ZS-null: hyperparameter \(a\); G
  • ZS-full: hyperparameter \(a\); G
  • hyper-g: hyperparameter \(a\); GLM
  • hyper-g-laplace: hyperparameter \(a\); G
  • AIC: none; GLM
  • BIC: none; GLM
  • Jeffreys-BIC: Var; GLM

Here \(p_M\) is the number of predictors excluding the intercept. “G” denotes Gaussian-only; “GLM” additionally includes binomial, Poisson, and gamma.

How to switch priors in code (be explicit):

# g-prior with g = 1000
beta_prior = list(type = "g-prior", alpha = 1000)

# Robust prior (hyperparameters as in the table above)
beta_prior = list(type = "robust")

# Jeffreys-BIC
beta_prior = list(type = "Jeffreys-BIC")

# Generic tCCH (provide all hyperparameters explicitly)
beta_prior = list(type = "tCCH", a = 2, b = 2, rho = 0, s = 0, v = 1, k = 1)

Inference algorithms

Model posterior


Marginal likelihood \[ P(D|M) = \int_{\Theta_M} P(D|\theta_M, M) \, P(\theta_M|M) \, d\theta_M. \]

Posterior \[ P(M|D) = \frac{P(D|M) P(M)}{\sum_{M' \in \Omega} P(D|M') P(M')}. \]

Approximation over discovered models \(\Omega^*\) \[ P(M|D) \approx \frac{P(D|M) P(M)}{\sum_{M' \in \Omega^*} P(D|M') P(M')}, \quad M \in \Omega^*. \]

Marginal inclusion probability \[ P(\gamma_j=1|D) \approx \sum_{M \in \Omega^*: \gamma_j=1} P(M|D). \]

MCMC, MJMCMC, GMJMCMC

Model-space exploration uses Markov chain Monte Carlo: MJMCMC (mode-jumping MCMC) searches over a fixed set of covariates, while GMJMCMC (genetically modified MJMCMC) additionally evolves populations of nonlinear features.

Parallelization

Run multiple chains and aggregate unique models \(\Omega^*\):


\[ \widehat{P}(\Delta|D) = \sum_{M \in \Omega^*} P(\Delta|M,D) \, \widehat{P}(M|D). \]

# Parameters for parallel runs are set to a single thread and a single core to comply with CRAN requirements (please tune for your machine if you have more capacity)
runs  <- 1  # set to 1 for simplicity; use 16 or more if available
cores <- 1  # set to 1 for simplicity; use 8 or more if available

Example 1 — BGNLM: recovering Kepler’s third law

Data: \(n=939\) exoplanets; variables include semimajoraxis, period, hoststar_mass, etc.
Target relationship: \({a \approx K_2 \left(P^2 M_h\right)^{1/3}}\).
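As a quick sanity check of this relationship (an illustrative sketch, assuming the column names used in the fbms() calls below):

# Correlation between the Kepler feature (P^2 * M_h)^(1/3) and the semi-major axis
kepler_feature <- with(FBMS::exoplanet, (period^2 * hoststar_mass)^(1/3))
printn(cor(kepler_feature, FBMS::exoplanet$semimajoraxis))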

We first run a single-chain GMJMCMC

# Load example
data <- FBMS::exoplanet

# Choose a small but expressive transform set for a quick demo
transforms <- c("sigmoid", "sin_deg", "exp_dbl", "p0", "troot", "p3")

# ---- fbms() call (simple GMJMCMC) ----
# Key parameters (explicit):
# - formula      : semimajoraxis ~ 1 + .       # response and all predictors
# - data         : data                        # dataset
# - beta_prior   : list(type = "g-prior")      # parameter prior    
# - model_prior  : list(r = 1/dim(data)[1])    # model prior   
# - method       : "gmjmcmc"                   # exploration strategy
# - transforms   : transforms                  # nonlinear feature dictionary
# - P            : population size per generation (search breadth)
result_single <- fbms(
  formula     = semimajoraxis ~ 1 + .,
  data        = data,
  beta_prior  = list(type = "g-prior", alpha = dim(data)[1]),
  model_prior = list(r = 1/dim(data)[1]),
  method      = "gmjmcmc",
  transforms  = transforms,
  P           = 10
)

# Summarize
printn(summary(result_single))
## 
## Best   population: 7  log marginal posterior: 3001.64 
## 
##                            feats.strings marg.probs
## 1          troot((period*sigmoid(mass))) 1.00000000
## 2        ((period*hoststar_mass)*period) 1.00000000
## 3             p3((period*hoststar_mass)) 1.00000000
## 4          (troot(period)*troot(period)) 1.00000000
## 5                 (period*hoststar_mass) 1.00000000
## 6 (sigmoid(mass)*(period*hoststar_mass)) 1.00000000
## 7                          hoststar_mass 0.91282456
## 8                   hoststar_temperature 0.01243301

and then a parallel GMJMCMC

# ---- fbms() call (parallel GMJMCMC) ----
# Key parameters (explicit):
# - formula      : semimajoraxis ~ 1 + .       # response and all predictors
# - data         : data                        # dataset
# - beta_prior   : list(type = "g-prior")      # parameter prior   
# - model_prior  : list(r = 1/dim(data)[1])    # model prior   
# - method       : "gmjmcmc"                   # exploration strategy
# - transforms   : transforms                  # nonlinear feature dictionary
# - runs, cores  : parallelization controls
# - P            : population size per generation (search breadth)
result_parallel <- fbms(
  formula     = semimajoraxis ~ 1 + .,
  data        = data,
  beta_prior  = list(type = "g-prior", alpha = dim(data)[1]),
  model_prior = list(r = 1/dim(data)[1]),
  method      = "gmjmcmc.parallel",
  transforms  = transforms,
  runs        = runs,     # by default the rmd has runs =  1; increase for convergence
  cores       = cores,    # by default the rmd has cores = 1; increase for convergence
  P           = 10
)

# Summarize
printn(summary(result_parallel))
## 
## Best   population: 10  thread: 1  log marginal posterior: 2144.064 
## 
##                                              feats.strings marg.probs
## 1                                                   period  1.0000000
## 2                                            (period*mass)  1.0000000
## 3                            ((period*mass)*(period*mass))  1.0000000
## 4                                          sigmoid(period)  1.0000000
## 5                                               p0(period)  1.0000000
## 6                 ((period*mass)*(radius*(period*period)))  0.9999902
## 7                                 (radius*(period*period))  0.9999844
## 8 (hoststar_temperature*p0(((period*mass)*(period*mass))))  0.9945983
## 9                                             eccentricity  0.9554446

Plot output

plot(result_parallel)

Convergence plots

diagn_plot(result_parallel)


Example 2 — Bayesian linear models

Reference: [@hubin2018mode]

Model

\[ \begin{aligned} Y_i \mid \mu_i, \phi &\sim \mathfrak{f}(y\mid \mu_i, \phi), \quad i=1,\dots,n, \\ \mathsf{h}(\mu_i) &= \beta_0 + \sum_{j=1}^{p} \gamma_j \beta_j x_{ij}. \end{aligned} \]

We simulate data with a known sparse truth and run MJMCMC with an explicit g-prior.

library(mvtnorm)

n <- 100               # sample size
p <- 20                # number of covariates
k <- 5                 # size of true submodel

correct.model <- 1:k
beta.k <- (1:5)/5

beta <- rep(0, p)
beta[correct.model] <- beta.k

set.seed(123)
x <- rmvnorm(n, rep(0, p))
y <- x %*% beta + rnorm(n)

# Standardize
y <- scale(y)
X <- scale(x) / sqrt(n)

df <- as.data.frame(cbind(y, X))
colnames(df) <- c("Y", paste0("X", seq_len(ncol(df) - 1)))

printn(correct.model)
## [1] 1 2 3 4 5
printn(beta.k)
## [1] 0.2 0.4 0.6 0.8 1.0

Run MJMCMC with a g-prior (g = 100)

# ---- fbms() call (MJMCMC) ----
# Explicit prior choice:
#   beta_prior = list(type = "g-prior", alpha = 100)
# To switch to another prior, e.g. robust:
#   beta_prior = list(type = "robust")
result.lin <- fbms(
  formula    = Y ~ 1 + .,
  data       = df,
  method     = "mjmcmc",
  N          = 5000,                         # number of iterations
  beta_prior = list(type = "g-prior", alpha = 100)
)

Plot results

plot(result.lin)

Summarize with posterior effects

# 'effects' specifies quantiles for posterior modes of effects across models
printn(summary(result.lin, effects = c(0.5, 0.025, 0.975)))
## 
## Best log marginal posterior:  56.2259 
## 
## $PIP
##    feats.strings marg.probs
## 1             X4 1.00000000
## 2             X3 1.00000000
## 3             X5 1.00000000
## 4             X2 0.95262397
## 5             X1 0.16880319
## 6            X18 0.10464730
## 7            X20 0.10105529
## 8            X15 0.08767336
## 9            X10 0.08257391
## 10           X16 0.08200567
## 11           X13 0.07741378
## 12           X12 0.07474969
## 13           X11 0.07090813
## 14           X19 0.06903813
## 15            X7 0.06899870
## 16           X14 0.06586870
## 17            X9 0.05690408
## 18            X8 0.05592665
## 19            X6 0.05154572
## 20           X17 0.04739590
## 
## $EFF
##    Covariate quant_0.5 quant_0.025 quant_0.975
## 1  intercept         0           0           0
## 2         X1         0           0      0.6606
## 3         X2    1.7024           0      1.7383
## 4         X3    4.8118      4.7487      4.8867
## 5         X4    4.9803      4.9086      5.0868
## 6         X5    4.7744      4.3659      4.8573
## 7         X6         0     -0.0694           0
## 8         X7         0     -0.1382           0
## 9         X8         0           0       0.072
## 10        X9         0     -0.1875           0
## 11       X10         0           0      0.5609
## 12       X11         0           0       0.444
## 13       X12         0     -0.2502           0
## 14       X13         0     -0.3655           0
## 15       X14         0           0      0.2716
## 16       X15         0           0      0.5654
## 17       X16         0           0      0.5074
## 18       X17         0           0      0.2266
## 19       X18         0           0      0.4858
## 20       X19         0     -0.2733           0
## 21       X20         0     -0.5012           0

Run parallel MJMCMC

# ---- fbms() call (parallel MJMCMC) ----
# Explicit prior choice:
#   beta_prior = list(type = "g-prior", alpha = 100)
# To switch to another prior, e.g. robust:
#   beta_prior = list(type = "robust")
#   method = "mjmcmc.parallel"
#   runs, cores  : parallelization controls
result.lin.par <- fbms(
  formula    = Y ~ 1 + .,
  data       = df,
  method     = "mjmcmc.parallel",
  N          = 5000,                         # number of iterations
  beta_prior = list(type = "g-prior", alpha = 100),
  runs = runs,
  cores = cores
)
printn(summary(result.lin.par, effects = c(0.5, 0.025, 0.975)))
## 
## Best log marginal posterior:  56.2259 
## 
## $PIP
##    feats.strings marg.probs
## 1             X4 1.00000000
## 2             X3 1.00000000
## 3             X5 1.00000000
## 4             X2 0.95518016
## 5             X1 0.17518018
## 6            X10 0.10751407
## 7            X20 0.10044924
## 8            X18 0.09363710
## 9            X16 0.09089584
## 10           X15 0.08729800
## 11           X13 0.08418363
## 12           X19 0.07302402
## 13           X14 0.07262783
## 14           X11 0.06881624
## 15           X17 0.06104115
## 16            X9 0.05570830
## 17           X12 0.05215211
## 18            X6 0.04938128
## 19            X7 0.04711947
## 20            X8 0.04447923
## 
## $EFF
##    Covariate quant_0.5 quant_0.025 quant_0.975
## 1  intercept         0           0           0
## 2         X1         0           0      0.6606
## 3         X2    1.7027           0      1.7394
## 4         X3    4.8118      4.7487      4.8885
## 5         X4    4.9803      4.9043      5.0888
## 6         X5    4.7744      4.3659      4.8576
## 7         X6         0     -0.0694           0
## 8         X7         0     -0.1346           0
## 9         X8         0           0       0.072
## 10        X9         0     -0.1875           0
## 11       X10         0           0      0.5625
## 12       X11         0           0      0.4382
## 13       X12         0     -0.2501           0
## 14       X13         0     -0.3742           0
## 15       X14         0           0      0.2739
## 16       X15         0           0      0.5567
## 17       X16         0           0      0.5074
## 18       X17         0           0      0.2266
## 19       X18         0           0      0.4718
## 20       X19         0     -0.2733           0
## 21       X20         0      -0.511           0

Example 3 — Bayesian Fractional Polynomials (FP)

Reference: [@hubin2023fractional]

We augment the linear example to follow an FP truth and fit with GMJMCMC.

FP model class

\[ \begin{aligned} Y_i \mid \mu_i, \phi &\sim \mathfrak{f}(y|\mu_i,\phi),\\ \mathsf{h}(\mu_i) &= \beta_0 + \sum_{j=1}^{p} \sum_{k \in K} \gamma_{jk}\, \beta_{jk}\, \rho_k(x_{ij}), \quad \text{with } K = \mathbf{F}_0 \cup \mathbf{F}_1 \cup \mathbf{F}_2, \end{aligned} \]

# Create FP-style response with known structure, covariates are from previous example
df$Y <- p05(df$X1) + df$X1 + pm05(df$X3) + p0pm05(df$X3) + df$X4 +
         pm1(df$X5) + p0(df$X6) + df$X8 + df$X10 + rnorm(nrow(df))

# Allow common FP transforms
transforms <- c(
  "p0", "p2", "p3", "p05", "pm05", "pm1", "pm2", "p0p0",
  "p0p05", "p0p1", "p0p2", "p0p3", "p0pm05", "p0pm1", "p0pm2"
)

# Generation probabilities — here only modifications and mutations
probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(0, 1, 0, 1)

# Feature-generation parameters
params <- gen.params.gmjmcmc(ncol(df) - 1)
params$feat$D <- 1   # max depth 1 features

Run GMJMCMC (single-thread)

result <- fbms(
  formula    = Y ~ 1 + .,
  data       = df,
  method     = "gmjmcmc",
  transforms = transforms,
  beta_prior = list(type = "Jeffreys-BIC"),
  probs      = probs,
  params     = params,
  P          = 10
)

printn(summary(result))
## 
## Best   population: 10  log marginal posterior: -206.4266 
## 
##    feats.strings   marg.probs
## 1       pm05(X3) 1.0000000000
## 2       p0p0(X3) 1.0000000000
## 3        pm1(X5) 1.0000000000
## 4     p0pm05(X6) 0.9999959549
## 5      p0pm1(X3) 0.9359835862
## 6            X17 0.8981659238
## 7            X15 0.8795347468
## 8            X10 0.8584612830
## 9             X9 0.8507841966
## 10    p0pm2(X13) 0.7855348631
## 11           X20 0.1825368449
## 12            X4 0.0362927730
## 13            X7 0.0166240434
## 14           X13 0.0079441065
## 15            X2 0.0002276410
## 16     p0p3(X20) 0.0001784105

Parallel GMJMCMC

result_parallel <- fbms(
  formula    = Y ~ 1 + .,
  data       = df,
  method     = "gmjmcmc.parallel",
  transforms = transforms,
  beta_prior = list(type = "Jeffreys-BIC"),
  probs      = probs,
  params     = params,
  P          = 10,
  runs       = runs,
  cores      = cores
)

printn(summary(result_parallel))
## 
## Best   population: 10  thread: 1  log marginal posterior: -506.0695 
## 
##   feats.strings  marg.probs
## 1    p0pm05(X3) 1.000000000
## 2            X5 1.000000000
## 3       pm2(X5) 1.000000000
## 4            X3 0.398001125
## 5      p0p0(X3) 0.001258547

Example 4 — Mixed-effects FP with interactions

Dataset: \(n=659\) children; response z: standardized height; covariates: c.bf, c.age, m.ht, m.bmi, reg. Random intercept for dr.
We specify a custom estimator that fits a mixed model via lme4 and plug it into fbms() with family = "custom". Additional arguments for the estimator are passed through model_prior and extra_params, which fbms() combines into mlpost_params (here mlpost_params = list(r = r, dr = dr)).

# Custom approximate log marginal likelihood for mixed model using Laplace approximation
mixed.model.loglik.lme4 <- function (y, x, model, complex, mlpost_params) {
  if (sum(model) > 1) {
    x.model <- x[, model]
    data <- data.frame(y, x = x.model[, -1], dr = mlpost_params$dr)
    mm <- lmer(as.formula(paste0("y ~ 1 +",
                          paste0(names(data)[2:(ncol(data)-1)], collapse = "+"),
                          " + (1 | dr)")), data = data, REML = FALSE)
  } else {
    data <- data.frame(y, dr = mlpost_params$dr)
    mm <- lmer(y ~ 1 + (1 | dr), data = data, REML = FALSE)
  }
  # log marginal likelihood (Laplace approx) + log model prior
  mloglik <- as.numeric(logLik(mm)) - 0.5 * log(length(y)) * (ncol(data) - 2)
  if (length(mlpost_params$r) == 0) mlpost_params$r <- 1 / nrow(x)
  lp <- log_prior(mlpost_params, complex)
  list(crit = mloglik + lp, coefs = fixef(mm))
}
library(lme4)
## Loading required package: Matrix
data(Zambia, package = "cAIC4")

df <- as.data.frame(sapply(Zambia[1:5], scale))

transforms <- c("p0","p2","p3","p05","pm05","pm1","pm2",
                "p0p0","p0p05","p0p1","p0p2","p0p3",
                "p0pm05","p0pm1","p0pm2")

probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(1, 1, 0, 1) # include modifications and interactions

params <- gen.params.gmjmcmc(ncol(df) - 1)
params$feat$D <- 1
params$feat$pop.max <- 10

result2a <- fbms(
  formula      = z ~ 1 + .,
  data         = df,
  method       = "gmjmcmc.parallel",
  transforms   = transforms,
  probs        = probs,
  params       = params,
  P            = 10,
  N            = 100,
  runs         = runs,
  cores        = cores,
  family       = "custom",
  loglik.pi    = mixed.model.loglik.lme4,
  model_prior  = list(r = 1 / nrow(df)), # model_prior is passed to mlpost_params
  extra_params = list(dr = droplevels(Zambia$dr)) # extra_params are passed to mlpost_params
)

printn(summary(result2a, tol = 0.05, labels = names(df)[-1]))
## 
## Best   population: 6  thread: 1  log marginal posterior: -876.5476 
## 
##   feats.strings marg.probs
## 1          c.bf  1.0000000
## 2         c.age  1.0000000
## 3         m.bmi  1.0000000
## 4          m.ht  1.0000000
## 5 (c.age*c.age)  0.9999999

Example 5 — Bayesian Logic Regression

Reference: [@hubin2020logic]

Model

\[ \mathsf{h}(\mu_i) = \beta_0 + \sum_{j=1}^{q} \gamma_j \beta_j L_{ji}, \quad L_{ji} \text{ are logic trees (e.g., } (x_{i1}\wedge x_{i2}) \vee x_{i3}^c ). \]

We generate Boolean covariates and a known logic signal, define a custom estimator with a logic-regression model prior, and fit via GMJMCMC. With 0/1 covariates a conjunction such as \(x_{1}\wedge x_{2}\) corresponds to the product of the variables, so logic trees are built from multiplicative interactions together with the "not" transform for negation.

n <- 2000
p <- 50

set.seed(1)
X2 <- as.data.frame(matrix(rbinom(n * p, size = 1, prob = runif(n * p, 0, 1)), n, p))
y2.Mean <- 1 + 7*(X2$V4*X2$V17*X2$V30*X2$V10) + 9*(X2$V7*X2$V20*X2$V12) + 
           3.5*(X2$V9*X2$V2) + 1.5*(X2$V37)

Y2 <- rnorm(n, mean = y2.Mean, sd = 1)
df <- data.frame(Y2, X2)

# Train/test split
df.training <- df[1:(n/2), ]
df.test     <- df[(n/2 + 1):n, ]
df.test$Mean <- y2.Mean[(n/2 + 1):n]

Custom estimator with logic regression priors

library(fastglm)  # provides fastglm(), called directly inside the custom estimator

estimate.logic.lm <- function(y, x, model, complex, mlpost_params) {
  suppressWarnings({
    mod <- fastglm(as.matrix(x[, model]), y, family = gaussian())
  })
  mloglik <- -(mod$aic + (log(length(y)) - 2) * (mod$rank)) / 2
  wj <- complex$width
  lp <- sum(log(factorial(wj))) - sum(wj * log(4 * mlpost_params$p) - log(4))
  logpost <- mloglik + lp
  if (logpost == -Inf) logpost <- -10000
  list(crit = logpost, coefs = mod$coefficients)
}

Run GMJMCMC

set.seed(5001)

# Only "not" operator; "or" is implied by De Morgan via "and" + "not"
transforms <- c("not")
probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(1, 1, 0, 1) # no projections

params <- gen.params.gmjmcmc(p)
params$feat$pop.max <- 50
params$feat$L <- 15

result <- fbms(
  formula     = Y2 ~ 1 + .,
  data        = df.training,
  method      = "gmjmcmc",
  transforms  = transforms,
  N           = 500,
  P           = 10,
  family      = "custom",
  loglik.pi   = estimate.logic.lm,
  probs       = probs,
  params      = params,
  model_prior = list(p = p)
)

printn(summary(result))
## 
## Best   population: 9  log marginal posterior: -1912.786 
## 
##           feats.strings   marg.probs
## 1        ((V4*V10)*V17) 1.0000000000
## 2               (V9*V2) 1.0000000000
## 3                   V37 1.0000000000
## 4        ((V20*V7)*V12) 1.0000000000
## 5  (V33*(V30*(V4*V17))) 0.9999999179
## 6                   V30 0.9999969055
## 7                   V17 0.0137256351
## 8                   V32 0.0025183771
## 9                   V35 0.0020386609
## 10                  V12 0.0013621295
## 11                  V45 0.0012730261
## 12                  V18 0.0012697929
## 13                  V16 0.0010825818
## 14                  V41 0.0010734783
## 15                  V44 0.0009967761
## 16                  V31 0.0009461623
## 17                   V8 0.0008938150
## 18                  V13 0.0007713390
## 19                   V3 0.0006926010
## 20                  V24 0.0006547320
## 21                  V21 0.0006439233
## 22                  V49 0.0006393027
## 23                   V9 0.0006283463
## 24                  V36 0.0006209956
## 25                  V15 0.0006131099
# Extract models
mpm   <- get.mpm.model(result, y = df.training$Y2, x = df.training[,-1],
                       family = "custom", loglik.pi = estimate.logic.lm, params = list(p = 50))
printn(mpm$coefs)
##          (Intercept)       ((V4*V10)*V17)              (V9*V2) 
##            0.6578016            2.7049544            3.5767605 
##                  V30                  V37 (V33*(V30*(V4*V17))) 
##            0.6051133            1.3582049            2.1279042 
##       ((V20*V7)*V12) 
##            9.1213744
mpm2  <- get.mpm.model(result, y = df.training$Y2, x = df.training[,-1])
printn(mpm2$coefs)
##          (Intercept)       ((V4*V10)*V17)              (V9*V2) 
##            0.6578016            2.7049544            3.5767605 
##                  V30                  V37 (V33*(V30*(V4*V17))) 
##            0.6051133            1.3582049            2.1279042 
##       ((V20*V7)*V12) 
##            9.1213744
mbest <- get.best.model(result)
printn(mbest$coefs)
##            Intercept       ((V4*V10)*V17)              (V9*V2) 
##            0.6578016            2.7049544            3.5767605 
##                  V30                  V37 (V33*(V30*(V4*V17))) 
##            0.6051133            1.3582049            2.1279042 
##       ((V20*V7)*V12) 
##            9.1213744

Prediction

# Correct link is identity for Gaussian
pred       <- predict(result, x = df.test[,-1], link = function(x) x)
pred_mpm   <- predict(mpm,   x = df.test[,-1], link = function(x) x)
pred_best  <- predict(mbest, x = df.test[,-1], link = function(x) x)

# RMSEs
printn(sqrt(mean((pred$aggr$mean - df.test$Y2)^2)))
## [1] 1.528321
printn(sqrt(mean((pred_mpm     - df.test$Y2)^2)))
## [1] 1.528355
printn(sqrt(mean((pred_best    - df.test$Y2)^2)))
## [1] 1.528355
printn(sqrt(mean((df.test$Mean - df.test$Y2)^2)))
## [1] 1.028222
# Errors to the true mean (oracle)
printn(sqrt(mean((pred$aggr$mean - df.test$Mean)^2)))
## [1] 1.076062
printn(sqrt(mean((pred_best      - df.test$Mean)^2)))
## [1] 1.076113
printn(sqrt(mean((pred_mpm       - df.test$Mean)^2)))
## [1] 1.076113
# Quick diagnostic plot
plot(pred$aggr$mean, df.test$Y2,
     xlab = "Predicted (BMA)", ylab = "Observed")
points(pred$aggr$mean, df.test$Mean, col = 2)
points(pred_best,      df.test$Mean, col = 3)
points(pred_mpm,       df.test$Mean, col = 4)
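To distinguish the overlaid point clouds, a legend can be added (a sketch using base graphics; colours match the points() calls above):

legend("topleft",
       legend = c("Observed vs BMA", "True mean vs BMA",
                  "True mean vs best model", "True mean vs MPM"),
       col = 1:4, pch = 1)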


Example 6 — Full BGNLM classification (Bernoulli): spam data

We fit a binomial BGNLM and compare BMA, best-model, and MPM predictions.
Important: specify the correct link in predict() (here the logistic function, i.e. the inverse logit).

library(kernlab)
data("spam")

df <- spam[, c(58, 1:57)]
n  <- nrow(df)
p  <- ncol(df) - 1

colnames(df) <- c("y", paste0("x", 1:p))
df$y <- as.numeric(df$y == "spam")

to3 <- function(x) x^3
transforms <- c("sigmoid","sin_deg","exp_dbl","p0","troot","to3")
probs <- gen.probs.gmjmcmc(transforms)
probs$gen <- c(1, 1, 1, 1)

params <- gen.params.gmjmcmc(p)
params$feat$check.col <- FALSE

set.seed(6001)
result <- fbms(
  formula    = y ~ 1 + .,
  data       = df,
  method     = "gmjmcmc",
  family     = "binomial",
  beta_prior = list(type = "Jeffreys-BIC"),
  transforms = transforms,
  probs      = probs,
  params     = params
)

printn(summary(result))
## 
## Best   population: 2  log marginal posterior: -1048.305 
## 
##             feats.strings  marg.probs
## 1                      x5 1.000000000
## 2                      x8 1.000000000
## 3                     x55 1.000000000
## 4            sigmoid(x21) 1.000000000
## 5                     x16 1.000000000
## 6                     x20 1.000000000
## 7                     x21 1.000000000
## 8                     x23 1.000000000
## 9                     x25 1.000000000
## 10                    x27 1.000000000
## 11            sigmoid(x7) 1.000000000
## 12                p0(x17) 1.000000000
## 13               to3(x25) 1.000000000
## 14                    x41 1.000000000
## 15                    x42 1.000000000
## 16                    x44 1.000000000
## 17                    x12 1.000000000
## 18                     x6 1.000000000
## 19           sigmoid(x52) 1.000000000
## 20                    x53 1.000000000
## 21                    x52 1.000000000
## 22                    x45 1.000000000
## 23              (x50*x25) 1.000000000
## 24                p0(x19) 1.000000000
## 25                p0(x22) 0.999999632
## 26 sigmoid(1+1*x16+1*x55) 0.999999632
## 27               (x24*x3) 0.998106918
## 28                     x2 0.998091170
## 29                    x28 0.998090532
## 30             troot(x27) 0.998090188
## 31                    x39 0.998090175
## 32                    x32 0.998090164
## 33                    x46 0.994872577
## 34                    x19 0.994843934
## 35              troot(x8) 0.807086602
## 36                    x24 0.807017530
## 37                    x49 0.806845676
## 38                    x29 0.036497149
## 39                    x40 0.017892855
## 40                    x10 0.014132910
## 41           sigmoid(x46) 0.003217954

Prediction accuracy

# Logistic link
invlogit <- function(x) 1/(1 + exp(-x))

# Model averaging
pred <- predict(result, x = df[,-1], link = invlogit)
printn(mean(round(pred$aggr$mean) == df$y))
## [1] 0.9397957
# Best model
bm <- get.best.model(result)
preds <- predict(bm, df[,-1], link = invlogit)
printn(mean(round(preds) == df$y))
## [1] 0.9402304
# Median Probability Model
mpm <- get.mpm.model(result, family = "binomial", y = df$y, x = df[,-1])
preds_mpm <- predict(mpm, df[,-1], link = invlogit)
printn(mean(round(preds_mpm) == df$y))
## [1] 0.9402304
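
Beyond raw accuracy, a confusion table for the model-averaged predictions can be computed with base R (an illustrative sketch):

# Cross-tabulate BMA-predicted classes against observed labels
printn(table(Predicted = round(pred$aggr$mean), Observed = df$y))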

References