---
title: "Chapter 02-S03: Beta–Binomial Conjugacy for One Proportion"
author: "Kjell Nygren"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: REFERENCES.bib
reference-section-title: References
vignette: >
  %\VignetteIndexEntry{Chapter 02-S03: Beta–Binomial Conjugacy for One Proportion}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(glmbayes)
if (requireNamespace("bayesrules", quietly = TRUE)) library(bayesrules)
```

---

## 1. The model

### 1.1 Likelihood

Suppose we observe \(n\) independent binary trials — each a success (1) or failure (0) — with a common success probability \(\theta \in (0,1)\).  The total number of successes \(y = \sum_i y_i\) follows:

\[
  y \mid \theta \;\sim\; \mathrm{Binomial}(n,\, \theta).
\]

The likelihood is

\[
  p(y \mid \theta) \;=\; \binom{n}{y}\,\theta^y\,(1-\theta)^{n-y},
\]

and everything relevant for updating our beliefs about \(\theta\) is captured by the two sufficient statistics \(y\) (successes) and \(n - y\) (failures).

### 1.2 Prior

The parameter \(\theta\) is a probability, so its support is \([0,1]\).  The **Beta distribution** is the natural conjugate prior:

\[
  p(\theta) \;=\; \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \qquad \theta \in [0,1],
\]

with shape parameters \(a > 0\) and \(b > 0\).  Its mean is \(a/(a+b)\) and its variance is \(ab/[(a+b)^2(a+b+1)]\).

**Interpreting the parameters.**  Think of \(a - 1\) as a "prior number of successes" and \(b - 1\) as a "prior number of failures" accumulated before the current study.  The total "prior sample size" is \(a + b - 2\) and the prior mean is \(a/(a+b)\).

| Prior belief | Parameter choice |
|---|---|
| No information — completely flat | \(a = b = 1\) (uniform on \([0,1]\)) |
| Mild belief near 0.5 | \(a = b = 3\) |
| Moderate belief near 0.4 | \(a = 4, b = 6\) (mean = 0.4) |
| Strong belief near 0.2 | \(a = 4, b = 16\) (mean = 0.2) |

### 1.3 Posterior derivation

Multiply the likelihood kernel \(\theta^y(1-\theta)^{n-y}\) by the prior kernel \(\theta^{a-1}(1-\theta)^{b-1}\):

\[
  p(\theta \mid y)
  \;\propto\; \theta^{(a+y)-1}(1-\theta)^{(b+n-y)-1}.
\]

This is the kernel of a Beta distribution, so the conjugate posterior is:

\[
  \boxed{\theta \mid y \;\sim\; \mathrm{Beta}(a + y,\; b + n - y).}
\]

The update rule is simple:

- Add the **number of successes** \(y\) to the first shape parameter.
- Add the **number of failures** \(n - y\) to the second shape parameter.

### 1.4 The posterior mean as a compromise

\[
  \mathbb{E}[\theta \mid y]
  \;=\; \frac{a + y}{a + b + n}
  \;=\; \frac{a+b}{a+b+n}\cdot\frac{a}{a+b}
  \;+\;
  \frac{n}{a+b+n}\cdot\frac{y}{n}.
\]

This is a weighted average of the prior mean \(a/(a+b)\) and the data frequency \(y/n\), with weights proportional to the "prior sample size" \(a+b\) and the actual sample size \(n\).  When \(n\) is much larger than \(a + b\) the data dominate; when \(n\) is small the prior pulls the estimate toward its own mean.

---

## 2. `bayesrules` illustration

The **`bayesrules`** package provides `plot_beta_binomial()` and `summarize_beta_binomial()` for quick visualization and tabular summaries of the Beta–Binomial update [@JohnsonLauEtAl2022].

**Scenario.**  A clinical researcher believes, based on earlier trials, that about 40% of patients respond to a new analgesic.  They encode this as a **Beta(4, 6)** prior (mean = 0.4, effective prior sample size = 10).  In a new pilot study they observe \(y = 14\) responders out of \(n = 30\) patients.

```{r bb-data}
a0 <- 4;  b0 <- 6          ## Beta(4, 6) prior: mean = 0.4
y  <- 14; n  <- 30          ## data: 14/30 successes

## Analytic posterior
post_a <- a0 + y            ## 4 + 14 = 18
post_b <- b0 + (n - y)      ## 6 + 16 = 22
```

```{r bb-bayesrules-plot, eval = requireNamespace("bayesrules", quietly = TRUE)}
library(bayesrules)
## Overlay prior, likelihood (scaled), and posterior densities
plot_beta_binomial(
  alpha      = a0,
  beta       = b0,
  y          = y,
  n          = n,
  prior      = TRUE,
  likelihood = TRUE,
  posterior  = TRUE
)
```

```{r bb-bayesrules-summary, eval = requireNamespace("bayesrules", quietly = TRUE)}
## Tabular summary of prior, likelihood, and posterior
summarize_beta_binomial(alpha = a0, beta = b0, y = y, n = n)
```

**Reading the output.**  The posterior parameters are Beta(18, 22), giving a posterior mean of \(18/40 = 0.45\) — pulled from the prior mean (0.40) toward the data frequency (\(14/30 \approx 0.47\)) in proportion to their relative weights.  The posterior is noticeably narrower than the prior, reflecting the information gained from 30 observations.

**Analytic posterior.**

```{r bb-analytic}
bb_analytic <- data.frame(
  Example = "Analgesic trial (BayesRules)",
  n = n,
  y = y,
  Posterior = sprintf("Beta(%d, %d)", post_a, post_b),
  Mean = post_a / (post_a + post_b),
  SD = sqrt(post_a * post_b / ((post_a + post_b)^2 * (post_a + post_b + 1))),
  check.names = FALSE
)
knitr::kable(bb_analytic, digits = 4,
  caption = "Conjugate Beta--Binomial posterior")
```

---

## 3. Real data: the Bechdel test

The **Bechdel test** asks whether a film features at least two women who talk to each other about something other than a man.  The `bayesrules::bechdel` dataset records the test result for `r if(requireNamespace("bayesrules", quietly=TRUE)) nrow(bayesrules::bechdel) else "1794"` films [@JohnsonLauEtAl2022].

**Research question.**  What fraction of films pass the Bechdel test?

**Prior.**  Based on earlier analyses suggesting roughly 45% of films pass, we specify a **Beta(9, 11)** prior (mean = 0.45, effective sample size = 20 — moderately informative).

```{r bechdel-data, eval = requireNamespace("bayesrules", quietly = TRUE)}
library(bayesrules)

## Binary outcome: 1 = PASS, 0 = FAIL
pass   <- as.integer(bechdel[["binary"]] == "PASS")
n_bech <- length(pass)
y_bech <- sum(pass)

cat(sprintf("n = %d,  y (PASS) = %d,  proportion = %.3f\n",
            n_bech, y_bech, y_bech / n_bech))
```

```{r bechdel-posterior, eval = requireNamespace("bayesrules", quietly = TRUE)}
a0_b <- 9;  b0_b <- 11    ## Beta(9, 11) prior: mean = 0.45

post_a_b <- a0_b + y_bech
post_b_b <- b0_b + (n_bech - y_bech)

c(prior_mean  = a0_b / (a0_b + b0_b),
  data_freq   = round(y_bech / n_bech, 4),
  post_mean   = round(post_a_b / (post_a_b + post_b_b), 4))

## 95% credible interval
round(qbeta(c(0.025, 0.975), post_a_b, post_b_b), 4)
```

With over 1700 films the data dominate the prior; the posterior mean is nearly identical to the observed proportion.

```{r bechdel-analytic, eval = requireNamespace("bayesrules", quietly = TRUE)}
analytic_mean_b <- post_a_b / (post_a_b + post_b_b)
analytic_sd_b   <- sqrt(post_a_b * post_b_b /
                        ((post_a_b + post_b_b)^2 * (post_a_b + post_b_b + 1)))

bech_analytic <- data.frame(
  Dataset = "Bechdel test",
  n = n_bech,
  y = y_bech,
  Posterior = sprintf("Beta(%d, %d)", post_a_b, post_b_b),
  Mean = analytic_mean_b,
  SD = analytic_sd_b,
  check.names = FALSE
)
knitr::kable(bech_analytic, digits = 4,
  caption = "Conjugate Beta--Binomial posterior")
```

### 3.1 Fitting with **`glmb()`** using **`dBeta()`**

The scalar Beta–Binomial model generalizes to **`glmb()`** with `family = binomial(link = "identity")` and the `dBeta()` prior.  In this intercept-only setting the single regression coefficient **is** \(\theta\) directly on the probability scale.

```{r bechdel-glmb, eval = requireNamespace("bayesrules", quietly = TRUE)}
df_bech <- data.frame(y = pass)

bech_beta <- matrix(a0_b / (a0_b + b0_b), nrow = 1, ncol = 1,
                    dimnames = list(NULL, "(Intercept)"))

bech_pf <- dBeta(shape1 = a0_b, shape2 = b0_b, beta = bech_beta)

set.seed(2026)
fit_bech <- glmb(
  n       = 20000,
  y ~ 1,
  data    = df_bech,
  weights = rep(1L, n_bech),
  family  = binomial(link = "identity"),
  pfamily = bech_pf
)
print(fit_bech)
```

```{r bechdel-compare, eval = requireNamespace("bayesrules", quietly = TRUE)}
bech_compare <- data.frame(
  Dataset = "Bechdel test",
  Posterior = bech_analytic$Posterior,
  `Analytic Mean` = bech_analytic$Mean,
  `Analytic SD`   = bech_analytic$SD,
  `glmb Post.Mean` = fit_bech$coef.means["(Intercept)"],
  `glmb Post.Sd`   = sd(fit_bech$coefficients[, "(Intercept)", drop = TRUE]),
  check.names = FALSE
)
knitr::kable(bech_compare, digits = 4,
  caption = "Analytic vs. glmb() posterior mean and SD")
```

The **`glmb Post.Mean`** and **`glmb Post.Sd`** columns should match the analytic **Mean** and **SD** to Monte Carlo error because `dBeta()` implements the exact conjugate update.

---

## 4. Connection to glmbayes

This scalar prototype is the intuition behind **Binomial `glmb()`** (Chapters 7–9), where instead of a single \(\theta\) you model \(\mathrm{logit}(\theta_i) = x_i^\top \beta\) and place a multivariate Normal prior on the coefficient vector \(\beta\).  The conjugate `dBeta()` path is specific to intercept-only models with the identity link; once covariates enter, the conjugacy is lost and the general envelope sampler takes over.

---

## See also

- **Chapter-02-S01** — introduction to Bayesian inference and conjugacy.
- **Chapter-02-S02** — Normal–Normal conjugacy for one mean.
- **Chapter-02-S04** — Gamma–Poisson conjugacy for one count rate.
- **Chapter-02-S05** — Gamma–Gamma conjugacy for a Gamma response rate.
- **Chapters 07–09** — Binomial regression with **`glmb()`**.
