---
title: "Chapter 02-S05: Gamma–Gamma Conjugacy for One Response Rate"
author: "Kjell Nygren"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: REFERENCES.bib
reference-section-title: References
vignette: >
  %\VignetteIndexEntry{Chapter 02-S05: Gamma–Gamma Conjugacy for One Response Rate}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(glmbayes)
```

---

## 1. The model

### 1.1 Likelihood

Suppose we observe \(n\) positive continuous responses \(y_1, \dots, y_n\) — waiting times, survival times, monetary amounts — drawn from a Gamma distribution with **known shape** \(k > 0\) and **unknown rate** \(\beta > 0\):

\[
  Y_i \mid \beta \;\sim\; \mathrm{Gamma}(k,\, \beta), \qquad i = 1,\dots,n.
\]

The mean response is \(\mu = k/\beta\) and the variance is \(k/\beta^2\).  The log-likelihood (up to constants) is:

\[
  \ell(\beta) \;\propto\; n k \log\beta - \beta\,\textstyle\sum_i y_i.
\]

**Why fix the shape?**  Conjugacy requires \(k\) to be treated as known.  In practice you either know it from domain knowledge (e.g., exponential responses have \(k = 1\)) or estimate it by method-of-moments: \(\hat k = \bar y^2 / s^2\).  When both \(k\) and \(\beta\) are unknown, conjugacy breaks and a rejection sampler is needed — that case is handled by `Gamma(log)` with `glmbdisp()`.

**The special case \(k = 1\)** gives the **Exponential distribution**, the natural model for inter-event waiting times under a constant-hazard assumption.

### 1.2 Prior

Because \(\beta > 0\), the Gamma distribution is again the natural conjugate prior:

\[
  \beta \;\sim\; \mathrm{Gamma}(\alpha_0,\, b_0),
\]

with prior mean \(\alpha_0 / b_0\) and variance \(\alpha_0 / b_0^2\).

**Relationship to the dispersion.**  In GLM terms the Gamma dispersion is \(\phi = 1/k\), so the shape \(k\) is the precision \(1/\phi\).  Specifying `lik_shape = k` in `dGamma(Inv_Dispersion = FALSE)` is equivalent to specifying a known dispersion.

**The "identity" link.**  When you pass `family = Gamma(link = "identity")` to **`glmb()`**, the single regression coefficient is the rate \(\beta\) directly.  This matches the conjugate parameterization.

### 1.3 Posterior derivation

Multiply the likelihood kernel \(\beta^{n k}\,e^{-\beta \sum y_i}\) by the prior kernel \(\beta^{\alpha_0 - 1}\,e^{-b_0 \beta}\):

\[
  p(\beta \mid y)
  \;\propto\;
  \beta^{(\alpha_0 + n k) - 1}\,e^{-(b_0 + \sum_i y_i)\,\beta}.
\]

This is a Gamma kernel, giving the conjugate posterior:

\[
  \boxed{\beta \mid y \;\sim\; \mathrm{Gamma}\!\left(\alpha_0 + n k,\;\; b_0 + \textstyle\sum_i y_i\right).}
\]

**Update rule:**

- Add \(n \times k\) to the shape (\(n\) itself when \(k = 1\)).
- Add the sum of observations \(\sum_i y_i\) to the rate.

The posterior mean **rate** is \((\alpha_0 + n k)/(b_0 + \sum y_i)\), and the posterior mean **response** is:

\[
  \mathbb{E}[\mu \mid y]
  \;\approx\; \frac{k}{\mathbb{E}[\beta \mid y]}
  \;=\; \frac{k(b_0 + \sum_i y_i)}{\alpha_0 + n k}.
\]

---

## 2. Illustration with base-R plots

The **`bayesrules`** package does not currently include a `plot_gamma_gamma()` helper, so we plot the prior, the (normalized) likelihood, and the posterior directly with `curve()`.

**Scenario.**  You record the waiting time between calls to a customer service hotline.  Exponential waiting times (\(k = 1\)) are a standard model.  You place a **\(\Gamma(2, 4)\)** prior on the call rate \(\beta\) (prior mean = 0.5 calls/minute).  You then observe \(n = 20\) inter-arrival times with sum \(\sum y = 35\) minutes.

```{r ggamma-illustration-setup}
alpha0 <- 2;  b0 <- 4         ## Gamma(2, 4) prior: mean = 0.5
k      <- 1                   ## known shape (exponential)
n_obs  <- 20L;  sum_y <- 35   ## data: 20 observations, sum = 35

post_shape <- alpha0 + n_obs * k
post_rate  <- b0    + sum_y

ggamma_analytic <- data.frame(
  Example = "Call inter-arrivals (illustration)",
  n = n_obs,
  `sum(y)` = sum_y,
  Posterior = sprintf("Gamma(%d, %d)", post_shape, post_rate),
  `Mean (rate)` = post_shape / post_rate,
  `SD (rate)`   = sqrt(post_shape) / post_rate,
  check.names = FALSE
)
knitr::kable(ggamma_analytic, digits = 4,
  caption = "Conjugate Gamma--Gamma posterior for rate beta")
```

```{r ggamma-illustration-plot, fig.width = 6, fig.height = 4}
beta_grid <- seq(0.01, 1.5, length.out = 500)

## Normalized likelihood: Gamma(n*k + 1, sum_y) shape, treated as density
lik_unnorm <- dgamma(beta_grid, shape = n_obs * k + 1, rate = sum_y)

## Scale likelihood to have same peak height as the prior (visual only)
prior_vals   <- dgamma(beta_grid, alpha0, b0)
post_vals    <- dgamma(beta_grid, post_shape, post_rate)
lik_scaled   <- lik_unnorm * (max(prior_vals) / max(lik_unnorm))

plot(beta_grid, prior_vals, type = "l", lwd = 2, col = "steelblue",
     xlab = expression(beta), ylab = "Density",
     main = "Gamma–Gamma update: prior, likelihood, posterior",
     ylim = c(0, max(post_vals) * 1.1))
lines(beta_grid, lik_scaled, lwd = 2, col = "darkgreen", lty = 2)
lines(beta_grid, post_vals,  lwd = 2, col = "tomato")
legend("topright",
       legend = c("Prior  Gamma(2, 4)",
                  "Likelihood (scaled)",
                  sprintf("Posterior  Gamma(%d, %d)", post_shape, post_rate)),
       col  = c("steelblue", "darkgreen", "tomato"),
       lty  = c(1, 2, 1), lwd = 2, bty = "n")
abline(v = alpha0 / b0,          lty = 3, col = "steelblue")
abline(v = post_shape / post_rate, lty = 3, col = "tomato")
```

The posterior is pulled from the prior mean (0.5) toward the data-based MLE (\(n/\sum y = 20/35 \approx 0.57\)) and is considerably tighter than either.

---

## 3. Real data: Boston Marathon finishing times

The `bayesrules::cherry_blossom_sample` dataset records net finishing times (minutes) for runners in a 10-mile cherry blossom race [@JohnsonLauEtAl2022].  Finishing times are continuous and positive; we model them as Gamma-distributed with a method-of-moments estimate of the shape \(k\).

```{r cb-data, eval = requireNamespace("bayesrules", quietly = TRUE)}
library(bayesrules)

## Use net finishing times (minutes); drop missing values
y_cb <- cherry_blossom_sample$net
y_cb <- y_cb[is.finite(y_cb)]
n_cb <- length(y_cb)

ybar_cb <- mean(y_cb)
s2_cb   <- var(y_cb)

## Method-of-moments estimate of shape k: k_hat = ybar^2 / s^2
k_hat_cb <- ybar_cb^2 / s2_cb

cat(sprintf("n = %d,  mean = %.2f min,  var = %.2f,  k_hat = %.3f\n",
            n_cb, ybar_cb, s2_cb, k_hat_cb))
```

The estimated shape \(\hat k \approx\) `r if(requireNamespace("bayesrules",quietly=TRUE)) round(mean(bayesrules::cherry_blossom_sample$net[is.finite(bayesrules::cherry_blossom_sample$net)])^2 / var(bayesrules::cherry_blossom_sample$net[is.finite(bayesrules::cherry_blossom_sample$net)]),2) else "4"` is treated as fixed for the conjugate analysis.

**Prior.**  Based on general knowledge that competitive 10-mile race times cluster around 85–100 minutes, we set a **\(\Gamma(5, 0.06)\)** prior on the rate \(\beta\) (prior mean rate \(\approx 0.083\), corresponding to a prior mean time \(k/\beta \approx 85\) minutes for the estimated \(k\)).

```{r cb-prior, eval = requireNamespace("bayesrules", quietly = TRUE)}
## Set prior centered at rate corresponding to ~85 min mean finish time
## E[mu] = k / beta_prior  =>  beta_prior = k_hat / 85
alpha0_cb <- 5
b0_cb     <- alpha0_cb / (k_hat_cb / ybar_cb)  ## prior mean rate = k_hat / ybar

cat(sprintf("Prior: Gamma(%.2f, %.4f),  prior mean rate = %.4f,  implied mean time = %.1f min\n",
            alpha0_cb, b0_cb,
            alpha0_cb / b0_cb,
            k_hat_cb / (alpha0_cb / b0_cb)))
```

```{r cb-posterior, eval = requireNamespace("bayesrules", quietly = TRUE)}
post_shape_cb <- alpha0_cb + n_cb * k_hat_cb
post_rate_cb  <- b0_cb     + sum(y_cb)

post_mean_rate_cb <- post_shape_cb / post_rate_cb
post_sd_rate_cb   <- sqrt(post_shape_cb) / post_rate_cb
post_mean_time_cb <- k_hat_cb / post_mean_rate_cb

cb_analytic <- data.frame(
  Dataset = "Cherry blossom finish times",
  n = n_cb,
  `k (fixed)` = k_hat_cb,
  Posterior = sprintf("Gamma(%.2f, %.4f)", post_shape_cb, post_rate_cb),
  `Mean (rate beta)` = post_mean_rate_cb,
  `SD (rate beta)`   = post_sd_rate_cb,
  `Mean time (min)`  = post_mean_time_cb,
  check.names = FALSE
)
knitr::kable(cb_analytic, digits = 4,
  caption = "Conjugate Gamma--Gamma posterior for rate beta (shape--rate)")
```

### 3.1 Fitting with **`glmb()`**

```{r cb-glmb, eval = requireNamespace("bayesrules", quietly = TRUE)}
df_cb <- data.frame(y = y_cb)

cb_beta <- matrix(alpha0_cb / b0_cb, nrow = 1, ncol = 1,
                  dimnames = list(NULL, "(Intercept)"))

cb_pf <- dGamma(
  shape          = alpha0_cb,
  rate           = b0_cb,
  beta           = cb_beta,
  Inv_Dispersion = FALSE,
  lik_shape      = k_hat_cb
)

set.seed(2026)
fit_cb <- glmb(
  n       = 20000,
  y ~ 1,
  data    = df_cb,
  family  = Gamma(link = "identity"),
  pfamily = cb_pf
)
print(fit_cb)
```

```{r cb-compare, eval = requireNamespace("bayesrules", quietly = TRUE)}
cb_compare <- data.frame(
  Dataset = "Cherry blossom finish times",
  Posterior = cb_analytic$Posterior,
  `Analytic Mean (rate)` = cb_analytic$`Mean (rate beta)`,
  `Analytic SD (rate)`   = cb_analytic$`SD (rate beta)`,
  `glmb Post.Mean` = fit_cb$coef.means["(Intercept)"],
  `glmb Post.Sd`   = sd(fit_cb$coefficients[, "(Intercept)", drop = TRUE]),
  check.names = FALSE
)
knitr::kable(cb_compare, digits = 4,
  caption = "Analytic vs. glmb() posterior mean and SD for rate beta")
```

The **`glmb Post.Mean`** and **`glmb Post.Sd`** columns refer to the **rate** \(\beta\).  Mean finish time is \(\hat k\) divided by posterior mean rate (see **Mean time** in the analytic table).

---

## 4. Connection to glmbayes

The scalar Gamma–Gamma model is the conjugate prototype for **Gamma `glmb()` with `family = Gamma(link = "identity")`**.  In that setting the single coefficient is the rate \(\beta\), the prior is `dGamma(shape, rate, beta, Inv_Dispersion = FALSE, lik_shape = k)`, and the posterior update is the closed form above.

When the shape \(k\) is **unknown**, the Gamma prior on \(k\) is no longer conjugate.  That case is handled by **`Gamma(link = "log")`** with `dNormal()` priors on the regression coefficients plus **`glmbdisp()`** for the dispersion \(\phi = 1/k\) (Chapter 10).

---

## See also

- **Chapter-02-S01** — introduction to Bayesian inference and conjugacy.
- **Chapter-02-S02** — Normal–Normal conjugacy for one mean.
- **Chapter-02-S03** — Beta–Binomial conjugacy for one proportion.
- **Chapter-02-S04** — Gamma–Poisson conjugacy for one count rate.
- **Chapter 10** — non-conjugate Gamma regression with `Gamma(log)`.