---
title: "Reproducing the paper examples"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Reproducing the paper examples}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
editor_options:
  markdown:
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

This vignette reproduces the key examples from:

> Arthur, G. "A Hierarchical Vector-Based Framework for Multi-Scale
> Exploded-View Cartography."

Each section corresponds to a result in the paper and uses the package
API directly, including `explode_state()`, `explode_sf()`,
`explode_grouped()`, `layout_regions()`, and `calibration_row()`.

This vignette is designed for reproduction rather than routine package
checks. Several sections download external boundary files and may take
substantial time and disk space. Heavy chunks are therefore marked
`eval = FALSE`.

Reported values should match the paper within rounding tolerance.
Externally downloaded datasets may introduce small differences if source
files change over time.

```{r setup}
library(explodemap)
library(sf)
library(dplyr)
```

------------------------------------------------------------------------

## 1. New Jersey — Ground-truth calibration (Section 5)

New Jersey is the calibration dataset. The known-good parameters
($\alpha_r = 6{,}000$ m, $\alpha_l = 10{,}000$ m) were established by
visual validation and are used to derive the legibility coefficients
$\gamma_r$ and $\gamma_l$.

```{r nj-calibration, eval = FALSE}
nj <- explode_state(
  state_fips = "34", crs = 32118,
  region_map = list(
    North   = c("Bergen", "Essex", "Hudson", "Morris",
                "Passaic", "Sussex", "Union", "Warren"),
    Central = c("Hunterdon", "Mercer", "Middlesex",
                "Monmouth", "Somerset"),
    South   = c("Atlantic", "Burlington", "Camden", "Cape May",
                "Cumberland", "Gloucester", "Ocean", "Salem")
  ),
  label = "New Jersey"
)

summary(nj)
```

**Expected output (key values):**

| Quantity                                      | Value    |
|-----------------------------------------------|----------|
| n units                                       | 564      |
| n regions                                     | 3        |
| w_bar                                         | 3.94 km  |
| R_local                                       | 62.4 km  |
| n_bar                                         | 177      |
| R_local/w_bar                                 | 15.83    |
| alpha_r (derived)                             | 6,844 m  |
| alpha_l (derived)                             | 10,641 m |
| gamma_r implied (from known alpha_r = 6,000)  | 2.64     |
| gamma_l implied (from known alpha_l = 10,000) | 1.136    |

The implied $\gamma_l = 1.136$ from the New Jersey ground truth becomes
the recommended default for transfer to other datasets.

------------------------------------------------------------------------

## 2. Pennsylvania — Transfer test (Section 6)

Pennsylvania tests whether New Jersey-calibrated coefficients transfer
to a larger, denser dataset without retuning. The paper reports
formula-derived parameters $\alpha_r = 20{,}174$ m and
$\alpha_l = 12{,}447$ m.

The region map is defined as a reusable object so that both the transfer
run and the sensitivity analysis reference the same grouping:

```{r pa-region-map}
pa_region_map <- list(
  Southeast    = c("Philadelphia", "Delaware", "Chester",
                   "Montgomery", "Bucks"),
  Northeast    = c("Pike", "Monroe", "Carbon", "Northampton", "Lehigh",
                   "Luzerne", "Lackawanna", "Wayne", "Susquehanna",
                   "Wyoming", "Sullivan", "Columbia", "Montour",
                   "Schuylkill", "Berks", "Bradford"),
  Central      = c("Centre", "Clinton", "Lycoming", "Tioga", "Potter",
                   "Cameron", "Elk", "Clearfield", "Jefferson", "Indiana",
                   "Blair", "Huntingdon", "Mifflin", "Snyder", "Union",
                   "Northumberland", "Juniata", "Perry", "Dauphin",
                   "Lebanon"),
  SouthCentral = c("York", "Adams", "Lancaster", "Cumberland", "Franklin",
                   "Fulton", "Bedford", "Somerset", "Cambria"),
  Southwest    = c("Allegheny", "Westmoreland", "Fayette", "Greene",
                   "Washington", "Beaver", "Butler", "Armstrong",
                   "Lawrence"),
  Northwest    = c("Erie", "Crawford", "Mercer", "Venango", "Clarion",
                   "Forest", "Warren", "McKean")
)
```

```{r pa-transfer, eval = FALSE}
pa <- explode_state(
  state_fips = "42", crs = 26918,
  region_map = pa_region_map,
  label = "Pennsylvania"
)

summary(pa)
```

**Expected output (key values):**

| Quantity          | Value     |
|-------------------|-----------|
| n units           | 2,572     |
| n regions         | 6         |
| w_bar             | 6,725 m   |
| R_local           | 116,086 m |
| n_bar             | 449       |
| R_local/w_bar     | 17.26     |
| alpha_r (derived) | 20,174 m  |
| alpha_l (derived) | 12,447 m  |

### Sensitivity analysis

The paper reports that $\alpha_l$ is stable under $\pm 15\%$
perturbation:

```{r pa-sensitivity, eval = FALSE}
alpha_l_canonical <- 12447
alpha_r_canonical <- 20174

factors <- c(0.85, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15)
labels  <- c("-15%", "-10%", "-5%", "canonical", "+5%", "+10%", "+15%")

rows <- list()
for (i in seq_along(factors)) {
  run <- explode_state(
    state_fips = "42", crs = 26918,
    region_map = pa_region_map,
    alpha_r = alpha_r_canonical,
    alpha_l = round(alpha_l_canonical * factors[i]),
    plot = FALSE, export = FALSE,
    label = paste0("PA ", labels[i])
  )
  rows[[i]] <- calibration_row(run)
  rows[[i]]$label <- labels[i]
  rows[[i]]$factor <- factors[i]
}

sensitivity_df <- bind_rows(rows)
print(sensitivity_df)
```

**Expected output:**

| Label     | alpha_l | Mean displacement |
|-----------|---------|-------------------|
| -15%      | 10,580  | \~20,476 m        |
| -10%      | 11,202  | \~20,524 m        |
| -5%       | 11,825  | \~20,575 m        |
| canonical | 12,447  | \~20,630 m        |
| +5%       | 13,069  | \~20,688 m        |
| +10%      | 13,692  | \~20,749 m        |
| +15%      | 14,314  | \~20,813 m        |

Mean displacement CV across the $\pm 15\%$ range is $< 0.02$, confirming
stability.

------------------------------------------------------------------------

## 3. Cross-state calibration (Section 7)

The state registry in `inst/registries/state_registry.R` contains New
Jersey, Pennsylvania, Ohio, and New York. The calibration runner
processes all registered states and reports gamma stability.

```{r calibration, eval = FALSE}
source(system.file("registries/state_registry.R", package = "explodemap"))

calib_rows <- list()
for (key in names(state_registry)) {
  reg <- state_registry[[key]]

  result <- tryCatch(
    explode_state(
      state_fips = reg$fips, crs = reg$crs,
      region_map = reg$region_map,
      allow_other = TRUE, plot = FALSE,
      label = reg$name
    ),
    error = function(e) {
      message("ERROR: ", e$message)
      NULL
    }
  )

  if (is.null(result)) next
  calib_rows[[key]] <- calibration_row(result)
}

calib_df <- bind_rows(calib_rows)
print(calib_df)
```

**Expected output (approximate):**

| State        | n     | Regions | w_bar (km) | R_local (km) | Ratio | gamma_r | gamma_l |
|--------------|-------|---------|------------|--------------|-------|---------|---------|
| New Jersey   | 564   | 3       | 3.94       | 62.4         | 15.83 | 2.64\*  | 1.136\* |
| Pennsylvania | 2,572 | 6       | 6.73       | 116.1        | 17.26 | 3.72\*  | 1.136   |
| Ohio         | 1,602 | 5       | 7.31       | 93.1         | 12.75 | 3.00    | 1.136   |
| New York     | 1,794 | 5       | 10.04      | 97.2         | 9.68  | 3.00    | 1.136   |

\* Implied from known ground-truth parameters.

The key finding is that $\gamma_l = 1.136$ is stable across states,
while $\gamma_r$ varies more, indicating that regional clearance still
benefits from dataset-specific visual validation.

------------------------------------------------------------------------

## 4. Ohio — Extended validation (Section 7)

Ohio provides a five-region test with three competing urban cores
(Cleveland, Columbus, Cincinnati):

```{r ohio, eval = FALSE}
oh <- explode_state(
  state_fips = "39", crs = 32617,
  region_map = list(
    Northeast = c("Cuyahoga", "Summit", "Lorain", "Lake", "Medina",
                  "Portage", "Geauga", "Ashtabula", "Trumbull", "Mahoning",
                  "Columbiana", "Carroll", "Stark", "Wayne", "Holmes",
                  "Harrison", "Jefferson"),
    Northwest = c("Lucas", "Wood", "Fulton", "Williams", "Defiance",
                  "Paulding", "Henry", "Putnam", "Hancock", "Sandusky",
                  "Erie", "Ottawa", "Seneca", "Wyandot", "Crawford",
                  "Huron", "Ashland", "Richland", "Morrow", "Knox",
                  "Marion", "Hardin", "Logan", "Union", "Delaware",
                  "Allen", "Van Wert", "Auglaize", "Shelby", "Mercer"),
    Central   = c("Franklin", "Licking", "Fairfield", "Pickaway",
                  "Madison", "Fayette", "Ross", "Clark", "Greene",
                  "Montgomery", "Preble", "Darke", "Miami", "Champaign"),
    Southwest = c("Hamilton", "Butler", "Warren", "Clermont", "Clinton",
                  "Highland", "Brown", "Adams", "Scioto", "Lawrence",
                  "Gallia", "Jackson", "Pike"),
    Southeast = c("Belmont", "Monroe", "Washington", "Meigs", "Morgan",
                  "Noble", "Guernsey", "Muskingum", "Perry", "Hocking",
                  "Athens", "Tuscarawas", "Coshocton", "Vinton")
  ),
  label = "Ohio"
)

summary(oh)
plot(oh, "both")
```

**Expected output:** $R_{\text{local}}/\bar{w} = 12.75$, placing Ohio in
the dense-municipal cluster. All three urban cores are correctly
suppressed by the $s_i$ term.

------------------------------------------------------------------------

## 5. Canada — Non-US validation (Section 7)

The Canada validation tests whether the framework transfers outside the
US administrative system entirely. Data comes from Statistics Canada
2021 Census Subdivisions.

```{r canada-regions}
province_regions <- data.frame(
  PRUID  = c("10", "11", "12", "13", "24", "35",
             "46", "47", "48", "59", "60", "61", "62"),
  region = c(rep("Atlantic", 4), "Quebec", "Ontario",
             rep("Prairies", 3), "Pacific",
             rep("Territories", 3)),
  stringsAsFactors = FALSE
)
```

```{r canada-download, eval = FALSE}
cache_file <- file.path(path.expand("~"), "explode_map_cache",
                        "canada_csds_2021.rds")

if (file.exists(cache_file)) {
  sf_raw <- readRDS(cache_file)
} else {
  url <- paste0(
    "https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/",
    "boundary-limites/files-fichiers/lcsd000b21a_e.zip"
  )
  tmp <- tempfile(fileext = ".zip")
  download.file(url, tmp, mode = "wb")
  dir <- file.path(tempdir(), "canada_csds")
  dir.create(dir, showWarnings = FALSE)
  unzip(tmp, exdir = dir)
  shp <- list.files(dir, "\\.shp$", recursive = TRUE, full.names = TRUE)
  sf_raw <- st_read(shp[1], quiet = TRUE)
  dir.create(dirname(cache_file), showWarnings = FALSE, recursive = TRUE)
  saveRDS(sf_raw, cache_file)
}
```

```{r canada-explode, eval = FALSE}
sf_proj <- sf_raw |>
  st_transform(3347) |>
  left_join(province_regions, by = "PRUID")

sf_proj$region[is.na(sf_proj$region)] <- "Other"

sf_prov <- sf_proj |>
  filter(region != "Territories")

canada <- explode_sf(
  sf_prov,
  region_col = "region",
  allow_other = TRUE,
  label = "Canada (provinces)"
)

summary(canada)
plot(canada, "both")
```

**Expected output:**

| Quantity      | Value                           |
|---------------|---------------------------------|
| n units       | \~4,800 (excluding territories) |
| n regions     | 5                               |
| R_local/w_bar | \~113                           |

The tightness ratio is an order of magnitude larger than in the US state
examples because Canadian CSDs include vast northern municipalities. The
formula-derived parameters still produce a coherent layout, illustrating
that the method can remain usable even in extreme tightness-ratio
regimes.

------------------------------------------------------------------------

## 6. HHS national grouped layout (Section 12)

The three-level extension places US states into 10 HHS region blocks
using anchor-based placement with collision refinement.

```{r hhs-lookup}
hhs_lookup <- data.frame(
  STUSPS = c(
    "CT", "ME", "MA", "NH", "RI", "VT",
    "NJ", "NY", "PR", "VI",
    "DE", "DC", "MD", "PA", "VA", "WV",
    "AL", "FL", "GA", "KY", "MS", "NC", "SC", "TN",
    "IL", "IN", "MI", "MN", "OH", "WI",
    "AR", "LA", "NM", "OK", "TX",
    "IA", "KS", "MO", "NE",
    "CO", "MT", "ND", "SD", "UT", "WY",
    "AZ", "CA", "HI", "NV", "GU", "AS", "MP",
    "AK", "ID", "OR", "WA"
  ),
  hhs_region = c(
    rep("1", 6), rep("2", 4), rep("3", 6), rep("4", 8),
    rep("5", 6), rep("6", 5), rep("7", 4), rep("8", 6),
    rep("9", 7), rep("10", 4)
  ),
  stringsAsFactors = FALSE
)
```

```{r hhs-download, eval = FALSE}
cache_file <- file.path(path.expand("~"), "explode_map_cache",
                        "us_states.rds")

if (file.exists(cache_file)) {
  states_sf <- readRDS(cache_file)
} else {
  url <- "https://www2.census.gov/geo/tiger/TIGER2024/STATE/tl_2024_us_state.zip"
  tmp <- tempfile(fileext = ".zip")
  download.file(url, tmp, mode = "wb", quiet = TRUE)
  dir <- file.path(tempdir(), "us_states")
  dir.create(dir, showWarnings = FALSE)
  unzip(tmp, exdir = dir)
  shp <- list.files(dir, "\\.shp$", recursive = TRUE, full.names = TRUE)
  states_sf <- st_read(shp[1], quiet = TRUE)
  dir.create(dirname(cache_file), showWarnings = FALSE, recursive = TRUE)
  saveRDS(states_sf, cache_file)
}
```

```{r hhs-explode, eval = FALSE}
states_proj <- states_sf |>
  st_transform(5070) |>
  left_join(hhs_lookup, by = "STUSPS")

states_proj$hhs_region[is.na(states_proj$hhs_region)] <- "Other"

hhs <- explode_grouped(
  states_proj,
  region_col   = "hhs_region",
  mode         = "auto_collision",
  alpha_l      = 120000,
  p            = 1.25,
  kappa        = 1.8,
  padding      = 80000,
  delta        = 20000,
  lambda       = 0.18,
  eta          = 0.18,
  padding_sep  = 30000,
  max_iter     = 60,
  label        = "US by HHS Region"
)

print(hhs)
plot(hhs, "all")
```

**Expected output:** The anchor solver converges within 60 iterations.
All 10 HHS regions are separated with a recognisable continental
arrangement. The `auto_collision` mode produces substantially more
legible output than `auto` alone because the spring-repulsion solver
reduces block overlaps in the densely packed Northeast corridor.

------------------------------------------------------------------------

## Replication checklist

After running all sections above, verify:

-   [ ] New Jersey implied $\gamma_l \approx 1.136$ from known
    $\alpha_l = 10{,}000$ m
-   [ ] Pennsylvania formula-derived $\alpha_l = 12{,}447$ m from
    $\gamma_l = 1.136$
-   [ ] Pennsylvania sensitivity CV $< 0.02$ for mean displacement
    across $\pm 15\%$
-   [ ] Cross-state $R_{\text{local}}/\bar{w}$ clusters around
    dense-unit and large-unit regimes
-   [ ] Canada layout remains coherent at
    $R_{\text{local}}/\bar{w} \approx 113$
-   [ ] HHS anchor solver converges with all 10 regions separated

All values should match those reported in the paper within rounding
tolerance.

Together, these examples cover the paper's calibration, transfer,
cross-state, international, and grouped-layout results using the package
API.

------------------------------------------------------------------------

## Session info

```{r session-info}
sessionInfo()
```