---
title: "The hydrocan adapter system"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The hydrocan adapter system}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

## Overview

hydrocan normalises data from multiple Canadian hydrometric networks into one
consistent output schema. The mechanism that makes this possible is the
*adapter*: a small object that binds a data source name to a description and a
set of fetch functions.

This vignette explains:

1. What an adapter is and what it must provide.
2. How the router uses adapters to dispatch calls.
3. How the built-in adapters are implemented.
4. How to write and register your own adapter.

## The adapter contract

An adapter is created with `new_hydrocan_adapter()`:

```{r}
#| eval: false
new_hydrocan_adapter(
  name,
  description,
  list_stations_fn,
  fetch_flows_fn = NULL,
  fetch_daily_flows_fn = NULL,
  fetch_levels_fn = NULL,
  fetch_daily_levels_fn = NULL,
  list_stations_meta_fn = NULL,
  license = NULL,
  license_url = NULL,
  terms_url = NULL
)
```

| Argument | Type | Contract |
|---|---|---|
| `name` | single character | Unique identifier; used as the registry key and as the value of the `provider_name` column in all output |
| `description` | single character | Human-readable description of the source and its limitations; shown by `hc_list_sources()` |
| `list_stations_fn` | `function()` | No arguments; returns a character vector of station IDs this adapter can serve |
| `fetch_flows_fn` | `function(station_id, start_date, end_date)` or `NULL` | Returns a tibble matching the realtime schema; pass `NULL` if sub-daily flow data is not available |
| `fetch_daily_flows_fn` | `function(station_id, start_date, end_date)` or `NULL` | Returns a tibble matching the daily schema; pass `NULL` if daily flow data is not available |
| `fetch_levels_fn` | `function(station_id, start_date, end_date)` or `NULL` | Returns a tibble matching the realtime schema with `parameter = "water_level"`; pass `NULL` if sub-daily level data is not available |
| `fetch_daily_levels_fn` | `function(station_id, start_date, end_date)` or `NULL` | Returns a tibble matching the daily schema with `parameter = "water_level"`; pass `NULL` if daily level data is not available |
| `list_stations_meta_fn` | `function()` or `NULL` | No arguments; returns a tibble matching the stations schema; pass `NULL` if station metadata is not available |
| `license` | single character or `NULL` | Optional license name (e.g. `"CC-BY 4.0"`); exposed by `hc_list_sources()` |
| `license_url` | single character or `NULL` | Optional URL to the license text |
| `terms_url` | single character or `NULL` | Optional URL to the data provider's terms of use |

At least one fetch function must be non-`NULL`.
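
The contract in the table can be made concrete with a small validation sketch. This is a base-R illustration only: `sketch_adapter()` is a hypothetical stand-in, not the code behind `new_hydrocan_adapter()`, and it assumes that "fetch function" means one of the four `fetch_*` arguments.

```r
# Hypothetical stand-in for the checks a constructor such as
# new_hydrocan_adapter() might perform; not the package's actual code.
sketch_adapter <- function(name, description, list_stations_fn,
                           fetch_flows_fn = NULL,
                           fetch_daily_flows_fn = NULL,
                           fetch_levels_fn = NULL,
                           fetch_daily_levels_fn = NULL) {
  is_string <- function(x) is.character(x) && length(x) == 1L && !is.na(x)
  if (!is_string(name)) stop("`name` must be a single string")
  if (!is_string(description)) stop("`description` must be a single string")
  if (!is.function(list_stations_fn)) stop("`list_stations_fn` must be a function")

  fetchers <- list(
    flows        = fetch_flows_fn,
    daily_flows  = fetch_daily_flows_fn,
    levels       = fetch_levels_fn,
    daily_levels = fetch_daily_levels_fn
  )
  supplied <- !vapply(fetchers, is.null, logical(1))
  # Assumption: only the four fetch_* arguments count towards this rule.
  if (!any(supplied)) stop("At least one fetch function must be non-NULL")
  fetchers <- fetchers[supplied]
  if (!all(vapply(fetchers, is.function, logical(1)))) {
    stop("Every supplied fetch argument must be a function")
  }

  list(
    name             = name,
    description      = description,
    list_stations_fn = list_stations_fn,
    fetchers         = fetchers
  )
}

# An adapter that only serves sub-daily flows.
flows_only <- sketch_adapter(
  "demo", "Demo source.", function() "S1",
  fetch_flows_fn = function(station_id, start_date, end_date) NULL
)
names(flows_only$fetchers)
```

Keeping only the supplied functions makes it straightforward to derive a "supports this data type" flag per adapter.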

### Output schemas

#### Realtime (sub-daily) - `fetch_flows_fn` / `fetch_levels_fn`

| Column | Type | Notes |
|---|---|---|
| `station_id` | chr | As provided by the caller |
| `timestamp` | POSIXct UTC | Sub-daily observations |
| `value` | dbl | |
| `parameter` | chr | `"water_discharge"` or `"water_level"` |
| `unit` | chr | Canonical form after normalisation (e.g. `"m3/s"`, `"m"`) |
| `provider_name` | chr | Must equal the adapter name |
| `quality_code` | chr | Raw provider quality code; `NA` if unavailable |
| `qf_desc` | chr | Provider description of the quality code; `NA` if unavailable |

#### Daily - `fetch_daily_flows_fn` / `fetch_daily_levels_fn`

Same as the realtime schema above, but with `date` (Date) in place of
`timestamp` (POSIXct).

#### Stations - `list_stations_meta_fn`

| Column | Type | Notes |
|---|---|---|
| `station_id` | chr | |
| `station_name` | chr | |
| `provider_name` | chr | Must equal the adapter name |
| `longitude` | dbl | |
| `latitude` | dbl | |
| `elevation_m` | dbl | `NA` if unavailable |
| `period_start` | Date | `NA` if unavailable |
| `period_end` | Date | `NA` if station is still active |
| `notes` | list | Adapter-specific metadata; `NULL` per row if unused |
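
As an illustration of the stations schema, the sketch below builds a two-row table with a `notes` list column. All values, including the station IDs and the `regulated` field, are made up. A base-R data frame is used so the example runs without extra packages; `tibble::tibble()` accepts the same list directly as a column.

```r
# Illustrative values only; the stations and note fields are hypothetical.
stations <- data.frame(
  station_id    = c("MP001", "MP002"),
  station_name  = c("Example River at Bridge", "Example Lake Outlet"),
  provider_name = "myprov",
  longitude     = c(-71.25, -72.10),
  latitude      = c(46.80, 47.35),
  elevation_m   = c(112.4, NA_real_),            # NA: elevation unavailable
  period_start  = as.Date(c("1998-05-01", NA)),  # NA: start unknown
  period_end    = as.Date(c(NA, "2015-10-31")),  # NA: station still active
  stringsAsFactors = FALSE
)

# A list column holds arbitrary per-station metadata; NULL marks "unused".
stations$notes <- list(list(regulated = TRUE), NULL)

str(stations$notes)
```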

## How the router works

When you call `hc_read_flows()`, the router:

1. Calls `list_stations_fn()` on every registered adapter.
2. Finds which adapter(s) claim the requested station.
3. If more than one adapter matches, stops with an error asking you to supply
   `source =` explicitly. Without `source`, a station ID must resolve to
   exactly one adapter.
4. Calls the appropriate fetch function on the matched adapter, wrapped in
   `tryCatch` so a failure for one station does not abort the whole request.
5. Binds all results with `dplyr::bind_rows()`.

Passing `source = "adaptername"` restricts the router to that adapter, but it
still calls `list_stations_fn()` for that adapter and checks that the
requested station is present before fetching data.
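
The dispatch steps above can be sketched in base R with a toy registry. Everything here is hypothetical: the `registry` list, `sketch_route()` and `sketch_read_flows()` are illustrations, not hydrocan internals. The sketch also assumes that an unknown station is an error, which the steps above do not specify.

```r
# Toy registry: a named list of adapters, each a plain list of functions.
registry <- list(
  alpha = list(
    list_stations_fn = function() c("A1", "A2"),
    fetch_flows_fn   = function(station_id, start_date, end_date) {
      data.frame(station_id = station_id, provider_name = "alpha")
    }
  ),
  beta = list(
    list_stations_fn = function() c("B1", "A2"),  # "A2" is also claimed by alpha
    fetch_flows_fn   = function(station_id, start_date, end_date) {
      data.frame(station_id = station_id, provider_name = "beta")
    }
  )
)

sketch_route <- function(station_id, registry, source = NULL) {
  if (!is.null(source) && !source %in% names(registry)) {
    stop("Unknown source: ", source)
  }
  candidates <- if (is.null(source)) registry else registry[source]
  # Steps 1-2: ask each candidate adapter whether it claims the station.
  claims <- vapply(
    candidates,
    function(a) station_id %in% a$list_stations_fn(),
    logical(1)
  )
  matched <- names(candidates)[claims]
  if (length(matched) == 0L) stop("No adapter serves station ", station_id)
  # Step 3: more than one match is an error unless `source` narrowed the search.
  if (length(matched) > 1L) {
    stop("Station ", station_id, " is claimed by ",
         paste(matched, collapse = ", "), "; supply `source =`")
  }
  matched
}

sketch_read_flows <- function(station_ids, start_date, end_date,
                              registry, source = NULL) {
  # Routing problems stop the whole call.
  owners <- vapply(station_ids, sketch_route, character(1),
                   registry = registry, source = source)
  results <- lapply(seq_along(station_ids), function(i) {
    # Step 4: a fetch failure for one station becomes a warning and NULL.
    tryCatch(
      registry[[owners[[i]]]]$fetch_flows_fn(station_ids[[i]], start_date, end_date),
      error = function(e) {
        warning(conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  # Step 5: base-R counterpart of dplyr::bind_rows(); NULL entries are dropped.
  do.call(rbind, results)
}

sketch_read_flows(c("A1", "B1"), "2024-01-01", "2024-01-02", registry)
```

Calling `sketch_route("A2", registry)` stops because both toy adapters claim `"A2"`, whereas `sketch_route("A2", registry, source = "beta")` resolves to `"beta"`.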

`hc_list_sources()` returns a tibble of all registered adapters with their
descriptions and a logical column per data type indicating what each adapter
supports. `hc_read_stations()` queries all adapters for station metadata,
skipping those that do not implement `list_stations_meta_fn`.

## Built-in adapters

### Hydro-Quebec (`hydroquebec`)

The `hydroquebec` adapter wraps the
[Hydro-Quebec open data portal](https://donnees.hydroquebec.com/explore/dataset/donnees-hydrometriques/),
which provides flow measurements at Hydro-Quebec reservoir facilities via an
Opendatasoft REST API. No authentication is required.

**Key characteristics:**

- Station IDs use Hydro-Quebec's internal format, e.g. `"3-230"`.
- The dataset covers a rolling window of approximately 10 days; historical data
  is not available.
- Only flow data is available (`parameter = "water_discharge"`); no water level.
- The `approval` column is `NA` for all records (the source does not publish
  approval status); `quality_flag` carries the source's point type field.

**Data access:**

```{r hq-flows, eval = FALSE}
library(hydrocan)

# Sub-daily (hourly) flows
flows <- hc_read_flows(
  station_id = "3-230",
  start_date = Sys.Date() - 5,
  end_date = Sys.Date(),
  source = "hydroquebec"
)

# Source-native daily flows
daily <- hc_read_daily_flows(
  station_id = "3-230",
  start_date = Sys.Date() - 5,
  end_date = Sys.Date(),
  source = "hydroquebec"
)
```

The adapter pages through the API (100 records per request) and filters the
returned records to the requested date range in R, because the API stores
`split_date` as a text field rather than a datetime field.
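
The page-then-filter approach can be sketched offline as follows. `fetch_page()` is a hypothetical stand-in for one API request and serves slices of an in-memory table. The record layout and the ISO date format of `split_date` are assumptions made for the illustration, not a description of the real payload.

```r
# Fake "remote" table so the sketch runs without network access.
fake_records <- data.frame(
  split_date = format(as.Date("2024-01-01") + 0:249),  # dates stored as text
  value      = seq_len(250),
  stringsAsFactors = FALSE
)

# Hypothetical page fetcher: returns up to `limit` records after `offset`.
fetch_page <- function(offset, limit) {
  idx <- seq_len(nrow(fake_records))
  fake_records[idx > offset & idx <= offset + limit, , drop = FALSE]
}

fetch_all_pages <- function(fetch_page, page_size = 100L) {
  pages  <- list()
  offset <- 0L
  repeat {
    page <- fetch_page(offset, page_size)
    pages[[length(pages) + 1L]] <- page
    # A short (or empty) page means the last record has been reached.
    if (nrow(page) < page_size) break
    offset <- offset + page_size
  }
  do.call(rbind, pages)
}

filter_by_date <- function(records, start_date, end_date) {
  # split_date arrives as text, so parse it before comparing.
  d <- as.Date(records$split_date)
  records[d >= as.Date(start_date) & d <= as.Date(end_date), , drop = FALSE]
}

all_records <- fetch_all_pages(fetch_page)
in_range    <- filter_by_date(all_records, "2024-01-10", "2024-01-12")
in_range$value
```

Filtering on the client costs extra transfer, but it sidesteps any dependence on how the server compares text dates.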

**Source code:** `R/hydroquebec.R`. Registered via:

```r
hydrocan_adapter_hydroquebec <- function() {
  new_hydrocan_adapter(
    "hydroquebec",
    paste(
      "Hydro-Quebec open data (Opendatasoft platform).",
      "Flow data only; no water level.",
      "Rolling window of approximately 10 days - historical data is not available."
    ),
    .hq_list_stations,
    fetch_flows_fn       = .hq_fetch_flows,
    fetch_daily_flows_fn = .hq_fetch_daily_flows,
    list_stations_meta_fn = .hq_list_stations_meta
  )
}
```

### Registration

Adapters are registered at load time in `R/hydrocan-package.R`. Use
`hc_list_sources()` to see all currently registered sources and which data
types each supports.

## Writing a new adapter

Suppose you want to add a hypothetical provincial network called "MyProv" that
exposes a JSON API. The steps are:

### Step 1 - Implement the internal functions

Create `R/myprov.R`:

```r
.MYPROV_URL <- "https://data.myprov.ca/api/hydro"

.myprov_list_stations <- function() {
  resp <- httr2::request(.MYPROV_URL) |>
    httr2::req_url_query(endpoint = "stations", format = "json") |>
    httr2::req_perform() |>
    httr2::resp_body_json(simplifyVector = TRUE)

  resp$station_id  # character vector
}

.myprov_fetch_flows <- function(station_id, start_date, end_date) {
  resp <- httr2::request(.MYPROV_URL) |>
    httr2::req_url_query(
      endpoint = "timeseries",
      station  = station_id,
      from     = format(start_date),
      to       = format(end_date),
      format   = "json"
    ) |>
    httr2::req_perform() |>
    httr2::resp_body_json(simplifyVector = TRUE)

  tibble::tibble(
    station_id    = station_id,
    timestamp     = as.POSIXct(resp$timestamp, tz = "UTC"),
    value         = as.numeric(resp$discharge_cms),
    parameter     = "water_discharge",
    unit          = "m3/s",
    provider_name = "myprov",
    quality_code  = resp$quality_code,
    qf_desc       = NA_character_
  )
}

hydrocan_adapter_myprov <- function() {
  new_hydrocan_adapter(
    "myprov",
    "MyProv provincial hydrometric network. Sub-daily flows only.",
    .myprov_list_stations,
    fetch_flows_fn = .myprov_fetch_flows
  )
}
```

If your source also provides daily data, levels, or station metadata, supply
the corresponding optional function arguments. Only the capabilities you
implement will be advertised by `hc_list_sources()`.

#### Using a stored station list when no endpoint exists

Some sources do not expose a station-listing endpoint. In those cases, bundle a
character vector of known station IDs directly in the package and return it from
`list_stations_fn`:

```r
.MYPROV_STATIONS <- c("MP001", "MP002", "MP003")

.myprov_list_stations <- function() .MYPROV_STATIONS
```

The tradeoff is that the list must be maintained manually as the network
changes. The router only requires that `list_stations_fn()` return a character
vector; how that vector is produced is left entirely to the adapter.

### Step 2 - Register the adapter

Add one line to the `.onLoad` block in `R/hydrocan-package.R`:

```r
.onLoad <- function(libname, pkgname) {
  register_hydrocan_adapter(hydrocan_adapter_hydroquebec())
  register_hydrocan_adapter(hydrocan_adapter_cehq())
  register_hydrocan_adapter(hydrocan_adapter_myprov())   # add this
}
```

### Step 3 - Add tests

Tests for adapters are written against a mock adapter rather than hitting the
live network. This keeps the test suite fast and fully offline. The pattern,
established in `tests/testthat/helper-mocks.R`, is:

1. Write a `list_stations_fn` that returns a hardcoded character vector.
2. Write fetch functions that generate deterministic tibbles from their date
   arguments without making any HTTP requests.
3. Assemble these into an adapter with `new_hydrocan_adapter()`.
4. Register it for the duration of a single test with `local_register_adapter()`,
   which restores the prior registry state on exit.

```r
.myprov_stations <- c("MP001", "MP002")

.myprov_mock_fetch_flows <- function(station_id, start_date, end_date) {
  dates <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  tibble::tibble(
    station_id    = station_id,
    timestamp     = as.POSIXct(dates, tz = "UTC"),
    value         = seq_along(dates) * 1.0,
    parameter     = "water_discharge",
    unit          = "m3/s",
    provider_name = "myprov",
    quality_code  = NA_character_,
    qf_desc       = NA_character_
  )
}

mock_myprov_adapter <- new_hydrocan_adapter(
  "myprov",
  "Mock MyProv adapter for offline testing.",
  function() .myprov_stations,
  fetch_flows_fn = .myprov_mock_fetch_flows
)

test_that("myprov adapter returns correct schema", {
  local_register_adapter(mock_myprov_adapter)
  result <- hc_read_flows(
    station_id = "MP001",
    start_date = "2024-01-01",
    end_date   = "2024-01-03",
    source     = "myprov"
  )
  expect_s3_class(result, "hydrocan_realtime")
  expect_equal(nrow(result), 3L)
})
```

`local_register_adapter()` and `local_clear_registry()` are defined in
`tests/testthat/helper-mocks.R` and are available to all test files automatically.
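
The restore-on-exit behaviour can be illustrated with a small base-R sketch. It uses a hypothetical environment-backed registry and a `with_`-style helper that takes the test body as an argument. The real `local_register_adapter()` has a different shape and may be implemented differently.

```r
# Hypothetical registry stored in an environment.
.sketch_registry <- new.env(parent = emptyenv())

with_sketch_adapter <- function(name, adapter, code) {
  # Snapshot the registry contents before touching them.
  snapshot <- as.list(.sketch_registry, all.names = TRUE)
  on.exit({
    # Restore: empty the registry, then put the snapshot back.
    rm(list = ls(.sketch_registry, all.names = TRUE), envir = .sketch_registry)
    for (nm in names(snapshot)) {
      assign(nm, snapshot[[nm]], envir = .sketch_registry)
    }
  }, add = TRUE)

  assign(name, adapter, envir = .sketch_registry)
  force(code)  # evaluate the body while the adapter is registered
}

# A pre-existing entry that must survive the temporary registration.
assign("existing", list(name = "existing"), envir = .sketch_registry)

during <- with_sketch_adapter("myprov", list(name = "myprov"), {
  sort(ls(.sketch_registry))
})
after <- ls(.sketch_registry)
```

Because the restore step runs from `on.exit()`, the registry is reset even when the body throws an error, which keeps one failing test from leaking state into the next.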

### What the schema validator will catch

`validate_hydrocan_schema()` is called automatically after every data-fetching
API call (`hc_read_flows()`, `hc_read_daily_flows()`, `hc_read_levels()`,
`hc_read_daily_levels()`). It stops with a clear message if any required
column is missing from the returned tibble.

It also normalises the `unit` column: common variants such as `"m³/s"`,
`"cms"`, or `"m^3/s"` are all mapped to the canonical `"m3/s"`. Unrecognised
unit strings pass through unchanged, with a warning that names the raw string
so that it can be added to the mapping table in `R/schema.R`.
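
A minimal sketch of the two behaviours described above, the required-column check and the unit normalisation, is shown below. The mapping vectors and `sketch_validate()` are hypothetical; the actual table lives in `R/schema.R` and may contain different entries.

```r
# Hypothetical mapping table: each variant maps to the canonical unit at the
# same position. "m\u00b3/s" is the superscript-three spelling.
.unit_variants  <- c("m3/s", "m\u00b3/s", "cms", "m^3/s", "m")
.unit_canonical <- c("m3/s", "m3/s",      "m3/s", "m3/s", "m")

sketch_validate <- function(x, required) {
  missing_cols <- setdiff(required, names(x))
  if (length(missing_cols) > 0L) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }

  idx   <- match(x$unit, .unit_variants)
  known <- !is.na(idx)

  # Report unrecognised strings once each, but leave them untouched.
  unknown <- unique(x$unit[!known & !is.na(x$unit)])
  if (length(unknown) > 0L) {
    warning("Unrecognised unit string(s): ",
            paste(unknown, collapse = ", "), call. = FALSE)
  }

  x$unit[known] <- .unit_canonical[idx[known]]
  x
}

raw <- data.frame(
  station_id = "S1",
  unit       = c("cms", "m\u00b3/s", "furlongs"),
  stringsAsFactors = FALSE
)
required_cols <- c("station_id", "unit")

# Emits a warning naming "furlongs"; the other two rows become "m3/s".
normalised <- suppressWarnings(sketch_validate(raw, required_cols))
normalised$unit
```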
