---
title: "Introduction to Rvoterdistance"
author: "Loren Collingwood"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to Rvoterdistance}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

**Rvoterdistance** calculates the geographic distance between voters and
polling locations (or vote-by-mail drop boxes) using the Haversine
great-circle formula, implemented in C++ for speed. The package supports:

- **Nearest location**: find the single closest polling place for each voter
- **k-nearest locations**: find the k closest locations per voter
- **Distance threshold**: find all locations within a specified radius
- **sf integration**: pass `sf` POINT geometries directly

## Installation

```{r install, eval = FALSE}
# From GitHub:
remotes::install_github("lorenc5/Rvoterdistance")
```

## Included Data

The package ships with two example datasets:

- `king_dbox`: King County, WA ballot drop box locations and a sample of
  voters
- `meck_ev`: Mecklenburg County, NC early voting locations and a sample of
  voters

```{r data}
library(Rvoterdistance)
data(meck_ev)  # loads voter_meck (voters) and early_meck (early voting locations)

str(voter_meck)
str(early_meck)
```

## Basic Usage: Nearest Location

The main function is `nearest_location()`. With the default `k = 1`, it
returns one row per voter with the distance to the nearest polling
location:

```{r nearest}
result <- nearest_location(
  voters    = voter_meck,
  locations = early_meck,
  voter_coords    = c("lat", "long"),
  location_coords = c("lat", "long")
)

head(result)
```

The output includes the voter data, the matched location data, and three
distance columns: `distance_m` (meters), `distance_km`, and
`distance_miles`.
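
The three columns express the same distance in different units. As a rough sketch of how they relate, the chunk below converts made-up distances in meters using the standard definitions (1 km = 1000 m, 1 international mile = 1609.344 m). The exact constants used inside the package are not shown in this vignette, so treat these as assumptions; `toy_m` and `toy_units` are illustrative names.

```{r units-sketch}
# Toy distances in meters (illustrative values, not package output)
toy_m <- c(804.672, 1609.344, 5000)

toy_units <- data.frame(
  distance_m     = toy_m,
  distance_km    = toy_m / 1000,
  distance_miles = toy_m / 1609.344  # 1 international mile = 1609.344 m
)
toy_units
```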

## k-Nearest Locations

To find the 3 closest early voting sites for each voter:

```{r knearest}
result_k3 <- nearest_location(
  voter_meck, early_meck,
  voter_coords    = c("lat", "long"),
  location_coords = c("lat", "long"),
  k = 3,
  append_data = FALSE
)

head(result_k3, 9)
```

The output is in long format: one row per voter-location pair, with a
`rank` column (1 = nearest).
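
To make the long format concrete, here is a small base-R sketch that builds a ranked table from a made-up distance matrix. It only illustrates the shape of the result and is not how the package computes it. The objects `toy_d`, `k_toy`, and `toy_long`, and the column name `location_id`, are assumptions for this example.

```{r rank-sketch}
# Toy distance matrix in miles: rows are voters, columns are locations.
# Values are made up for illustration.
toy_d <- matrix(
  c(2.0, 0.5, 4.1, 3.3,
    1.2, 6.0, 0.9, 2.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("v1", "v2"), c("A", "B", "C", "D"))
)
k_toy <- 3

# For each voter, order the locations by distance and keep the first k
toy_long <- do.call(rbind, lapply(rownames(toy_d), function(v) {
  idx <- order(toy_d[v, ])[seq_len(k_toy)]
  data.frame(
    voter_id       = v,
    location_id    = colnames(toy_d)[idx],
    rank           = seq_len(k_toy),
    distance_miles = unname(toy_d[v, idx])
  )
}))
toy_long
```

Each voter contributes `k_toy` consecutive rows, ordered from nearest to farthest.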

## Distance Threshold

Find all early voting locations within 5 miles of each voter:

```{r threshold}
result_5mi <- nearest_location(
  voter_meck[1:20, ], early_meck,
  voter_coords    = c("lat", "long"),
  location_coords = c("lat", "long"),
  max_dist = 5,
  units = "miles",
  append_data = FALSE
)

head(result_5mi, 10)

# How many locations within 5 miles per voter?
table(result_5mi$voter_id)
```
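
The threshold idea can be sketched in base R on a made-up distance matrix: keep every voter-location pair whose distance is at or below the radius, then count pairs per voter. This is an illustration under assumed names (`toy_d`, `max_dist_toy`, `toy_within`, `location_id`), not the package's implementation.

```{r threshold-sketch}
# Toy distance matrix in miles: rows are voters, columns are locations.
toy_d <- matrix(
  c(2.0, 0.5, 4.1, 3.3,
    1.2, 6.0, 0.9, 2.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("v1", "v2"), c("A", "B", "C", "D"))
)
max_dist_toy <- 2.5

# which(..., arr.ind = TRUE) returns one (row, column) index pair
# for every distance within the radius
hits <- which(toy_d <= max_dist_toy, arr.ind = TRUE)

toy_within <- data.frame(
  voter_id       = rownames(toy_d)[hits[, 1]],
  location_id    = colnames(toy_d)[hits[, 2]],
  distance_miles = toy_d[hits]
)
toy_within <- toy_within[order(toy_within$voter_id, toy_within$distance_miles), ]

# Number of locations within the radius for each voter
table(toy_within$voter_id)
```

Unlike the k-nearest case, the number of rows per voter varies with how many locations fall inside the radius.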

## Using sf Objects

If your data are already `sf` POINT objects, pass them directly; there is
no need to specify coordinate column names:

```{r sf, eval = requireNamespace("sf", quietly = TRUE)}
library(sf)

# sf expects coords in x, y order, so longitude comes first
voters_sf <- st_as_sf(voter_meck, coords = c("long", "lat"), crs = 4326)
locs_sf   <- st_as_sf(early_meck, coords = c("long", "lat"), crs = 4326)

result_sf <- nearest_location(voters_sf, locs_sf, append_data = FALSE)
head(result_sf)
```

The Haversine formula works on latitude and longitude in degrees, so if
the CRS is not WGS-84 (EPSG:4326), the package automatically transforms
the coordinates to WGS-84 and prints a message.
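
If you prefer to handle the CRS yourself, the same check can be done by hand with `sf`. The chunk below is a sketch of that idea, not the package's internal code: it stores one point near Charlotte in UTM zone 17N (EPSG:32617, chosen only for illustration) and transforms it back to WGS-84. The object names `pt_wgs84`, `pt_utm`, and `pt_back` are hypothetical.

```{r crs-sketch, eval = requireNamespace("sf", quietly = TRUE)}
# Illustrative only: one point near Charlotte, re-projected to UTM zone 17N
pt_wgs84 <- sf::st_sfc(sf::st_point(c(-80.8431, 35.2271)), crs = 4326)
pt_utm   <- sf::st_transform(pt_wgs84, 32617)

# Transform back to WGS-84 only when the CRS differs
if (sf::st_crs(pt_utm) != sf::st_crs(4326)) {
  message("Transforming coordinates to WGS-84 (EPSG:4326)")
  pt_back <- sf::st_transform(pt_utm, 4326)
} else {
  pt_back <- pt_utm
}

# Columns X (longitude) and Y (latitude)
sf::st_coordinates(pt_back)
```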

## Convenience Functions

For quick calculations without the full `nearest_location()` interface:

```{r convenience}
# Minimum distance in km for each voter
km <- dist_km(voter_meck$lat, voter_meck$long,
              early_meck$lat, early_meck$long)
summary(km)

# Minimum distance in miles
mi <- dist_mile(voter_meck$lat, voter_meck$long,
                early_meck$lat, early_meck$long)
summary(mi)

# Single-pair distance (e.g., Charlotte to Raleigh)
haversine(35.2271, -80.8431, 35.7796, -78.6382, units = "miles")
```

## Performance

The Haversine computation runs in C++ and uses partial sorting
(`std::nth_element`) for k-nearest queries. Selecting the k closest of n
locations this way takes O(n) time per voter on average, instead of the
O(n log n) of a full sort. For large voter files, enable progress
reporting:

```{r progress, eval = FALSE}
# `big_voter_file` and `locations` are placeholders for your own data
result <- nearest_location(
  big_voter_file, locations,
  voter_coords = c("lat", "lon"),
  location_coords = c("lat", "lon"),
  k = 3,
  progress = TRUE
)
```
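
The partial-sorting idea can be tried in base R, where `sort(x, partial = k)` plays a role similar to `std::nth_element`. The chunk below is only an analogy on made-up distances, not a benchmark of the package; `toy_dist`, `k_toy`, `nearest_partial`, and `nearest_full` are illustrative names.

```{r partial-sort-sketch}
set.seed(1)
toy_dist <- runif(1000, min = 0, max = 50)  # made-up distances for one voter
k_toy <- 3

# Partial sort: guarantees only that the k-th smallest value sits at
# position k, with no larger value before it
nearest_partial <- sort(toy_dist, partial = k_toy)[seq_len(k_toy)]

# Full sort, for comparison
nearest_full <- sort(toy_dist)[seq_len(k_toy)]

# Both approaches select the same k values; the partial result only needs
# a sort of k elements to be ranked
identical(sort(nearest_partial), nearest_full)
```

The partial sort avoids ordering the remaining 997 values, which is where the saving over a full sort comes from.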
