Type: Package
Title: Tidy Differential Privacy
Version: 0.1.0
Description: A tidy-style interface for applying differential privacy to data frames. Provides pipe-friendly functions to add calibrated noise, compute private statistics, and track privacy budgets using the epsilon-delta differential privacy framework. Implements the Laplace mechanism (Dwork et al. 2006 <doi:10.1007/11681878_14>) and the Gaussian mechanism for achieving differential privacy as described in Dwork and Roth (2014) <doi:10.1561/0400000042>.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.3
Imports: magrittr, stats
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown
VignetteBuilder: knitr
URL: https://github.com/ttarler/tidydp
BugReports: https://github.com/ttarler/tidydp/issues
NeedsCompilation: no
Packaged: 2025-11-23 17:20:57 UTC; ttarler
Author: Thomas Tarler [aut, cre]
Maintainer: Thomas Tarler <ttarler@gmail.com>
Repository: CRAN
Date/Publication: 2025-11-27 19:00:02 UTC

tidydp: Tidy Differential Privacy

Description

A tidy-style interface for applying differential privacy to data frames. Provides pipe-friendly functions to add calibrated noise, compute private statistics, and track privacy budgets using the epsilon-delta differential privacy framework.

Author(s)

Maintainer: Thomas Tarler <ttarler@gmail.com>

See Also

Useful links:

https://github.com/ttarler/tidydp
Report bugs at https://github.com/ttarler/tidydp/issues


Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Arguments

lhs

A value or the magrittr placeholder.

rhs

A function call using the magrittr semantics.

Value

The result of calling 'rhs(lhs)'.


Add Gaussian Noise

Description

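Adds Gaussian (normal) noise to a numeric value or vector for (epsilon, delta)-differential privacy. Unlike the Laplace mechanism, which gives pure epsilon-DP, the Gaussian mechanism guarantees only (epsilon, delta)-DP, so it is appropriate when a small delta > 0 is acceptable. Its noise is calibrated to the L2 sensitivity of the query.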

Usage

add_gaussian_noise(x, sensitivity, epsilon, delta = 1e-05)

Arguments

x

Numeric value or vector to add noise to

sensitivity

The L2 sensitivity of the query

epsilon

Privacy parameter (smaller = more privacy, more noise)

delta

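Privacy parameter; informally, the probability with which the epsilon guarantee may fail. It should be very small, typically much less than 1/n for a dataset of n records (default: 1e-5)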

Value

Numeric value or vector with Gaussian noise added
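For reference, the widely used calibration of the Gaussian mechanism from Dwork and Roth (2014) sets sigma = sensitivity * sqrt(2 * log(1.25 / delta)) / epsilon and is stated for 0 < epsilon < 1. The minimal sketch below assumes that calibration; gaussian_sigma_sketch() and gaussian_noise_sketch() are hypothetical helpers for illustration, and add_gaussian_noise() may calibrate its noise differently.

```r
# Hypothetical helpers for illustration; not part of tidydp.
# Classic Gaussian-mechanism calibration (Dwork and Roth 2014), stated for 0 < epsilon < 1.
gaussian_sigma_sketch <- function(sensitivity, epsilon, delta = 1e-5) {
  stopifnot(sensitivity > 0, epsilon > 0, epsilon < 1, delta > 0, delta < 1)
  sensitivity * sqrt(2 * log(1.25 / delta)) / epsilon
}

# Add independent N(0, sigma^2) noise to each element of x,
# where sensitivity is the L2 sensitivity of the query.
gaussian_noise_sketch <- function(x, sensitivity, epsilon, delta = 1e-5) {
  sigma <- gaussian_sigma_sketch(sensitivity, epsilon, delta)
  x + stats::rnorm(length(x), mean = 0, sd = sigma)
}

set.seed(1)
sigma <- gaussian_sigma_sketch(sensitivity = 1, epsilon = 0.5, delta = 1e-5)
noisy <- gaussian_noise_sketch(c(10, 20, 30), sensitivity = 1, epsilon = 0.5)
```

Here sigma is roughly 9.7. Because delta enters through a logarithm, making delta ten times smaller increases sigma only modestly.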


Add Laplace Noise

Description

Adds Laplace-distributed noise to a numeric value or vector for epsilon-differential privacy. The Laplace mechanism draws noise with scale sensitivity / epsilon, where sensitivity is the L1 sensitivity of the query: the maximum absolute change in its output that a single record can cause.

Usage

add_laplace_noise(x, sensitivity, epsilon)

Arguments

x

Numeric value or vector to add noise to

sensitivity

The L1 sensitivity of the query (the maximum change in the output caused by a single record)

epsilon

Privacy parameter (smaller = more privacy, more noise)

Value

Numeric value or vector with Laplace noise added
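A minimal sketch of the Laplace mechanism in base R, assuming the standard noise scale b = sensitivity / epsilon and inverse-CDF sampling. laplace_noise_sketch() is a hypothetical helper, not the code behind add_laplace_noise().

```r
# Hypothetical helper for illustration; not the code behind add_laplace_noise().
# Laplace mechanism: noise scale b = sensitivity / epsilon, where sensitivity
# is the L1 sensitivity of the query.
laplace_noise_sketch <- function(x, sensitivity, epsilon) {
  stopifnot(sensitivity > 0, epsilon > 0)
  b <- sensitivity / epsilon
  # Inverse-CDF sampling: if u ~ Uniform(-1/2, 1/2), then
  # -b * sign(u) * log(1 - 2|u|) ~ Laplace(0, b).
  # runif() does not return its endpoints, so the log argument stays positive.
  u <- stats::runif(length(x), min = -0.5, max = 0.5)
  x - b * sign(u) * log(1 - 2 * abs(u))
}

set.seed(42)
draws <- laplace_noise_sketch(numeric(1e5), sensitivity = 1, epsilon = 0.5)
var(draws)  # close to 2 * b^2 = 8, since b = 1 / 0.5 = 2
```

A Laplace(0, b) variable has mean 0 and variance 2 * b^2, which gives a quick sanity check on the sampler.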


Check Privacy Budget

Description

Checks whether a proposed operation would exceed the remaining privacy budget.

Usage

check_privacy_budget(budget, epsilon_required, delta_required = 0)

Arguments

budget

A privacy budget object

epsilon_required

Epsilon required for the operation

delta_required

Delta required for the operation (default: 0)

Value

Logical value: TRUE if the budget is sufficient for the operation, FALSE otherwise

Examples

budget <- new_privacy_budget(epsilon_total = 1.0)
check_privacy_budget(budget, epsilon_required = 0.5)
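Under basic composition, checking the budget amounts to comparing spent plus required against the total, for epsilon and delta separately. A minimal sketch, with a plain list and illustrative field names standing in for the privacy budget object; check_budget_sketch() is hypothetical, not check_privacy_budget() itself.

```r
# Hypothetical sketch; the field names below are illustrative, not tidydp's.
budget_sketch <- list(epsilon_total = 1.0, delta_total = 1e-5,
                      epsilon_spent = 0.7, delta_spent = 0)

# Basic composition: the operation fits if spent + required stays within
# the total for both epsilon and delta.
check_budget_sketch <- function(budget, epsilon_required, delta_required = 0) {
  eps_ok <- budget$epsilon_spent + epsilon_required <= budget$epsilon_total
  delta_ok <- budget$delta_spent + delta_required <= budget$delta_total
  eps_ok && delta_ok
}

check_budget_sketch(budget_sketch, epsilon_required = 0.2)  # TRUE: 0.7 + 0.2 <= 1.0
check_budget_sketch(budget_sketch, epsilon_required = 0.5)  # FALSE: 0.7 + 0.5 > 1.0
```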

Add Differentially Private Noise to Data Frame Columns

Description

Adds calibrated Laplace or Gaussian noise to specified numeric columns in a data frame to achieve differential privacy. This is the primary function for column-level privacy.

Usage

dp_add_noise(
  data,
  columns,
  epsilon,
  delta = NULL,
  lower = NULL,
  upper = NULL,
  mechanism = NULL,
  .budget = NULL
)

Arguments

data

A data frame

columns

Character vector of column names to add noise to

epsilon

Privacy parameter (smaller = more privacy, more noise)

delta

Privacy parameter for the Gaussian mechanism (default: NULL, in which case the Laplace mechanism is used)

lower

Named numeric vector of lower bounds for each column

upper

Named numeric vector of upper bounds for each column

mechanism

Either "laplace" or "gaussian". If NULL, selected automatically: "gaussian" when delta is supplied, otherwise "laplace"

.budget

Optional privacy budget object to track expenditure

Value

Data frame with noise added to specified columns

Examples

data <- data.frame(age = c(25, 30, 35, 40), income = c(50000, 60000, 70000, 80000))
private_data <- data %>%
  dp_add_noise(
    columns = c("age", "income"),
    epsilon = 0.1,
    lower = c(age = 0, income = 0),
    upper = c(age = 100, income = 200000)
  )
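One common way to make a whole numeric column differentially private is to clamp every value into [lower, upper] and add independent Laplace noise with scale (upper - lower) / epsilon; under substitution, one record moves its clamped value by at most upper - lower. The sketch below illustrates this for a single column with the hypothetical helper add_column_noise_sketch(); dp_add_noise() may clamp values or divide epsilon across several columns differently.

```r
# Hypothetical helper for illustration; dp_add_noise() may work differently.
add_column_noise_sketch <- function(values, lower, upper, epsilon) {
  stopifnot(upper > lower, epsilon > 0)
  # Clamp so that one record can change its entry by at most (upper - lower)
  clamped <- pmin(pmax(values, lower), upper)
  b <- (upper - lower) / epsilon
  # Independent Laplace(0, b) noise per entry via inverse-CDF sampling
  u <- stats::runif(length(clamped), min = -0.5, max = 0.5)
  clamped - b * sign(u) * log(1 - 2 * abs(u))
}

set.seed(7)
ages <- c(25, 30, 35, 140)  # 140 is outside [0, 100] and is clamped to 100 first
noisy_ages <- add_column_noise_sketch(ages, lower = 0, upper = 100, epsilon = 0.1)
```

With lower = 0, upper = 100 and epsilon = 0.1 the noise scale is 1000, far larger than the ages themselves, which is typical of record-level noise at small epsilon.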

Differentially Private Count

Description

Computes a differentially private count of rows, optionally grouped by specified columns.

Usage

dp_count(data, epsilon, delta = NULL, group_by = NULL, .budget = NULL)

Arguments

data

A data frame

epsilon

Privacy parameter (smaller = more privacy, more noise)

delta

Privacy parameter (default: NULL, uses Laplace mechanism)

group_by

Character vector of column names to group by (optional)

.budget

Optional privacy budget object to track expenditure

Value

Data frame with (possibly grouped) counts

Examples

data <- data.frame(city = c("NYC", "LA", "NYC", "LA", "NYC"),
                   age = c(25, 30, 35, 40, 45))
# Overall count
dp_count(data, epsilon = 0.1)

# Grouped count
data %>% dp_count(epsilon = 0.1, group_by = "city")
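A minimal sketch of a grouped private count, assuming the set of group labels is public (so empty groups are reported too) and the add/remove-one-row neighboring model, under which one row changes exactly one group's count by 1 and each count gets Laplace noise with scale 1 / epsilon. Under substitution, one row can move between two groups, so the L1 sensitivity of the full set of counts doubles to 2. noisy_group_counts_sketch() is hypothetical, not dp_count()'s implementation.

```r
# Hypothetical helper for illustration; not dp_count()'s implementation.
noisy_group_counts_sketch <- function(data, group_col, levels, epsilon) {
  # factor() with fixed, public levels reports every level, including empty ones
  counts <- table(factor(data[[group_col]], levels = levels))
  b <- 1 / epsilon  # sensitivity 1 under add/remove neighbors
  u <- stats::runif(length(counts), min = -0.5, max = 0.5)
  data.frame(group = names(counts),
             n = as.numeric(counts) - b * sign(u) * log(1 - 2 * abs(u)))
}

set.seed(3)
data <- data.frame(city = c("NYC", "LA", "NYC", "LA", "NYC"),
                   age = c(25, 30, 35, 40, 45))
res <- noisy_group_counts_sketch(data, "city", levels = c("LA", "NYC", "SF"),
                                 epsilon = 0.1)
```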

Differentially Private Mean

Description

Computes a differentially private mean of a numeric column.

Usage

dp_mean(
  data,
  column,
  epsilon,
  delta = NULL,
  lower = NULL,
  upper = NULL,
  group_by = NULL,
  .budget = NULL
)

Arguments

data

A data frame

column

Name of the numeric column whose mean is computed

epsilon

Privacy parameter (smaller = more privacy, more noise)

delta

Privacy parameter (default: NULL, uses Laplace mechanism)

lower

Lower bound of the data range

upper

Upper bound of the data range

group_by

Character vector of column names to group by (optional)

.budget

Optional privacy budget object to track expenditure

Value

Data frame with (possibly grouped) private means

Examples

data <- data.frame(city = c("NYC", "LA", "NYC", "LA"),
                   income = c(50000, 60000, 70000, 80000))
data %>% dp_mean("income", epsilon = 0.1, lower = 0, upper = 200000, group_by = "city")
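One common way to compute a private grouped mean, sketched here under the assumptions that the group labels and group sizes are public: clamp the column to [lower, upper], take each group's mean, and add Laplace noise with scale (upper - lower) / (n_g * epsilon), where n_g is the group size. dp_mean_sketch() is hypothetical; dp_mean() may instead, for example, combine a noisy sum with a noisy count.

```r
# Hypothetical helper for illustration; not dp_mean()'s implementation.
dp_mean_sketch <- function(data, column, group_col, lower, upper, epsilon) {
  # Clamp so each record's influence is bounded by the public range
  clamped <- pmin(pmax(data[[column]], lower), upper)
  groups <- split(clamped, data[[group_col]])
  noisy <- vapply(groups, function(v) {
    # Laplace scale for this group's mean, treating length(v) as public
    b <- (upper - lower) / (length(v) * epsilon)
    u <- stats::runif(1, min = -0.5, max = 0.5)
    mean(v) - b * sign(u) * log(1 - 2 * abs(u))
  }, numeric(1))
  data.frame(group = names(noisy), mean = unname(noisy))
}

set.seed(11)
data <- data.frame(city = c("NYC", "LA", "NYC", "LA"),
                   income = c(50000, 60000, 70000, 80000))
res <- dp_mean_sketch(data, "income", "city", lower = 0, upper = 200000, epsilon = 0.1)
```

In this example each city has two rows, so the noise scale is 200000 / (2 * 0.1) = 1e6, which swamps the true means; useful private means need larger groups, tighter bounds, or a larger epsilon.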

Differentially Private Sum

Description

Computes a differentially private sum of a numeric column.

Usage

dp_sum(
  data,
  column,
  epsilon,
  delta = NULL,
  lower = NULL,
  upper = NULL,
  group_by = NULL,
  .budget = NULL
)

Arguments

data

A data frame

column

Name of the numeric column whose sum is computed

epsilon

Privacy parameter (smaller = more privacy, more noise)

delta

Privacy parameter (default: NULL, uses Laplace mechanism)

lower

Lower bound of the data range

upper

Upper bound of the data range

group_by

Character vector of column names to group by (optional)

.budget

Optional privacy budget object to track expenditure

Value

Data frame with (possibly grouped) private sums

Examples

data <- data.frame(city = c("NYC", "LA", "NYC", "LA"),
                   sales = c(100, 200, 150, 250))
data %>% dp_sum("sales", epsilon = 0.1, lower = 0, upper = 1000, group_by = "city")
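A minimal sketch of a private sum, assuming clamping to [lower, upper] and Laplace noise calibrated to the substitution sensitivity upper - lower. dp_sum_sketch() is a hypothetical helper (ungrouped, for brevity), not dp_sum()'s implementation.

```r
# Hypothetical helper for illustration; not dp_sum()'s implementation.
dp_sum_sketch <- function(values, lower, upper, epsilon) {
  stopifnot(upper > lower, epsilon > 0)
  clamped <- pmin(pmax(values, lower), upper)
  b <- (upper - lower) / epsilon  # substitution sensitivity / epsilon
  u <- stats::runif(1, min = -0.5, max = 0.5)
  sum(clamped) - b * sign(u) * log(1 - 2 * abs(u))
}

set.seed(5)
sales <- c(100, 200, 150, 250)
noisy_total <- dp_sum_sketch(sales, lower = 0, upper = 1000, epsilon = 0.1)
```

Since the mean absolute value of Laplace(0, b) noise is b, these bounds give an expected error of 1000 / 0.1 = 10000 against a true total of 700; choosing upper close to the realistic maximum is the main lever for accuracy.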

Create a New Privacy Budget

Description

Initializes a privacy budget tracker for managing epsilon and delta across multiple differentially private operations. Cumulative privacy loss is tracked with composition theorems; under basic (sequential) composition, the epsilons and deltas of successive operations add up.

Usage

new_privacy_budget(epsilon_total, delta_total = 1e-05, composition = "basic")

Arguments

epsilon_total

Total epsilon budget available

delta_total

Total delta budget available (default: 1e-5)

composition

Method for budget composition: "basic" or "advanced" (default: "basic")

Value

A privacy budget object (list with class "privacy_budget")

Examples

budget <- new_privacy_budget(epsilon_total = 1.0, delta_total = 1e-5)
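As a reference point for the "basic" and "advanced" options, the textbook bounds from Dwork and Roth (2014) for k operations that are each (epsilon, delta)-DP are: basic composition gives (k * epsilon, k * delta); advanced composition gives (epsilon', k * delta + delta') with epsilon' = epsilon * sqrt(2k ln(1/delta')) + k * epsilon * (e^epsilon - 1), for any delta' > 0. Whether tidydp's "advanced" option uses exactly this bound is an assumption; the helpers below are hypothetical.

```r
# Hypothetical helpers; textbook bounds from Dwork and Roth (2014).
# Basic composition: k operations, each epsilon-DP, are (k * epsilon)-DP in total.
basic_composition_sketch <- function(k, epsilon) k * epsilon

# Advanced composition: total epsilon' for k operations, each (epsilon, delta)-DP,
# at the cost of an extra delta_prime (total delta = k * delta + delta_prime).
advanced_composition_sketch <- function(k, epsilon, delta_prime) {
  epsilon * sqrt(2 * k * log(1 / delta_prime)) + k * epsilon * (exp(epsilon) - 1)
}

basic_composition_sketch(100, 0.01)           # 1
advanced_composition_sketch(100, 0.01, 1e-5)  # about 0.49
```

Advanced composition pays off for many operations with small epsilon; for a handful of operations the basic bound is usually tighter.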

Print Privacy Budget

Description

Prints a summary of a privacy budget object to the console.

Usage

## S3 method for class 'privacy_budget'
print(x, ...)

Arguments

x

A privacy budget object

...

Additional arguments (unused)

Value

Returns the privacy budget object invisibly. Called primarily for the side effect of printing budget information to the console, including total epsilon and delta budgets, amounts spent, remaining budget, composition method, and number of operations executed.


Calculate L1 Sensitivity for Count Queries

Description

For count queries, the sensitivity is 1: adding or removing one record changes the count by at most 1.

Usage

sensitivity_count()

Value

Numeric sensitivity value


Calculate L2 Sensitivity for Mean Queries

Description

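Computes the sensitivity of a mean query over n records whose values are bounded in [lower, upper]. For a scalar-valued query such as a mean, the L1 and L2 sensitivities coincide.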

Usage

sensitivity_mean(lower, upper, n)

Arguments

lower

Lower bound of the data range

upper

Upper bound of the data range

n

Sample size

Value

Numeric sensitivity value
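A worked version of the usual bound, assuming the substitution model and a public n: replacing one of n values clamped to [lower, upper] changes their sum by at most upper - lower, so it changes the mean by at most (upper - lower) / n. mean_sensitivity_sketch() is a hypothetical helper; sensitivity_mean() may compute its value differently.

```r
# Hypothetical helper for illustration; not sensitivity_mean() itself.
mean_sensitivity_sketch <- function(lower, upper, n) {
  stopifnot(upper > lower, n >= 1)
  # One substituted record moves the sum by at most (upper - lower),
  # hence the mean by at most (upper - lower) / n.
  (upper - lower) / n
}

mean_sensitivity_sketch(0, 200000, 2)     # (200000 - 0) / 2 = 100000
mean_sensitivity_sketch(0, 200000, 1000)  # (200000 - 0) / 1000 = 200
```

The sensitivity shrinks as n grows, which is why private means of small groups are very noisy.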


Calculate L1 Sensitivity for Sum Queries

Description

For sum queries with bounded data, the sensitivity is the maximum change in the sum when one record is substituted (changed from any value to any other value in the range), which equals upper - lower. This uses the standard substitution model for differential privacy.

Usage

sensitivity_sum(lower, upper)

Arguments

lower

Lower bound of the data range

upper

Upper bound of the data range

Value

Numeric sensitivity value
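The neighboring-dataset model matters for sums. Under substitution (the model described above), the worst case replaces lower with upper, giving upper - lower; under the alternative add/remove model, one record contributes at most max(|lower|, |upper|). The two hypothetical helpers below contrast them for a range that straddles zero.

```r
# Hypothetical helpers contrasting two neighboring-dataset models for a bounded sum.
sum_sensitivity_substitution <- function(lower, upper) upper - lower
sum_sensitivity_add_remove <- function(lower, upper) max(abs(lower), abs(upper))

sum_sensitivity_substitution(-10, 50)  # 60: a record at -10 replaced by 50
sum_sensitivity_add_remove(-10, 50)    # 50: adding or removing a record at 50
```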


Spend Privacy Budget

Description

Records a privacy expenditure and returns the updated budget.

Usage

spend_privacy_budget(
  budget,
  epsilon_spent,
  delta_spent = 0,
  operation_name = NULL
)

Arguments

budget

A privacy budget object

epsilon_spent

Epsilon spent on the operation

delta_spent

Delta spent on the operation (default: 0)

operation_name

Name/description of the operation (optional)

Value

Updated privacy budget object
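Because the budget is an ordinary R list (see new_privacy_budget), spending budget produces a modified copy, so the returned object must be reassigned; otherwise the original budget is unchanged. The sketch below illustrates this pattern with a hypothetical spend_budget_sketch() and illustrative field names; it is not tidydp's implementation.

```r
# Hypothetical sketch; field names are illustrative, not tidydp's.
spend_budget_sketch <- function(budget, epsilon_spent, delta_spent = 0,
                                operation_name = NULL) {
  budget$epsilon_spent <- budget$epsilon_spent + epsilon_spent
  budget$delta_spent <- budget$delta_spent + delta_spent
  # Keep a log of operations, using a placeholder when no name is given
  label <- if (is.null(operation_name)) "unnamed" else operation_name
  budget$operations <- c(budget$operations, label)
  budget
}

budget_sketch <- list(epsilon_total = 1, delta_total = 1e-5, epsilon_spent = 0,
                      delta_spent = 0, operations = character(0))
# Reassign: R lists are copied on modification
budget_sketch <- spend_budget_sketch(budget_sketch, 0.3, operation_name = "count")
budget_sketch <- spend_budget_sketch(budget_sketch, 0.2, operation_name = "mean")
```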