predictset

predictset is an R package for model-agnostic conformal prediction and distribution-free uncertainty quantification. It constructs prediction intervals (regression) and prediction sets (classification) with finite-sample coverage guarantees - no distributional assumptions required. Works with any model: lm, glm, ranger, xgboost, or custom user-defined models via make_model().

Installation

# Install from CRAN
install.packages("predictset")

# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("charlescoverdale/predictset")

library(predictset)

# Get 90% prediction intervals around any model - 3 lines of code
result <- conformal_split(x, y, model = y ~ ., x_new = x_new, alpha = 0.10)
result$lower  # lower bounds
result$upper  # upper bounds

What is conformal prediction?

Standard machine learning models produce point predictions: a single number for regression, a single class for classification. But in practice, you almost always need to know how uncertain that prediction is. Conformal prediction is a framework for wrapping any model in a layer of calibrated uncertainty quantification. Given a target coverage level (say 90%), it produces prediction intervals or prediction sets that are guaranteed to contain the true value at least 90% of the time, regardless of the underlying model or data distribution.

The key property is that this guarantee holds in finite samples. It’s not asymptotic, and it doesn’t require distributional assumptions. The only requirement is that the calibration data and test data are exchangeable (roughly: drawn from the same distribution). This makes conformal prediction fundamentally different from parametric confidence intervals, bootstrap intervals, or Bayesian credible intervals, all of which depend on modelling assumptions that may not hold.

predictset implements the main conformal methods from the recent literature (split conformal, Jackknife+, CV+, conformalized quantile regression for regression, and APS, RAPS, and LAC for classification) in a lightweight package with only two dependencies (cli and stats).

How does predictset compare to other packages?

Feature	predictset	probably	conformalInference	MAPIE
Language	R	R	R	Python
Regression	Yes	Yes	Yes	Yes
Classification	Yes	No	No	Yes
Model-agnostic	Yes	tidymodels only	Yes	scikit-learn only
On CRAN	Pending	Yes	No (GitHub only)	N/A
Jackknife+ / CV+	Yes	No	Yes	Yes
CQR	Yes	Yes	Yes	Yes
APS / RAPS	Yes	No	No	Yes
Mondrian CP	Yes	No	No	Yes
Weighted CP	Yes	No	No	Yes
Adaptive CI	Yes	No	No	No
Conditional diagnostics	Yes	No	No	Partial
Dependencies	2	14+	5	N/A
Last updated	2026	2024	2019	2024

predictset is designed to complement rather than compete with probably. If you’re working in the tidymodels ecosystem and only need regression intervals, probably integrates neatly with your workflow. predictset fills the gaps: classification methods (APS, RAPS, LAC), Jackknife+/CV+ for regression, and a model-agnostic interface that works with any model, not just tidymodels workflows.

conformalInference by Ryan Tibshirani was foundational research code, but it hasn’t been updated since 2019, isn’t on CRAN, and doesn’t cover classification.

Quick start

Regression: prediction intervals with coverage verification

library(predictset)

set.seed(42)
n <- 500
x <- matrix(rnorm(n * 5), ncol = 5)
y <- x[, 1] * 2 + x[, 2] + rnorm(n)
x_new <- matrix(rnorm(100 * 5), ncol = 5)
y_new <- x_new[, 1] * 2 + x_new[, 2] + rnorm(100)  # true values (for evaluation)

# Fit conformal intervals
result <- conformal_split(x, y, model = y ~ ., x_new = x_new, alpha = 0.10)
print(result)
#> ── Conformal Prediction Intervals (Split Conformal) ──
#> • Coverage target: "90%"
#> • Training: 250 | Calibration: 250 | Predictions: 100
#> • Conformal quantile: 1.876
#> • Median interval width: 3.7519

# Extract intervals as a data frame
head(data.frame(pred = result$pred, lower = result$lower, upper = result$upper))

# Verify coverage: should be ≥ 90%
coverage(result, y_new)

# Predict on new data later (fit once, predict many times)
future_data <- matrix(rnorm(50 * 5), ncol = 5)
predict(result, newdata = future_data)

Classification: prediction sets

set.seed(42)
n <- 400
x <- matrix(rnorm(n * 4), ncol = 4)
y <- factor(ifelse(x[, 1] + x[, 2] > 0, "A", "B"))
x_new <- matrix(rnorm(50 * 4), ncol = 4)

clf <- make_model(
  train_fun = function(x, y) glm(y ~ ., data = data.frame(y = y, x),
                                  family = "binomial"),
  predict_fun = function(object, x_new) {
    df <- as.data.frame(x_new)
    names(df) <- paste0("X", seq_len(ncol(x_new)))
    p <- predict(object, newdata = df, type = "response")
    cbind(A = 1 - p, B = p)
  },
  type = "classification"
)

result <- conformal_aps(x, y, model = clf, x_new = x_new, alpha = 0.10)
result$sets[[1]]   # prediction set for first observation: e.g. c("A")
result$sets[[10]]  # ambiguous case might include: c("A", "B")
table(set_size(result))  # distribution of set sizes

Choosing a method

Scenario	Recommended method	Why
Default for regression	`conformal_split()`	Fast, single model fit
Small dataset, need tight intervals	`conformal_cv()` or `conformal_jackknife()`*	Uses all data for both training and calibration
Heteroscedastic data	`conformal_split(..., score_type = "normalized")` or `conformal_cqr()`	Adaptive interval widths
Default for classification	`conformal_aps()`	Adaptive set sizes, well-calibrated
Many classes, want small sets	`conformal_raps()`	Regularized APS, penalises large sets
Coverage must hold per subgroup	`conformal_mondrian()` / `conformal_mondrian_class()`	Group-conditional guarantees
Covariate shift between train/test	`conformal_weighted()`	Importance-weighted calibration
Sequential/online prediction	`conformal_aci()`**	Adapts to distribution drift over time

*Jackknife+ and CV+ have a theoretical coverage guarantee of 1-2α (Barber et al. 2021), weaker than split conformal’s 1-α. In practice, coverage is typically near 1-α. **ACI provides asymptotic (not finite-sample) coverage guarantees.

Model interface

There are three ways to specify a model. This flexibility means predictset works with anything from a simple linear model to a custom deep learning wrapper.

1. Formula shorthand (fits lm internally):

result <- conformal_split(x, y, model = y ~ ., x_new = x_new)

This is the quickest way to get started. Pass a formula and predictset handles the fitting.

2. Fitted model (auto-detected for lm, glm, and ranger):

fit <- lm(y ~ ., data = data.frame(y = y, x))
result <- conformal_split(x, y, model = fit, x_new = x_new)

If you’ve already fitted a model and want conformal intervals around its predictions, pass it directly. predictset recognises standard R model objects and extracts the training and prediction functions automatically.

3. Custom model via make_model() (works with anything):

xgb_model <- make_model(
  train_fun = function(x, y) {
    dtrain <- xgboost::xgb.DMatrix(x, label = y)
    xgboost::xgb.train(list(objective = "reg:squarederror"), dtrain, nrounds = 100)
  },
  predict_fun = function(object, x_new) {
    predict(object, xgboost::xgb.DMatrix(x_new))
  },
  type = "regression"
)

result <- conformal_split(x, y, model = xgb_model, x_new = x_new)

make_model() takes a training function, a prediction function, and a type ("regression" or "classification"). This is how you use conformal prediction with xgboost, keras, lightgbm, or any other model.

Methods

Function	Type	Method	Reference
`conformal_split()`	Regression	Split conformal	Vovk et al. (2005)
`conformal_cv()`	Regression	CV+	Barber et al. (2021)
`conformal_jackknife()`	Regression	Jackknife+	Barber et al. (2021)
`conformal_cqr()`	Regression	Conformalized Quantile Regression	Romano et al. (2019)
`conformal_mondrian()`	Regression	Mondrian (group-conditional)	Vovk et al. (2005)
`conformal_weighted()`	Regression	Weighted conformal (covariate shift)	Tibshirani et al. (2019)
`conformal_aps()`	Classification	Adaptive Prediction Sets	Romano, Sesia & Candes (2020)
`conformal_raps()`	Classification	Regularized APS	Angelopoulos et al. (2021)
`conformal_lac()`	Classification	Least Ambiguous Classifier	Sadinle, Lei & Wasserman (2019)
`conformal_mondrian_class()`	Classification	Mondrian (group-conditional)	Vovk et al. (2005)
`conformal_aci()`	Sequential	Adaptive Conformal Inference	Gibbs & Candes (2021)
`coverage()`	Diagnostic	Empirical coverage rate
`coverage_by_group()`	Diagnostic	Coverage within subgroups
`coverage_by_bin()`	Diagnostic	Coverage by prediction quantile bin
`interval_width()`	Diagnostic	Width of prediction intervals
`set_size()`	Diagnostic	Size of prediction sets
`conformal_pvalue()`	Diagnostic	Conformal p-values
`conformal_compare()`	Diagnostic	Compare multiple methods
`make_model()`	Utility	Wrap custom train/predict functions

More examples

Jackknife+ with ranger

Jackknife+ uses leave-one-out refitting to produce prediction intervals without splitting the data. This gives tighter intervals than split conformal (because it uses all the data for both training and calibration) at the cost of refitting the model n times.

set.seed(42)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1]^2 + x[, 2] + rnorm(n, sd = 0.5)
x_new <- matrix(rnorm(50 * 3), ncol = 3)

rf <- make_model(
  train_fun = function(x, y) {
    ranger::ranger(y ~ ., data = data.frame(y = y, x))
  },
  predict_fun = function(object, x_new) {
    predict(object, data = as.data.frame(x_new))$predictions
  },
  type = "regression"
)

result <- conformal_jackknife(x, y, model = rf, x_new = x_new, alpha = 0.10)
print(result)
plot(result)

Normalised conformal for heteroscedastic data

When the noise varies across the input space (e.g. predictions are more uncertain at extreme values), standard conformal intervals are too wide in low-noise regions and too narrow in high-noise ones. Normalised conformal scoring fixes this by scaling residuals by a local estimate of variability.

set.seed(42)
n <- 500
x <- matrix(runif(n, 0, 10), ncol = 1)
y <- sin(x[, 1]) + rnorm(n, sd = 0.1 + 0.3 * x[, 1])  # noise grows with x
x_new <- matrix(seq(0, 10, length.out = 100), ncol = 1)

result <- conformal_split(
  x, y, model = y ~ ., x_new = x_new, alpha = 0.10,
  score_type = "normalized"
)

# Intervals are narrower near x = 0, wider near x = 10
plot(result)

Mondrian conformal: group-conditional coverage

Standard conformal prediction guarantees marginal coverage (across all test points), but coverage can vary wildly across subgroups. Mondrian conformal computes a separate quantile for each group, guaranteeing coverage within each subgroup. This is critical for fairness and regulatory compliance.

No other R package on CRAN implements Mondrian conformal prediction.

set.seed(42)
n <- 600
x <- matrix(rnorm(n * 3), ncol = 3)
groups <- factor(ifelse(x[, 1] > 0, "high", "low"))
y <- x[, 1] * 2 + ifelse(groups == "high", 3, 0.5) * rnorm(n)
x_new <- matrix(rnorm(200 * 3), ncol = 3)
groups_new <- factor(ifelse(x_new[, 1] > 0, "high", "low"))

result <- conformal_mondrian(x, y, model = y ~ ., x_new = x_new,
                              groups = groups, groups_new = groups_new)
print(result)

# Check per-group coverage
y_new <- x_new[, 1] * 2 + ifelse(groups_new == "high", 3, 0.5) * rnorm(200)
coverage_by_group(result, y_new, groups_new)

Weighted conformal: handling covariate shift

When the test distribution differs from the training distribution (covariate shift), standard conformal coverage guarantees break down. Weighted conformal prediction uses importance weights to correct for this shift.

set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1] * 2 + rnorm(n)
x_new <- matrix(rnorm(100 * 3, mean = 1), ncol = 3)  # shifted test data

# Importance weights (likelihood ratio of test vs training distributions)
weights <- dnorm(x[, 1], mean = 1) / dnorm(x[, 1], mean = 0)

result <- conformal_weighted(x, y, model = y ~ ., x_new = x_new,
                              weights = weights)
print(result)

Conformalized Quantile Regression

CQR combines conformal prediction with quantile regression to produce intervals that naturally adapt to heteroscedasticity. Instead of fitting a model for the mean and adding symmetric bands, CQR fits models for the lower and upper quantiles and then adjusts them to guarantee coverage.

set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1] + x[, 2]^2 + rnorm(n, sd = 0.5 + abs(x[, 1]))
x_new <- matrix(rnorm(100 * 3), ncol = 3)

# In practice, use quantile regression (e.g. quantreg::rq).
# Here we approximate with shifted linear models for illustration.
model_lo <- make_model(
  train_fun = function(x, y) lm(y ~ ., data = data.frame(y = y, x)),
  predict_fun = function(obj, x_new) predict(obj, newdata = as.data.frame(x_new)) - 1.5,
  type = "regression"
)
model_hi <- make_model(
  train_fun = function(x, y) lm(y ~ ., data = data.frame(y = y, x)),
  predict_fun = function(obj, x_new) predict(obj, newdata = as.data.frame(x_new)) + 1.5,
  type = "regression"
)

result <- conformal_cqr(x, y, model_lo, model_hi, x_new = x_new, alpha = 0.10)
print(result)
plot(result)

Adaptive Conformal Inference (sequential prediction)

ACI adapts the miscoverage level online based on observed coverage, maintaining long-run coverage even under distribution shift. No other R package implements ACI.

set.seed(42)
n <- 500
y_true <- cumsum(rnorm(n, sd = 0.1)) + rnorm(n)  # drifting process
y_pred <- c(0, y_true[-n])                         # naive lag-1 predictor

result <- conformal_aci(y_pred, y_true, alpha = 0.10, gamma = 0.01)
print(result)
plot(result)  # intervals + adaptive alpha trace

Comparing methods

Benchmark multiple conformal methods side-by-side:

set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1] * 2 + rnorm(n)
x_new <- matrix(rnorm(200 * 3), ncol = 3)
y_new <- x_new[, 1] * 2 + rnorm(200)

comp <- conformal_compare(x, y, model = y ~ ., x_new = x_new, y_new = y_new,
                           methods = c("split", "cv", "jackknife"))
print(comp)

Diagnostics

After producing predictions, use the diagnostic functions to evaluate calibration and efficiency.

# Suppose y_test contains the true values for x_new
coverage(result, y_test)
#> [1] 0.91

# Average interval width (regression)
mean(interval_width(result))
#> [1] 3.42

# Prediction set sizes (classification)
table(set_size(result))
#>  1  2  3
#> 74 21  5

# Coverage within subgroups
coverage_by_group(result, y_test, groups = groups_test)
#>   group coverage   n target
#> 1  high    0.920  98    0.9
#> 2   low    0.891 102    0.9

# Coverage by prediction quantile bin
coverage_by_bin(result, y_test, bins = 5)

# Conformal p-values for outlier detection
pvals <- conformal_pvalue(result$scores, new_scores)

coverage() should be close to 1 - alpha. If it’s substantially lower, something has gone wrong (likely a violation of exchangeability). interval_width() and set_size() measure efficiency: narrower intervals and smaller sets are better, conditional on achieving the target coverage.

coverage_by_group() and coverage_by_bin() diagnose conditional coverage. Marginal coverage can mask severe under-coverage in subgroups (e.g. by demographic group or by prediction magnitude). These diagnostics help identify where intervals fail, and are essential for fairness evaluation.

Theory and references

Conformal prediction was introduced by Vovk, Gammerman, and Shafer in the early 2000s. The key insight is that if calibration and test data are exchangeable (i.e. their joint distribution is invariant to permutation), then the conformal p-value is uniformly distributed, which gives an exact finite-sample coverage guarantee. Unlike bootstrap or Bayesian intervals, this guarantee holds regardless of model misspecification.

The recent explosion of interest in conformal prediction has been driven by several methodological advances that make it practical for modern machine learning:

Vovk, Gammerman, Shafer (2005). Algorithmic Learning in a Random World. Springer. The foundational book introducing conformal prediction.
Lei, G’Sell, Rinaldo, Tibshirani, Wasserman (2018). Distribution-free predictive inference for regression. Journal of the American Statistical Association, 113(523), 1094–1111. Formalises split conformal prediction.
Barber, Candes, Ramdas, Tibshirani (2021). Predictive inference with the Jackknife+. Annals of Statistics, 49(1), 486–507. Introduces Jackknife+ and CV+, which avoid the efficiency loss from data splitting.
Romano, Patterson, Candes (2019). Conformalized quantile regression. NeurIPS 2019. Combines quantile regression with conformal calibration for adaptive intervals.
Romano, Sesia, Candes (2020). Classification with valid and adaptive coverage. NeurIPS 2020. Introduces Adaptive Prediction Sets (APS) for classification.
Angelopoulos, Bates, Malik, Jordan (2021). Uncertainty sets for image classifiers using conformal prediction. ICLR 2021. Introduces Regularized APS (RAPS) to reduce set sizes.
Sadinle, Lei, Wasserman (2019). Least ambiguous set-valued classifiers with bounded error levels. Journal of the American Statistical Association, 114(525), 223–234. The LAC method for classification.
Tibshirani, Barber, Candes, Ramdas (2019). Conformal prediction under covariate shift. NeurIPS 2019. Weighted conformal prediction for distribution shift.
Gibbs, Candes (2021). Adaptive conformal inference under distribution shift. NeurIPS 2021. Online alpha adjustment for sequential prediction.

For an accessible introduction to the field, see Angelopoulos and Bates (2023), A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.

Limitations

Split methods halve the training data. Split conformal, APS, RAPS, and LAC all divide the data into a training set and a calibration set. With small datasets, this can noticeably reduce model quality. Jackknife+ and CV+ avoid this at the cost of refitting the model multiple times.
Jackknife+ and CV+ are computationally expensive. Jackknife+ refits the model n times; CV+ refits it K times. For large datasets or expensive models, this may be impractical.
The coverage guarantee requires exchangeability. If the calibration data and test data come from different distributions (for example, if there is temporal drift) the coverage guarantee does not hold. This means conformal prediction is not directly applicable to time series forecasting without modification (e.g. conformal methods for time series exist but are not implemented here).
Classification methods depend on probability estimates. APS, RAPS, and LAC require the model to output well-calibrated class probabilities. If the probabilities are poorly calibrated, the prediction sets will still have valid coverage but may be unnecessarily large.

Package	Description
probably	Conformal regression within the tidymodels ecosystem
conformalInference	Research code by Tibshirani et al. (2019, GitHub only)
ons	UK Office for National Statistics data
boe	Bank of England data
fred	Federal Reserve Economic Data (FRED)
readecb	European Central Bank data
readoecd	OECD data
inflateR	Inflation adjustment

Issues

Found a bug or have a feature request? Please open an issue on GitHub.