predictset is an R package for model-agnostic
conformal prediction and distribution-free uncertainty quantification.
It constructs prediction intervals (regression) and prediction sets
(classification) with finite-sample coverage guarantees - no
distributional assumptions required. Works with any model:
lm, glm, ranger,
xgboost, or custom user-defined models via
make_model().
# Install from CRAN
install.packages("predictset")
# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("charlescoverdale/predictset")library(predictset)
# Get 90% prediction intervals around any model - 3 lines of code
result <- conformal_split(x, y, model = y ~ ., x_new = x_new, alpha = 0.10)
result$lower # lower bounds
result$upper # upper boundsStandard machine learning models produce point predictions: a single number for regression, a single class for classification. But in practice, you almost always need to know how uncertain that prediction is. Conformal prediction is a framework for wrapping any model in a layer of calibrated uncertainty quantification. Given a target coverage level (say 90%), it produces prediction intervals or prediction sets that are guaranteed to contain the true value at least 90% of the time, regardless of the underlying model or data distribution.
The key property is that this guarantee holds in finite samples. It’s not asymptotic, and it doesn’t require distributional assumptions. The only requirement is that the calibration data and test data are exchangeable (roughly: drawn from the same distribution). This makes conformal prediction fundamentally different from parametric confidence intervals, bootstrap intervals, or Bayesian credible intervals, all of which depend on modelling assumptions that may not hold.
predictset implements the main conformal methods
from the recent literature (split conformal, Jackknife+, CV+,
conformalized quantile regression for regression, and APS, RAPS, and LAC
for classification) in a lightweight package with only two dependencies
(cli and stats).
| Feature | predictset | probably | conformalInference | MAPIE |
|---|---|---|---|---|
| Language | R | R | R | Python |
| Regression | Yes | Yes | Yes | Yes |
| Classification | Yes | No | No | Yes |
| Model-agnostic | Yes | tidymodels only | Yes | scikit-learn only |
| On CRAN | Pending | Yes | No (GitHub only) | N/A |
| Jackknife+ / CV+ | Yes | No | Yes | Yes |
| CQR | Yes | Yes | Yes | Yes |
| APS / RAPS | Yes | No | No | Yes |
| Mondrian CP | Yes | No | No | Yes |
| Weighted CP | Yes | No | No | Yes |
| Adaptive CI | Yes | No | No | No |
| Conditional diagnostics | Yes | No | No | Partial |
| Dependencies | 2 | 14+ | 5 | N/A |
| Last updated | 2026 | 2024 | 2019 | 2024 |
predictset is designed to complement rather than
compete with probably. If you’re working in the tidymodels
ecosystem and only need regression intervals, probably
integrates neatly with your workflow. predictset fills
the gaps: classification methods (APS, RAPS, LAC), Jackknife+/CV+ for
regression, and a model-agnostic interface that works with any model,
not just tidymodels workflows.
conformalInference by Ryan Tibshirani was foundational
research code, but it hasn’t been updated since 2019, isn’t on CRAN, and
doesn’t cover classification.
library(predictset)
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 5), ncol = 5)
y <- x[, 1] * 2 + x[, 2] + rnorm(n)
x_new <- matrix(rnorm(100 * 5), ncol = 5)
y_new <- x_new[, 1] * 2 + x_new[, 2] + rnorm(100) # true values (for evaluation)
# Fit conformal intervals
result <- conformal_split(x, y, model = y ~ ., x_new = x_new, alpha = 0.10)
print(result)
#> ── Conformal Prediction Intervals (Split Conformal) ──
#> • Coverage target: "90%"
#> • Training: 250 | Calibration: 250 | Predictions: 100
#> • Conformal quantile: 1.876
#> • Median interval width: 3.7519
# Extract intervals as a data frame
head(data.frame(pred = result$pred, lower = result$lower, upper = result$upper))
# Verify coverage: should be ≥ 90%
coverage(result, y_new)
# Predict on new data later (fit once, predict many times)
future_data <- matrix(rnorm(50 * 5), ncol = 5)
predict(result, newdata = future_data)set.seed(42)
n <- 400
x <- matrix(rnorm(n * 4), ncol = 4)
y <- factor(ifelse(x[, 1] + x[, 2] > 0, "A", "B"))
x_new <- matrix(rnorm(50 * 4), ncol = 4)
clf <- make_model(
train_fun = function(x, y) glm(y ~ ., data = data.frame(y = y, x),
family = "binomial"),
predict_fun = function(object, x_new) {
df <- as.data.frame(x_new)
names(df) <- paste0("X", seq_len(ncol(x_new)))
p <- predict(object, newdata = df, type = "response")
cbind(A = 1 - p, B = p)
},
type = "classification"
)
result <- conformal_aps(x, y, model = clf, x_new = x_new, alpha = 0.10)
result$sets[[1]] # prediction set for first observation: e.g. c("A")
result$sets[[10]] # ambiguous case might include: c("A", "B")
table(set_size(result)) # distribution of set sizes| Scenario | Recommended method | Why |
|---|---|---|
| Default for regression | conformal_split() |
Fast, single model fit |
| Small dataset, need tight intervals | conformal_cv() or
conformal_jackknife()* |
Uses all data for both training and calibration |
| Heteroscedastic data | conformal_split(..., score_type = "normalized") or
conformal_cqr() |
Adaptive interval widths |
| Default for classification | conformal_aps() |
Adaptive set sizes, well-calibrated |
| Many classes, want small sets | conformal_raps() |
Regularized APS, penalises large sets |
| Coverage must hold per subgroup | conformal_mondrian() /
conformal_mondrian_class() |
Group-conditional guarantees |
| Covariate shift between train/test | conformal_weighted() |
Importance-weighted calibration |
| Sequential/online prediction | conformal_aci()** |
Adapts to distribution drift over time |
*Jackknife+ and CV+ have a theoretical coverage guarantee of 1-2α (Barber et al. 2021), weaker than split conformal’s 1-α. In practice, coverage is typically near 1-α. **ACI provides asymptotic (not finite-sample) coverage guarantees.
There are three ways to specify a model. This flexibility means predictset works with anything from a simple linear model to a custom deep learning wrapper.
1. Formula shorthand (fits lm
internally):
result <- conformal_split(x, y, model = y ~ ., x_new = x_new)This is the quickest way to get started. Pass a formula and predictset handles the fitting.
2. Fitted model (auto-detected for lm,
glm, and ranger):
fit <- lm(y ~ ., data = data.frame(y = y, x))
result <- conformal_split(x, y, model = fit, x_new = x_new)If you’ve already fitted a model and want conformal intervals around its predictions, pass it directly. predictset recognises standard R model objects and extracts the training and prediction functions automatically.
3. Custom model via make_model() (works
with anything):
xgb_model <- make_model(
train_fun = function(x, y) {
dtrain <- xgboost::xgb.DMatrix(x, label = y)
xgboost::xgb.train(list(objective = "reg:squarederror"), dtrain, nrounds = 100)
},
predict_fun = function(object, x_new) {
predict(object, xgboost::xgb.DMatrix(x_new))
},
type = "regression"
)
result <- conformal_split(x, y, model = xgb_model, x_new = x_new)make_model() takes a training function, a prediction
function, and a type ("regression" or
"classification"). This is how you use conformal prediction
with xgboost, keras, lightgbm, or any other model.
| Function | Type | Method | Reference |
|---|---|---|---|
conformal_split() |
Regression | Split conformal | Vovk et al. (2005) |
conformal_cv() |
Regression | CV+ | Barber et al. (2021) |
conformal_jackknife() |
Regression | Jackknife+ | Barber et al. (2021) |
conformal_cqr() |
Regression | Conformalized Quantile Regression | Romano et al. (2019) |
conformal_mondrian() |
Regression | Mondrian (group-conditional) | Vovk et al. (2005) |
conformal_weighted() |
Regression | Weighted conformal (covariate shift) | Tibshirani et al. (2019) |
conformal_aps() |
Classification | Adaptive Prediction Sets | Romano, Sesia & Candes (2020) |
conformal_raps() |
Classification | Regularized APS | Angelopoulos et al. (2021) |
conformal_lac() |
Classification | Least Ambiguous Classifier | Sadinle, Lei & Wasserman (2019) |
conformal_mondrian_class() |
Classification | Mondrian (group-conditional) | Vovk et al. (2005) |
conformal_aci() |
Sequential | Adaptive Conformal Inference | Gibbs & Candes (2021) |
coverage() |
Diagnostic | Empirical coverage rate | |
coverage_by_group() |
Diagnostic | Coverage within subgroups | |
coverage_by_bin() |
Diagnostic | Coverage by prediction quantile bin | |
interval_width() |
Diagnostic | Width of prediction intervals | |
set_size() |
Diagnostic | Size of prediction sets | |
conformal_pvalue() |
Diagnostic | Conformal p-values | |
conformal_compare() |
Diagnostic | Compare multiple methods | |
make_model() |
Utility | Wrap custom train/predict functions |
Jackknife+ uses leave-one-out refitting to produce prediction intervals without splitting the data. This gives tighter intervals than split conformal (because it uses all the data for both training and calibration) at the cost of refitting the model n times.
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1]^2 + x[, 2] + rnorm(n, sd = 0.5)
x_new <- matrix(rnorm(50 * 3), ncol = 3)
rf <- make_model(
train_fun = function(x, y) {
ranger::ranger(y ~ ., data = data.frame(y = y, x))
},
predict_fun = function(object, x_new) {
predict(object, data = as.data.frame(x_new))$predictions
},
type = "regression"
)
result <- conformal_jackknife(x, y, model = rf, x_new = x_new, alpha = 0.10)
print(result)
plot(result)When the noise varies across the input space (e.g. predictions are more uncertain at extreme values), standard conformal intervals are too wide in low-noise regions and too narrow in high-noise ones. Normalised conformal scoring fixes this by scaling residuals by a local estimate of variability.
set.seed(42)
n <- 500
x <- matrix(runif(n, 0, 10), ncol = 1)
y <- sin(x[, 1]) + rnorm(n, sd = 0.1 + 0.3 * x[, 1]) # noise grows with x
x_new <- matrix(seq(0, 10, length.out = 100), ncol = 1)
result <- conformal_split(
x, y, model = y ~ ., x_new = x_new, alpha = 0.10,
score_type = "normalized"
)
# Intervals are narrower near x = 0, wider near x = 10
plot(result)Standard conformal prediction guarantees marginal coverage (across all test points), but coverage can vary wildly across subgroups. Mondrian conformal computes a separate quantile for each group, guaranteeing coverage within each subgroup. This is critical for fairness and regulatory compliance.
No other R package on CRAN implements Mondrian conformal prediction.
set.seed(42)
n <- 600
x <- matrix(rnorm(n * 3), ncol = 3)
groups <- factor(ifelse(x[, 1] > 0, "high", "low"))
y <- x[, 1] * 2 + ifelse(groups == "high", 3, 0.5) * rnorm(n)
x_new <- matrix(rnorm(200 * 3), ncol = 3)
groups_new <- factor(ifelse(x_new[, 1] > 0, "high", "low"))
result <- conformal_mondrian(x, y, model = y ~ ., x_new = x_new,
groups = groups, groups_new = groups_new)
print(result)
# Check per-group coverage
y_new <- x_new[, 1] * 2 + ifelse(groups_new == "high", 3, 0.5) * rnorm(200)
coverage_by_group(result, y_new, groups_new)When the test distribution differs from the training distribution (covariate shift), standard conformal coverage guarantees break down. Weighted conformal prediction uses importance weights to correct for this shift.
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1] * 2 + rnorm(n)
x_new <- matrix(rnorm(100 * 3, mean = 1), ncol = 3) # shifted test data
# Importance weights (likelihood ratio of test vs training distributions)
weights <- dnorm(x[, 1], mean = 1) / dnorm(x[, 1], mean = 0)
result <- conformal_weighted(x, y, model = y ~ ., x_new = x_new,
weights = weights)
print(result)CQR combines conformal prediction with quantile regression to produce intervals that naturally adapt to heteroscedasticity. Instead of fitting a model for the mean and adding symmetric bands, CQR fits models for the lower and upper quantiles and then adjusts them to guarantee coverage.
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1] + x[, 2]^2 + rnorm(n, sd = 0.5 + abs(x[, 1]))
x_new <- matrix(rnorm(100 * 3), ncol = 3)
# In practice, use quantile regression (e.g. quantreg::rq).
# Here we approximate with shifted linear models for illustration.
model_lo <- make_model(
train_fun = function(x, y) lm(y ~ ., data = data.frame(y = y, x)),
predict_fun = function(obj, x_new) predict(obj, newdata = as.data.frame(x_new)) - 1.5,
type = "regression"
)
model_hi <- make_model(
train_fun = function(x, y) lm(y ~ ., data = data.frame(y = y, x)),
predict_fun = function(obj, x_new) predict(obj, newdata = as.data.frame(x_new)) + 1.5,
type = "regression"
)
result <- conformal_cqr(x, y, model_lo, model_hi, x_new = x_new, alpha = 0.10)
print(result)
plot(result)ACI adapts the miscoverage level online based on observed coverage, maintaining long-run coverage even under distribution shift. No other R package implements ACI.
set.seed(42)
n <- 500
y_true <- cumsum(rnorm(n, sd = 0.1)) + rnorm(n) # drifting process
y_pred <- c(0, y_true[-n]) # naive lag-1 predictor
result <- conformal_aci(y_pred, y_true, alpha = 0.10, gamma = 0.01)
print(result)
plot(result) # intervals + adaptive alpha traceBenchmark multiple conformal methods side-by-side:
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)
y <- x[, 1] * 2 + rnorm(n)
x_new <- matrix(rnorm(200 * 3), ncol = 3)
y_new <- x_new[, 1] * 2 + rnorm(200)
comp <- conformal_compare(x, y, model = y ~ ., x_new = x_new, y_new = y_new,
methods = c("split", "cv", "jackknife"))
print(comp)After producing predictions, use the diagnostic functions to evaluate calibration and efficiency.
# Suppose y_test contains the true values for x_new
coverage(result, y_test)
#> [1] 0.91
# Average interval width (regression)
mean(interval_width(result))
#> [1] 3.42
# Prediction set sizes (classification)
table(set_size(result))
#> 1 2 3
#> 74 21 5
# Coverage within subgroups
coverage_by_group(result, y_test, groups = groups_test)
#> group coverage n target
#> 1 high 0.920 98 0.9
#> 2 low 0.891 102 0.9
# Coverage by prediction quantile bin
coverage_by_bin(result, y_test, bins = 5)
# Conformal p-values for outlier detection
pvals <- conformal_pvalue(result$scores, new_scores)coverage() should be close to 1 - alpha. If
it’s substantially lower, something has gone wrong (likely a violation
of exchangeability). interval_width() and
set_size() measure efficiency: narrower intervals and
smaller sets are better, conditional on achieving the target
coverage.
coverage_by_group() and coverage_by_bin()
diagnose conditional coverage. Marginal coverage can mask severe
under-coverage in subgroups (e.g. by demographic group or by prediction
magnitude). These diagnostics help identify where intervals fail, and
are essential for fairness evaluation.
Conformal prediction was introduced by Vovk, Gammerman, and Shafer in the early 2000s. The key insight is that if calibration and test data are exchangeable (i.e. their joint distribution is invariant to permutation), then the conformal p-value is uniformly distributed, which gives an exact finite-sample coverage guarantee. Unlike bootstrap or Bayesian intervals, this guarantee holds regardless of model misspecification.
The recent explosion of interest in conformal prediction has been driven by several methodological advances that make it practical for modern machine learning:
For an accessible introduction to the field, see Angelopoulos and Bates (2023), A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.
| Package | Description |
|---|---|
| probably | Conformal regression within the tidymodels ecosystem |
| conformalInference | Research code by Tibshirani et al. (2019, GitHub only) |
| ons | UK Office for National Statistics data |
| boe | Bank of England data |
| fred | Federal Reserve Economic Data (FRED) |
| readecb | European Central Bank data |
| readoecd | OECD data |
| inflateR | Inflation adjustment |
Found a bug or have a feature request? Please open an issue on GitHub.