llmimpute: Missing Data Imputation via Language Models and Statistics
Provides missing data imputation through two complementary
engines: a large language model engine that communicates with the
'Anthropic' 'Claude' application programming interface for
context-aware semantic imputation, and a fully self-contained offline
engine implementing nineteen statistical and machine learning algorithms
entirely in base R with no additional package dependencies. Offline
methods include mean, median, mode, last observation carried forward,
next observation carried backward, hot-deck, predictive mean matching,
k-nearest neighbours, ordinary least-squares regression, Lasso with
coordinate descent, Ridge with closed-form solution, Bayesian Ridge
regression with evidence approximation following MacKay (1992), support
vector regression with a radial basis function kernel, classification
and regression trees, random forests, gradient boosting, iterative
random forest imputation, principal component analysis imputation via
iterative singular value decomposition, and nuclear-norm minimisation
via singular value thresholding. When no API key is available, the
package automatically falls back to the offline engine, ensuring full
operation in environments without internet access. Every imputed value
is accompanied by a confidence score and a plain-language reasoning
string, producing reproducible audit trails. An automatic method
selector chooses an algorithm for each column based on data type,
skewness, missingness rate, and inter-column correlations.
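
The fallback and per-column selection described above can be illustrated with a minimal base R sketch. This is not the package's code or API: the function names (choose_engine, select_method, impute_column), the environment variable name, and the skewness threshold of 1 are all assumptions made for illustration. The sketch covers only three of the simple offline methods (mean, median, mode) and omits the confidence score.

```r
# Hypothetical helper: choose an engine depending on whether an API key is set.
# The environment variable name is an assumption, not taken from the package.
choose_engine <- function(key = Sys.getenv("ANTHROPIC_API_KEY")) {
  if (nzchar(key)) "llm" else "offline"
}

# Moment-based sample skewness, computed on observed values only.
# Returns 0 when there are too few values or no spread.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  if (length(x) < 3 || s == 0) return(0)
  mean((x - m)^3) / s^3
}

# Illustrative rule: mode for non-numeric columns, median for strongly
# skewed numeric columns, mean otherwise. The threshold of 1 is arbitrary.
select_method <- function(x) {
  if (!is.numeric(x)) return("mode")
  if (abs(skewness(x)) > 1) "median" else "mean"
}

# Fill missing entries of one column and record which method was used.
impute_column <- function(x) {
  method <- select_method(x)
  obs <- x[!is.na(x)]
  fill <- switch(method,
    mean   = mean(obs),
    median = median(obs),
    mode   = names(which.max(table(obs)))  # most frequent observed value
  )
  miss <- is.na(x)
  x[miss] <- fill
  list(
    values    = x,
    method    = method,
    imputed   = which(miss),
    reasoning = sprintf("%d missing value(s) filled with the column %s",
                        sum(miss), method)
  )
}
```

For example, impute_column(c(1, 2, NA, 3, 100)) selects the median because the observed values are strongly right-skewed and fills the gap with 2.5, while impute_column(c("a", "b", NA, "a")) falls back to the mode. Keeping the method name and a reasoning string next to the imputed values is one simple way to obtain the kind of audit trail the description mentions.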
Linking:
Please use the canonical form
https://CRAN.R-project.org/package=llmimpute
to link to this page.