| Type: | Package | 
| Title: | Resampling-Based Prediction Error Estimation for Regression Models | 
| Version: | 0.3.1 | 
| Date: | 2021-11-03 | 
| Depends: | R (≥ 2.14.1), ggplot2 (≥ 0.9.2), parallel | 
| Imports: | stats, utils | 
| Suggests: | perryExamples, robustbase | 
| Description: | Tools that allow developers to write functions for prediction error estimation with minimal programming effort and assist users with model selection in regression problems. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| LazyLoad: | yes | 
| Author: | Andreas Alfons [aut, cre] | 
| Maintainer: | Andreas Alfons <alfons@ese.eur.nl> | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.1.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2021-11-03 10:23:35 UTC; andreas | 
| Repository: | CRAN | 
| Date/Publication: | 2021-11-03 11:20:05 UTC | 
Resampling-Based Prediction Error Estimation for Regression Models
Description
Tools that allow developers to write functions for prediction error estimation with minimal programming effort and assist users with model selection in regression problems.
Details
The DESCRIPTION file:
| Package: | perry | 
| Type: | Package | 
| Title: | Resampling-Based Prediction Error Estimation for Regression Models | 
| Version: | 0.3.1 | 
| Date: | 2021-11-03 | 
| Depends: | R (>= 2.14.1), ggplot2 (>= 0.9.2), parallel | 
| Imports: | stats, utils | 
| Suggests: | perryExamples, robustbase | 
| Description: | Tools that allow developers to write functions for prediction error estimation with minimal programming effort and assist users with model selection in regression problems. | 
| License: | GPL (>= 2) | 
| LazyLoad: | yes | 
| Authors@R: | person("Andreas", "Alfons", email = "alfons@ese.eur.nl", role = c("aut", "cre")) | 
| Author: | Andreas Alfons [aut, cre] | 
| Maintainer: | Andreas Alfons <alfons@ese.eur.nl> | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.1.2 | 
Index of help topics:
accessors               Access or set information on resampling-based
                        prediction error results
aggregate.perry         Aggregate resampling-based prediction error
                        results
bootControl             Control object for bootstrap samples
bootSamples             Bootstrap samples
cost                    Prediction loss
cvFolds                 Cross-validation folds
foldControl             Control object for cross-validation folds
perry                   Resampling-based prediction error for fitted
                        models
perry-deprecated        Deprecated functions in package 'perry'
perry-package           Resampling-Based Prediction Error Estimation
                        for Regression Models
perryFit                Resampling-based prediction error for model
                        evaluation
perryPlot               Plot resampling-based prediction error results
perryReshape            Reshape resampling-based prediction error
                        results
perrySelect             Model selection via resampling-based prediction
                        error measures
perrySplits             Data splits for resampling-based prediction
                        error measures
perryTuning             Resampling-based prediction error for tuning
                        parameter selection
randomSplits            Random data splits
reperry                 Recompute resampling-based prediction error
                        measures
setupPerryPlot          Set up a plot of resampling-based prediction
                        error results
splitControl            Control object for random data splits
subset.perry            Subsetting resampling-based prediction error
                        results
summary.perry           Summarize resampling-based prediction error
                        results
Author(s)
Andreas Alfons [aut, cre]
Maintainer: Andreas Alfons <alfons@ese.eur.nl>
Access or set information on resampling-based prediction error results
Description
Retrieve or set the names of resampling-based prediction error results, retrieve or set the identifiers of the models, or retrieve the number of prediction error results or included models.
Usage
peNames(x)
peNames(x) <- value
fits(x)
fits(x) <- value
npe(x)
nfits(x)
Arguments
| x | an object inheriting from class  | 
| value | a vector of replacement values. | 
Value
peNames returns the names of the prediction error results.  The
replacement function thereby returns them invisibly.
fits returns the identifiers of the models for objects inheriting
from class "perrySelect" and NULL for objects inheriting from
class "perry".  The replacement function thereby returns those values
invisibly.
npe returns the number of prediction error results.
nfits returns the number of models included in objects inheriting
from class "perrySelect" and NULL for objects inheriting from
class "perry".
Author(s)
Andreas Alfons
See Also
perryFit, perrySelect,
perryTuning
Examples
library("perryExamples")
data("coleman")
set.seed(1234)  # set seed for reproducibility
## set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
## compare raw and reweighted LTS estimators for
## 50% and 75% subsets
# 50% subsets
fit50 <- ltsReg(Y ~ ., data = coleman, alpha = 0.5)
cv50 <- perry(fit50, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.1)
# 75% subsets
fit75 <- ltsReg(Y ~ ., data = coleman, alpha = 0.75)
cv75 <- perry(fit75, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.1)
# combine results into one object
cv <- perrySelect("0.5" = cv50, "0.75" = cv75)
cv
# "perry" object
npe(cv50)
nfits(cv50)
peNames(cv50)
peNames(cv50) <- c("improved", "initial")
fits(cv50)
cv50
# "perrySelect" object
npe(cv)
nfits(cv)
peNames(cv)
peNames(cv) <- c("improved", "initial")
fits(cv)
fits(cv) <- 1:2
cv
Aggregate resampling-based prediction error results
Description
Compute summary statistics of resampling-based prediction error results.
Usage
## S3 method for class 'perry'
aggregate(x, FUN = mean, select = NULL, ...)
## S3 method for class 'perrySelect'
aggregate(x, FUN = mean, select = NULL, ...)
## S3 method for class 'perryTuning'
aggregate(x, ...)
Arguments
| x | an object inheriting from class  | 
| FUN | a function to compute the summary statistics. | 
| select | a character, integer or logical vector indicating the columns of prediction error results for which to compute the summary statistics. | 
| ... | for the  | 
Value
The "perry" method returns a vector or matrix of aggregated 
prediction error results, depending on whether FUN returns a single 
value or a vector.
For the other methods, a data frame containing the aggregated 
prediction error results for each model is returned.  In the case of the 
"perryTuning" method, the data frame contains the combinations of 
tuning parameters rather than a column describing the models.
Note
Duplicate indices in subset or select are removed such 
that all models and prediction error results are unique.
Author(s)
Andreas Alfons
See Also
perryFit, perrySelect, 
perryTuning, aggregate
Examples
library("perryExamples")
data("coleman")
set.seed(1234)  # set seed for reproducibility
## set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
## compare raw and reweighted LTS estimators for
## 50% and 75% subsets
# 50% subsets
fit50 <- ltsReg(Y ~ ., data = coleman, alpha = 0.5)
cv50 <- perry(fit50, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.1)
# 75% subsets
fit75 <- ltsReg(Y ~ ., data = coleman, alpha = 0.75)
cv75 <- perry(fit75, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.1)
# combine results into one object
cv <- perrySelect("0.5" = cv50, "0.75" = cv75)
cv
# summary of the results with the 50% subsets
aggregate(cv50, summary)
# summary of the combined results
aggregate(cv, summary)
Control object for bootstrap samples
Description
Generate an object that controls how to draw bootstrap samples and which bootstrap estimator of prediction error to compute.
Usage
bootControl(R = 1, type = c("0.632", "out-of-bag"), grouping = NULL)
Arguments
| R | an integer giving the number of bootstrap samples. | 
| type | a character string specifying a bootstrap estimator.  Possible
values are  | 
| grouping | a factor specifying groups of observations. | 
Value
An object of class "bootControl" with the following
components:
- R
- an integer giving the number of bootstrap samples. 
- type
- a character string specifying the type of bootstrap estimator. 
- grouping
- if supplied, a factor specifying groups of observations. The groups will then be resampled rather than individual observations such that all observations within a group belong either to the bootstrap sample or the test data. 
Author(s)
Andreas Alfons
References
Efron, B. (1983) Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78(382), 316–331.
See Also
perrySplits, bootSamples,
foldControl, splitControl
Examples
set.seed(1234)  # set seed for reproducibility
perrySplits(20, bootControl())
perrySplits(20, bootControl(R = 10))
Bootstrap samples
Description
Draw bootstrap samples of observations or groups of observations and specify which bootstrap estimator of prediction error to compute.
Usage
bootSamples(n, R = 1, type = c("0.632", "out-of-bag"), grouping = NULL)
Arguments
| n | an integer giving the number of observations for which to draw
bootstrap samples.  This is ignored if  | 
| R | an integer giving the number of bootstrap samples. | 
| type | a character string specifying a bootstrap estimator.  Possible
values are  | 
| grouping | a factor specifying groups of observations. If supplied, the groups are resampled rather than individual observations such that all observations within a group belong either to the bootstrap sample or the test data. | 
Value
An object of class "bootSamples" with the following
components:
- n
- an integer giving the number of observations or groups. 
- R
- an integer giving the number of bootstrap samples. 
- subsets
- an integer matrix in which each column contains the indices of the observations or groups in the corresponding bootstrap sample. 
- grouping
- a list giving the indices of the observations belonging to each group. This is only returned if a grouping factor has been supplied. 
Note
This is a simple wrapper function for perrySplits with a
control object generated by bootControl.
Author(s)
Andreas Alfons
References
Efron, B. (1983) Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78(382), 316–331.
See Also
perrySplits, bootControl,
cvFolds, randomSplits
Examples
set.seed(1234)  # set seed for reproducibility
bootSamples(20)
bootSamples(20, R = 10)
Prediction loss
Description
Compute the prediction loss of a model.
Usage
mspe(y, yHat, includeSE = FALSE)
rmspe(y, yHat, includeSE = FALSE)
mape(y, yHat, includeSE = FALSE)
tmspe(y, yHat, trim = 0.25, includeSE = FALSE)
rtmspe(y, yHat, trim = 0.25, includeSE = FALSE)
Arguments
| y | a numeric vector or matrix giving the observed values. | 
| yHat | a numeric vector or matrix of the same dimensions as  | 
| includeSE | a logical indicating whether standard errors should be computed as well. | 
| trim | a numeric value giving the trimming proportion (the default is 0.25). | 
Details
mspe and rmspe compute the mean squared prediction error and
the root mean squared prediction error, respectively.  In addition,
mape returns the mean absolute prediction error, which is somewhat
more robust.
Robust prediction loss based on trimming is implemented in tmspe and
rtmspe.  To be more precise, tmspe computes the trimmed mean
squared prediction error and rtmspe computes the root trimmed mean
squared prediction error.  A proportion of the largest squared differences
of the observed and fitted values are thereby trimmed.
Standard errors can be requested via the includeSE argument.  Note that
standard errors for tmspe are based on a winsorized standard
deviation.  Furthermore, standard errors for rmspe and rtmspe
are computed from the respective standard errors of mspe and
tmspe via the delta method.
Value
If standard errors are not requested, a numeric value giving the prediction loss is returned.
Otherwise a list is returned, with the first component containing the prediction loss and the second component the corresponding standard error.
Author(s)
Andreas Alfons
References
Tukey, J.W. and McLaughlin, D.H. (1963) Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/winsorization. Sankhya: The Indian Journal of Statistics, Series A, 25(3), 331–352
Oehlert, G.W. (1992) A note on the delta method. The American Statistician, 46(1), 27–29.
See Also
Examples
# fit an MM-regression model
library("robustbase")
data("coleman")
fit <- lmrob(Y~., data=coleman)
# compute the prediction loss from the fitted values
# (hence the prediction loss is underestimated in this simple
# example since all observations are used to fit the model)
mspe(coleman$Y, predict(fit))
rmspe(coleman$Y, predict(fit))
mape(coleman$Y, predict(fit))
tmspe(coleman$Y, predict(fit), trim = 0.1)
rtmspe(coleman$Y, predict(fit), trim = 0.1)
# include standard error
mspe(coleman$Y, predict(fit), includeSE = TRUE)
rmspe(coleman$Y, predict(fit), includeSE = TRUE)
mape(coleman$Y, predict(fit), includeSE = TRUE)
tmspe(coleman$Y, predict(fit), trim = 0.1, includeSE = TRUE)
rtmspe(coleman$Y, predict(fit), trim = 0.1, includeSE = TRUE)
Cross-validation folds
Description
Split observations or groups of observations into K folds to be used
for (repeated) K-fold cross-validation.  K should thereby be
chosen such that all folds are of approximately equal size.
Usage
cvFolds(
  n,
  K = 5,
  R = 1,
  type = c("random", "consecutive", "interleaved"),
  grouping = NULL
)
Arguments
| n | an integer giving the number of observations to be split into
folds.  This is ignored if  | 
| K | an integer giving the number of folds into which the observations
should be split (the default is five).  Setting  | 
| R | an integer giving the number of replications for repeated
 | 
| type | a character string specifying the type of folds to be
generated.  Possible values are  | 
| grouping | a factor specifying groups of observations. If supplied, the data are split according to the groups rather than individual observations such that all observations within a group belong to the same fold. | 
Value
An object of class "cvFolds" with the following components:
- n
- an integer giving the number of observations or groups. 
- K
- an integer giving the number of folds. 
- R
- an integer giving the number of replications. 
- subsets
- an integer matrix in which each column contains a permutation of the indices of the observations or groups. 
- which
- an integer vector giving the fold for each permuted observation or group. 
- grouping
- a list giving the indices of the observations belonging to each group. This is only returned if a grouping factor has been supplied. 
Note
This is a simple wrapper function for perrySplits with a
control object generated by foldControl.
Author(s)
Andreas Alfons
See Also
perrySplits, foldControl,
randomSplits, bootSamples
Examples
set.seed(1234)  # set seed for reproducibility
cvFolds(20, K = 5, type = "random")
cvFolds(20, K = 5, type = "consecutive")
cvFolds(20, K = 5, type = "interleaved")
cvFolds(20, K = 5, R = 10)
Control object for cross-validation folds
Description
Generate an object that controls how to split n observations or
groups of observations into K folds to be used for (repeated)
K-fold cross-validation.  K should thereby be chosen such that
all folds are of approximately equal size.
Usage
foldControl(
  K = 5,
  R = 1,
  type = c("random", "consecutive", "interleaved"),
  grouping = NULL
)
Arguments
| K | an integer giving the number of folds into which the observations should be split (the default is five). | 
| R | an integer giving the number of replications for repeated
 | 
| type | a character string specifying the type of folds to be
generated.  Possible values are  | 
| grouping | a factor specifying groups of observations. | 
Value
An object of class "foldControl" with the following
components:
- K
- an integer giving the number of folds. A value of - Kequal to the number of observations or groups yields eave-one-out cross-validation.
- R
- an integer giving the number of replications. This will be ignored for for leave-one-out cross-validation and other non-random splits of the data. 
- type
- a character string specifying the type of folds. 
- grouping
- if supplied, a factor specifying groups of observations. The data will then be split according to the groups rather than individual observations such that all observations within a group belong to the same fold. 
Author(s)
Andreas Alfons
See Also
perrySplits, cvFolds,
splitControl, bootControl
Examples
set.seed(1234)  # set seed for reproducibility
perrySplits(20, foldControl(K = 5))
perrySplits(20, foldControl(K = 5, R = 10))
Resampling-based prediction error for fitted models
Description
Generic function to estimate the prediction error of a fitted model via 
(repeated) K-fold cross-validation, (repeated) random splitting (also 
known as random subsampling or Monte Carlo cross-validation), or the 
bootstrap.
Usage
perry(object, ...)
Arguments
| object | the fitted model for which to estimate the prediction error. | 
| ... | additional arguments to be passed down to methods. | 
Details
The idea is that developers write easy-to-use methods for end users to 
leverage the prediction error estimation framework for their models.  A 
typical perry method consists of the following two parts: first the 
data are extracted from the model, then function perryFit is 
called to perform prediction error estimation.  The programming effort of 
implementing prediction error estimation for a certain model is thus greatly 
reduced.
Examples for methods are available in package perryExamples (see 
perry-methods).
Author(s)
Andreas Alfons
See Also
Deprecated functions in package perry
Description
These functions are provided for compatibility with older versions only, and may be defunct as soon as the next release.
Usage
## S3 method for class 'perry'
fortify(
  model,
  data,
  select = NULL,
  reps = model$splits$R > 1,
  seFactor = NA,
  ...
)
## S3 method for class 'perrySelect'
fortify(
  model,
  data,
  subset = NULL,
  select = NULL,
  reps = model$splits$R > 1,
  seFactor = model$seFactor,
  ...
)
## S3 method for class 'perryTuning'
fortify(model, data, ...)
## Default S3 method:
perryPlot(
  object,
  method = c("box", "density", "dot", "line"),
  mapping,
  facets = attr(object, "facets"),
  ...
)
Arguments
| model | an object inheriting from class  | 
| data | currently ignored. | 
| select | a character, integer or logical vector indicating the columns of prediction error results to be converted. | 
| reps | a logical indicating whether to convert the results from all
replications ( | 
| seFactor | a numeric value giving the multiplication factor of the
standard error for displaying error bars in dot plots or line plots.  Error
bars in those plots can be suppressed by setting this to  | 
| ... | for the  | 
| subset | a character, integer or logical vector indicating the subset of models to be converted. | 
| object | an object inheriting from class  | 
| method | a character string specifying the type of plot.  Possible
values are  | 
| mapping | an aesthetic mapping to override the default behavior (see
 | 
| facets | a faceting formula to override the default behavior.  If
supplied,  | 
Details
The fortify methods extract all necessary information for plotting
from resampling-based prediction error results and store it in a data frame.
The default method of perryPlot creates the corresponding plot from
the data frame returned by fortify.
Value
The fortify methods return a data frame containing the
columns listed below, as well as additional information stored in the
attribute "facets" (default faceting formula for the plots).
- Fit
- a vector or factor containing the identifiers of the models. 
- Name
- a factor containing the names of the predictor error results (not returned in case of only one column of prediction error results with the default name). 
- PE
- the estimated prediction errors. 
- Lower
- the lower end points of the error bars (only returned if - repsis- FALSE).
- Upper
- the upper end points of the error bars (only returned if - repsis- FALSE).
Note
Duplicate indices in subset or select are removed such
that all models and prediction error results are unique.
Author(s)
Andreas Alfons
Resampling-based prediction error for model evaluation
Description
Estimate the prediction error of a model via (repeated) K-fold
cross-validation, (repeated) random splitting (also known as random
subsampling or Monte Carlo cross-validation), or the bootstrap.  It is
thereby possible to supply an object returned by a model fitting function,
a model fitting function itself, or an unevaluated function call to a model
fitting function.
Usage
perryFit(object, ...)
## Default S3 method:
perryFit(
  object,
  data = NULL,
  x = NULL,
  y,
  splits = foldControl(),
  predictFun = predict,
  predictArgs = list(),
  cost = rmspe,
  costArgs = list(),
  names = NULL,
  envir = parent.frame(),
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)
## S3 method for class ''function''
perryFit(
  object,
  formula,
  data = NULL,
  x = NULL,
  y,
  args = list(),
  splits = foldControl(),
  predictFun = predict,
  predictArgs = list(),
  cost = rmspe,
  costArgs = list(),
  names = NULL,
  envir = parent.frame(),
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)
## S3 method for class 'call'
perryFit(
  object,
  data = NULL,
  x = NULL,
  y,
  splits = foldControl(),
  predictFun = predict,
  predictArgs = list(),
  cost = rmspe,
  costArgs = list(),
  names = NULL,
  envir = parent.frame(),
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)
Arguments
| object | the fitted model for which to estimate the prediction error,
a function for fitting a model, or an unevaluated function call for fitting
a model (see  | 
| ... | additional arguments to be passed down. | 
| data | a data frame containing the variables required for fitting the
models.  This is typically used if the model in the function call is
described by a  | 
| x | a numeric matrix containing the predictor variables. This is typically used if the function call for fitting the models requires the predictor matrix and the response to be supplied as separate arguments. | 
| y | a numeric vector or matrix containing the response. | 
| splits | an object of class  | 
| predictFun | a function to compute predictions for the test data.  It
should expect the fitted model to be passed as the first argument and the test
data as the second argument, and must return either a vector or a matrix
containing the predicted values.  The default is to use the
 | 
| predictArgs | a list of additional arguments to be passed to
 | 
| cost | a cost function measuring prediction loss.  It should expect
the observed values of the response to be passed as the first argument and
the predicted values as the second argument, and must return either a
non-negative scalar value, or a list with the first component containing
the prediction error and the second component containing the standard
error.  The default is to use the root mean squared prediction error
(see  | 
| costArgs | a list of additional arguments to be passed to the
prediction loss function  | 
| names | an optional character vector giving names for the arguments containing the data to be used in the function call (see “Details”). | 
| envir | the  | 
| ncores | a positive integer giving the number of processor cores to be
used for parallel computing (the default is 1 for no parallelization).  If
this is set to  | 
| cl | a parallel cluster for parallel computing as generated by
 | 
| seed | optional initial seed for the random number generator (see
 | 
| formula | a  | 
| args | a list of additional arguments to be passed to the model fitting function. | 
Details
(Repeated) K-fold cross-validation is performed in the following
way.  The data are first split into K previously obtained blocks of
approximately equal size (given by folds).  Each of the K data
blocks is left out once to fit the model, and predictions are computed for
the observations in the left-out block with predictFun.  Thus a
prediction is obtained for each observation.  The response and the obtained
predictions for all observations are then passed to the prediction loss
function cost to estimate the prediction error.  For repeated
K-fold cross-validation (as indicated by splits), this process
is replicated and the estimated prediction errors from all replications are
returned.
(Repeated) random splitting is performed similarly.  In each replication,
the data are split into a training set and a test set at random.  Then the
training data are used to fit the model, and predictions are computed for
the test data.  Hence only the response values from the test data and the
corresponding predictions are passed to the prediction loss function
cost.
For the bootstrap estimator, each bootstrap sample is used as training data
to fit the model.  The out-of-bag estimator uses the observations that do
not enter the bootstrap sample as test data and computes the prediction loss
function cost for those out-of-bag observations.  The 0.632 estimator
is computed as a linear combination of the out-of-bag estimator and the
prediction loss of the fitted values of the model computed from the full
sample.
In any case, if the response is a vector but predictFun returns a
matrix, the prediction error is computed for each column.  A typical use
case for this behavior would be if predictFun returns predictions
from an initial model fit and stepwise improvements thereof.
If formula or data are supplied, all variables required for
fitting the models are added as one argument to the function call, which is
the typical behavior of model fitting functions with a
formula interface.  In this case, the accepted values
for names depend on the method.  For the function method, a
character vector of length two should supplied, with the first element
specifying the argument name for the formula and the second element
specifying the argument name for the data (the default is to use
c("formula", "data")).  Note that names for both arguments should be
supplied even if only one is actually used.  For the other methods, which do
not have a formula argument, a character string specifying the
argument name for the data should be supplied (the default is to use
"data").
If x is supplied, on the other hand, the predictor matrix and the
response are added as separate arguments to the function call.  In this
case, names should be a character vector of length two, with the
first element specifying the argument name for the predictor matrix and the
second element specifying the argument name for the response (the default is
to use c("x", "y")).  It should be noted that the formula or
data arguments take precedence over x.
Value
An object of class "perry" with the following components:
- pe
- a numeric vector containing the respective estimated prediction errors. In case of more than one replication, those are average values over all replications. 
- se
- a numeric vector containing the respective estimated standard errors of the prediction loss. 
- reps
- a numeric matrix in which each column contains the respective estimated prediction errors from all replications. This is only returned in case of more than one replication. 
- splits
- an object giving the data splits used to estimate the prediction error. 
- y
- the response. 
- yHat
- a list containing the predicted values from all replications. 
- call
- the matched function call. 
Author(s)
Andreas Alfons
See Also
perrySelect, perryTuning,
cvFolds, randomSplits,
bootSamples, cost
Examples
library("perryExamples")
data("coleman")
set.seed(1234)  # set seed for reproducibility
## via model fit
# fit an MM regression model
fit <- lmrob(Y ~ ., data=coleman)
# perform cross-validation
perryFit(fit, data = coleman, y = coleman$Y,
         splits = foldControl(K = 5, R = 10),
         cost = rtmspe, costArgs = list(trim = 0.1),
         seed = 1234)
## via model fitting function
# perform cross-validation
# note that the response is extracted from 'data' in
# this example and does not have to be supplied
perryFit(lmrob, formula = Y ~ ., data = coleman,
         splits = foldControl(K = 5, R = 10),
         cost = rtmspe, costArgs = list(trim = 0.1),
         seed = 1234)
## via function call
# set up function call
call <- call("lmrob", formula = Y ~ .)
# perform cross-validation
perryFit(call, data = coleman, y = coleman$Y,
         splits = foldControl(K = 5, R = 10),
         cost = rtmspe, costArgs = list(trim = 0.1),
         seed = 1234)
Plot resampling-based prediction error results
Description
Plot results of resampling-based prediction error measures.
Usage
perryPlot(object, ...)
## S3 method for class 'perry'
perryPlot(
  object,
  which = c("box", "density", "dot"),
  select = NULL,
  seFactor = NA,
  ...
)
## S3 method for class 'perrySelect'
perryPlot(
  object,
  which = c("box", "density", "dot", "line"),
  subset = NULL,
  select = NULL,
  seFactor = object$seFactor,
  ...
)
## S3 method for class 'setupPerryPlot'
perryPlot(object, mapping = object$mapping, facets = object$facets, ...)
## S3 method for class 'perry'
autoplot(object, ...)
## S3 method for class 'perrySelect'
autoplot(object, ...)
## S3 method for class 'perry'
plot(x, ...)
## S3 method for class 'perrySelect'
plot(x, ...)
Arguments
| object,x | an object inheriting from class  | 
| ... | additional arguments to be passed down, eventually to
 | 
| which | a character string specifying the type of plot.  Possible
values are  | 
| select | a character, integer or logical vector indicating the columns of prediction error results to be plotted. | 
| seFactor | a numeric value giving the multiplication factor of the
standard error for displaying error bars in dot plots or line plots.  Error
bars in those plots can be suppressed by setting this to  | 
| subset | a character, integer or logical vector indicating the subset of models for which to plot the prediction error results. | 
| mapping | an aesthetic mapping to override the default behavior (see
 | 
| facets | a faceting formula to override the default behavior.  If
supplied,  | 
Details
For objects with multiple columns of prediction error results, conditional plots are produced.
Value
An object of class "ggplot" (see ggplot).
Note
Duplicate indices in subset or select are removed such
that all models and prediction error results are unique.
Author(s)
Andreas Alfons
See Also
perryFit, perrySelect,
perryTuning,
Examples
library("perryExamples")
data("coleman")
set.seed(1234)  # set seed for reproducibility
## set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
## compare LS, MM and LTS regression
# perform cross-validation for an LS regression model
fitLm <- lm(Y ~ ., data = coleman)
cvLm <- perry(fitLm, splits = folds,
              cost = rtmspe, trim = 0.1)
# perform cross-validation for an MM regression model
fitLmrob <- lmrob(Y ~ ., data = coleman, maxit.scale = 500)
cvLmrob <- perry(fitLmrob, splits = folds,
                 cost = rtmspe, trim = 0.1)
# perform cross-validation for an LTS regression model
fitLts <- ltsReg(Y ~ ., data = coleman)
cvLts <- perry(fitLts, splits = folds,
               cost = rtmspe, trim = 0.1)
# combine results into one object
cv <- perrySelect(LS = cvLm, MM = cvLmrob, LTS = cvLts,
                  .selectBest = "min")
cv
# plot results for the MM regression model
plot(cvLmrob, which = "box")
plot(cvLmrob, which = "density")
plot(cvLmrob, which = "dot", seFactor = 2)
# plot combined results
plot(cv, which = "box")
plot(cv, which = "density")
plot(cv, which = "dot", seFactor = 2)
Reshape resampling-based prediction error results
Description
Reshape resampling-based prediction error results into an object of class
"perrySelect" with only one column of results.
Usage
perryReshape(
  x,
  selectBest = c("min", "hastie"),
  seFactor = 1,
  tuning = list(),
  ...
)
Arguments
| x | an object inheriting from class  | 
| selectBest | a character string specifying a criterion for selecting
the best model.  Possible values are  | 
| seFactor | a numeric value giving a multiplication factor of the
standard error for the selection of the best model.  This is ignored if
 | 
| tuning | a list of tuning parameter values that correspond to the different prediction error results. The names of the list components should thereby correspond to the argument names of the tuning parameters. For each tuning parameter, a vector of values can be supplied. A data frame containing all possible combinations of tuning parameter values is then added to the reshaped prediction error results. | 
| ... | additional arguments to be passed down. | 
Value
An object of class "perrySelect" (subclass
"perryTuning" if a list of tuning parameters is supplied) with the
following components:
- pe
- a data frame containing the estimated prediction errors for the models. In case of more than one resampling replication, those are average values over all replications. 
- se
- a data frame containing the estimated standard errors of the prediction loss for the models. 
- reps
- a data frame containing the estimated prediction errors for the models from all replications. This is only returned in case of more than one resampling replication. 
- splits
- an object giving the data splits used to estimate the prediction error. 
- y
- the response. 
- yHat
- a list containing the predicted values for the models. Each list component is again a list containing the corresponding predicted values from all replications. 
- best
- an integer giving the index of the model with the best prediction performance. 
- selectBest
- a character string specifying the criterion used for selecting the best model. 
- seFactor
- a numeric value giving the multiplication factor of the standard error used for the selection of the best model. 
- tuning
- a data frame containing the grid of tuning parameter values that correspond to the different prediction error results (only subclass - "perryTuning").
Author(s)
Andreas Alfons
References
Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition.
See Also
perryFit, perrySelect,
perryTuning
Examples
library("perryExamples")
data("coleman")
# perform cross-validation for an LTS regression model
fit <- ltsReg(Y ~ ., data = coleman)
folds <- foldControl(K = 5, R = 10)
cv <- perry(fit, splits = folds, fit = "both",
            cost = rtmspe, trim = 0.1, seed = 1234)
# compare original and reshaped object
cv
perryReshape(cv)
Model selection via resampling-based prediction error measures
Description
Combine resampling-based prediction error results for various models into one object and select the model with the best prediction performance.
Usage
perrySelect(
  ...,
  .list = list(...),
  .reshape = FALSE,
  .selectBest = c("min", "hastie"),
  .seFactor = 1
)
Arguments
| ... | objects inheriting from class  | 
| .list | a list of objects  inheriting from class  | 
| .reshape | a logical indicating whether objects with more than one column of prediction error results should be reshaped to have only one column (see “Details”). | 
| .selectBest | a character string specifying a criterion for selecting
the best model.  Possible values are  | 
| .seFactor | a numeric value giving a multiplication factor of the
standard error for the selection of the best model.  This is ignored if
 | 
Details
Keep in mind that objects inheriting from class "perry" or
"perrySelect" may contain multiple columns of prediction error
results.  This is the case if the response is univariate but the
function to compute predictions (usually the predict
method of the fitted model) returns a matrix.
The .reshape argument determines how to handle such objects.  If
.reshape is FALSE, all objects are required to have the same
number of columns and the best model for each column is selected.  A typical
use case for this behavior would be if the investigated models contain
prediction error results for a raw and a reweighted fit.  It might then be
of interest to researchers to compare the best model for the raw estimators
with the best model for the reweighted estimators.
If .reshape is TRUE, objects with more than one column of
results are first transformed with perryReshape to have only
one column.  Then the best overall model is selected.
It should also be noted that the argument names of .list,
.reshape, .selectBest and .seFacor start with a dot to
avoid conflicts with the argument names used for the objects containing
prediction error results.
Value
An object of class "perrySelect" with the following
components:
- pe
- a data frame containing the estimated prediction errors for the models. In case of more than one resampling replication, those are average values over all replications. 
- se
- a data frame containing the estimated standard errors of the prediction loss for the models. 
- reps
- a data frame containing the estimated prediction errors for the models from all replications. This is only returned in case of more than one resampling replication. 
- splits
- an object giving the data splits used to estimate the prediction error of the models. 
- y
- the response. 
- yHat
- a list containing the predicted values for the models. Each list component is again a list containing the corresponding predicted values from all replications. 
- best
- an integer vector giving the indices of the models with the best prediction performance. 
- selectBest
- a character string specifying the criterion used for selecting the best model. 
- seFactor
- a numeric value giving the multiplication factor of the standard error used for the selection of the best model. 
Note
To ensure comparability, the prediction errors for all models are required to be computed from the same data splits.
Author(s)
Andreas Alfons
References
Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition.
See Also
Examples
library("perryExamples")
data("coleman")
set.seed(1234)  # set seed for reproducibility
## set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
## compare LS, MM and LTS regression
# perform cross-validation for an LS regression model
fitLm <- lm(Y ~ ., data = coleman)
cvLm <- perry(fitLm, splits = folds,
              cost = rtmspe, trim = 0.1)
# perform cross-validation for an MM regression model
fitLmrob <- lmrob(Y ~ ., data = coleman)
cvLmrob <- perry(fitLmrob, splits = folds,
                 cost = rtmspe, trim = 0.1)
# perform cross-validation for an LTS regression model
fitLts <- ltsReg(Y ~ ., data = coleman)
cvLts <- perry(fitLts, splits = folds,
               cost = rtmspe, trim = 0.1)
# compare cross-validation results
perrySelect(LS = cvLm, MM = cvLmrob, LTS = cvLts)
Data splits for resampling-based prediction error measures
Description
Split observations or groups of observations into segments to be used
for (repeated) K-fold cross-validation, (repeated) random splitting
(also known as random subsampling or Monte Carlo cross-validation), or the
bootstrap.
Usage
perrySplits(n, control)
Arguments
| n | an integer giving the number of observations to be split. | 
| control | a control object of class  | 
Value
For the foldControl method, an object of class "cvFolds"
giving folds for (repeated) K-fold cross-validation (see
cvFolds).
For the splitControl method, an object of class "randomSplits"
giving random data splits (see randomSplits).
For the bootControl method, an object of class "bootSamples"
giving bootstrap samples (see bootSamples).
Note
Users may prefer the wrapper functions cvFolds,
randomSplits and bootSamples.
Author(s)
Andreas Alfons
See Also
foldControl, splitControl,
bootControl, cvFolds,
randomSplits, bootSamples
Examples
set.seed(1234)  # set seed for reproducibility
## data folds for K-fold cross-validation
perrySplits(20, foldControl(K = 5))
perrySplits(20, foldControl(K = 5, R = 10))
## random data splits
perrySplits(20, splitControl(m = 5))
perrySplits(20, splitControl(m = 5, R = 10))
## bootstrap samples
perrySplits(20, bootControl())
perrySplits(20, bootControl(R = 10))
Resampling-based prediction error for tuning parameter selection
Description
Select tuning parameters of a model by estimating the respective prediction
errors via (repeated) K-fold cross-validation, (repeated) random
splitting (also known as random subsampling or Monte Carlo
cross-validation), or the bootstrap.  It is thereby possible to supply a
model fitting function or an unevaluated function call to a model fitting
function.
Usage
perryTuning(object, ...)
## S3 method for class ''function''
perryTuning(
  object,
  formula,
  data = NULL,
  x = NULL,
  y,
  tuning = list(),
  args = list(),
  splits = foldControl(),
  predictFun = predict,
  predictArgs = list(),
  cost = rmspe,
  costArgs = list(),
  selectBest = c("min", "hastie"),
  seFactor = 1,
  final = FALSE,
  names = NULL,
  envir = parent.frame(),
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)
## S3 method for class 'call'
perryTuning(
  object,
  data = NULL,
  x = NULL,
  y,
  tuning = list(),
  splits = foldControl(),
  predictFun = predict,
  predictArgs = list(),
  cost = rmspe,
  costArgs = list(),
  selectBest = c("min", "hastie"),
  seFactor = 1,
  final = FALSE,
  names = NULL,
  envir = parent.frame(),
  ncores = 1,
  cl = NULL,
  seed = NULL,
  ...
)
Arguments
| object | a function or an unevaluated function call for fitting
a model (see  | 
| ... | additional arguments to be passed down. | 
| formula | a  | 
| data | a data frame containing the variables required for fitting the
models.  This is typically used if the model in the function call is
described by a  | 
| x | a numeric matrix containing the predictor variables. This is typically used if the function call for fitting the models requires the predictor matrix and the response to be supplied as separate arguments. | 
| y | a numeric vector or matrix containing the response. | 
| tuning | a list of arguments giving the tuning parameter values to be evaluated. The names of the list components should thereby correspond to the argument names of the tuning parameters. For each tuning parameter, a vector of values can be supplied. The prediction error is then estimated for all possible combinations of tuning parameter values. | 
| args | a list of additional arguments to be passed to the model fitting function. | 
| splits | an object of class  | 
| predictFun | a function to compute predictions for the test data.  It
should expect the fitted model to be passed as the first argument and the test
data as the second argument, and must return either a vector or a matrix
containing the predicted values.  The default is to use the
 | 
| predictArgs | a list of additional arguments to be passed to
 | 
| cost | a cost function measuring prediction loss.  It should expect
the observed values of the response to be passed as the first argument and
the predicted values as the second argument, and must return either a
non-negative scalar value, or a list with the first component containing
the prediction error and the second component containing the standard
error.  The default is to use the root mean squared prediction error
(see  | 
| costArgs | a list of additional arguments to be passed to the
prediction loss function  | 
| selectBest | a character string specifying a criterion for selecting
the best model.  Possible values are  | 
| seFactor | a numeric value giving a multiplication factor of the
standard error for the selection of the best model.  This is ignored if
 | 
| final | a logical indicating whether to fit the final model with the optimal combination of tuning parameters. | 
| names | an optional character vector giving names for the arguments containing the data to be used in the function call (see “Details”). | 
| envir | the  | 
| ncores | a positive integer giving the number of processor cores to be
used for parallel computing (the default is 1 for no parallelization).  If
this is set to  | 
| cl | a parallel cluster for parallel computing as generated by
 | 
| seed | optional initial seed for the random number generator (see
 | 
Details
(Repeated) K-fold cross-validation is performed in the following
way.  The data are first split into K previously obtained blocks of
approximately equal size (given by folds).  Each of the K data
blocks is left out once to fit the model, and predictions are computed for
the observations in the left-out block with predictFun.  Thus a
prediction is obtained for each observation.  The response and the obtained
predictions for all observations are then passed to the prediction loss
function cost to estimate the prediction error.  For repeated
K-fold cross-validation (as indicated by splits), this process
is replicated and the estimated prediction errors from all replications are
returned.
(Repeated) random splitting is performed similarly.  In each replication,
the data are split into a training set and a test set at random.  Then the
training data are used to fit the model, and predictions are computed for
the test data.  Hence only the response values from the test data and the
corresponding predictions are passed to the prediction loss function
cost.
For the bootstrap estimator, each bootstrap sample is used as training data
to fit the model.  The out-of-bag estimator uses the observations that do
not enter the bootstrap sample as test data and computes the prediction loss
function cost for those out-of-bag observations.  The 0.632 estimator
is computed as a linear combination of the out-of-bag estimator and the
prediction loss of the fitted values of the model computed from the full
sample.
In any case, if the response is a vector but predictFun returns a
matrix, the prediction error is computed for each column.  A typical use
case for this behavior would be if predictFun returns predictions
from an initial model fit and stepwise improvements thereof.
If formula or data are supplied, all variables required for
fitting the models are added as one argument to the function call, which is
the typical behavior of model fitting functions with a
formula interface.  In this case, the accepted values
for names depend on the method.  For the function method, a
character vector of length two should supplied, with the first element
specifying the argument name for the formula and the second element
specifying the argument name for the data (the default is to use
c("formula", "data")).  Note that names for both arguments should be
supplied even if only one is actually used.  For the call method,
which does not have a formula argument, a character string specifying
the argument name for the data should be supplied (the default is to use
"data").
If x is supplied, on the other hand, the predictor matrix and the
response are added as separate arguments to the function call.  In this
case, names should be a character vector of length two, with the
first element specifying the argument name for the predictor matrix and the
second element specifying the argument name for the response (the default is
to use c("x", "y")).  It should be noted that the formula or
data arguments take precedence over x.
Value
If tuning is an empty list, perryFit is called to
return an object of class "perry".
Otherwise an object of class "perryTuning" (which inherits from class
"perrySelect") with the following components is returned:
- pe
- a data frame containing the estimated prediction errors for all combinations of tuning parameter values. In case of more than one replication, those are average values over all replications. 
- se
- a data frame containing the estimated standard errors of the prediction loss for all combinations of tuning parameter values. 
- reps
- a data frame containing the estimated prediction errors from all replications for all combinations of tuning parameter values. This is only returned in case of more than one replication. 
- splits
- an object giving the data splits used to estimate the prediction error. 
- y
- the response. 
- yHat
- a list containing the predicted values for all combinations of tuning parameter values. Each list component is again a list containing the corresponding predicted values from all replications. 
- best
- an integer vector giving the indices of the optimal combinations of tuning parameters. 
- selectBest
- a character string specifying the criterion used for selecting the best model. 
- seFactor
- a numeric value giving the multiplication factor of the standard error used for the selection of the best model. 
- tuning
- a data frame containing the grid of tuning parameter values for which the prediction error was estimated. 
- finalModel
- the final model fit with the optimal combination of tuning parameters. This is only returned if argument - finalis- TRUE.
- call
- the matched function call. 
Note
The same data splits are used for all combinations of tuning parameter values for maximum comparability.
If a final model with the optimal combination of tuning parameters is
computed, class "perryTuning" inherits the coef(),
fitted(), predict() and residuals() methods from
its component finalModel.
Author(s)
Andreas Alfons
References
Hastie, T., Tibshirani, R. and Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2nd edition.
See Also
perryFit, perrySelect,
cvFolds, randomSplits,
bootSamples, cost
Examples
library("perryExamples")
data("coleman")
## evaluate MM regression models tuned for 85% and 95% efficiency
tuning <- list(tuning.psi = c(3.443689, 4.685061))
## via model fitting function
# perform cross-validation
# note that the response is extracted from 'data' in
# this example and does not have to be supplied
perryTuning(lmrob, formula = Y ~ ., data = coleman,
            tuning = tuning, splits = foldControl(K = 5, R = 10),
            cost = rtmspe, costArgs = list(trim = 0.1), seed = 1234)
## via function call
# set up function call
call <- call("lmrob", formula = Y ~ .)
# perform cross-validation
perryTuning(call, data = coleman, y = coleman$Y,
            tuning = tuning, splits = foldControl(K = 5, R = 10),
            cost = rtmspe, costArgs = list(trim = 0.1), seed = 1234)
Random data splits
Description
Split observations or groups of observations into training and test data to be used for (repeated) random splitting (also known as random subsampling or Monte Carlo cross-validation).
Usage
randomSplits(n, m, R = 1, grouping = NULL)
Arguments
| n | an integer giving the number of observations to be split into
training and test data.  This is ignored if  | 
| m | an integer giving the number of observations or groups of observations to be used as test data. | 
| R | an integer giving the number of random data splits. | 
| grouping | a factor specifying groups of observations. If supplied, the data are split according to the groups rather than individual observations such that all observations within a group belong either to the training or test data. | 
Value
An object of class "randomSplits" with the following
components:
- n
- an integer giving the number of observations or groups. 
- m
- an integer giving the number of observations or groups in the test data. 
- R
- an integer giving the number of random data splits. 
- subsets
- an integer matrix in which each column contains the indices of the observations or groups in the test data of the corresponding random data split. 
- grouping
- a list giving the indices of the observations belonging to each group. This is only returned if a grouping factor has been supplied. 
Note
This is a simple wrapper function for perrySplits with a
control object generated by splitControl.
Author(s)
Andreas Alfons
See Also
perrySplits, splitControl,
cvFolds, bootSamples
Examples
set.seed(1234)  # set seed for reproducibility
randomSplits(20, m = 5)
randomSplits(20, m = 5, R = 10)
Recompute resampling-based prediction error measures
Description
Recompute prediction error measures for previously obtained objects that contain resampling-based prediction error results. This is useful for computing a different measure of prediction loss.
Usage
reperry(object, ...)
## S3 method for class 'perry'
reperry(object, cost = rmspe, ...)
## S3 method for class 'perrySelect'
reperry(object, cost = rmspe, ...)
Arguments
| object | an object inheriting from class  | 
| ... | for the generic function, additional arguments to be passed 
down to methods.  For the methods,  additional arguments to be passed to the 
prediction loss function  | 
| cost | a cost function measuring prediction loss.  It should expect 
the observed values of the response to be passed as the first argument and 
the predicted values as the second argument, and must return either a 
non-negative scalar value, or a list with the first component containing 
the prediction error and the second component containing the standard 
error.  The default is to use the root mean squared prediction error 
(see  | 
Value
An object similar to object containing the results for the 
new measure of prediction loss.
Author(s)
Andreas Alfons
See Also
perryFit, perryTuning, 
perrySelect
Examples
library("perryExamples")
data("coleman")
set.seed(1234)  # set seed for reproducibility
## set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
## compare raw and reweighted LTS estimators for 50% and 75%
## subsets based on their RTMSPE with 25% trimming
# 50% subsets
fit50 <- ltsReg(Y ~ ., data = coleman, alpha = 0.5)
cv50 <- perry(fit50, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.25)
# 75% subsets
fit75 <- ltsReg(Y ~ ., data = coleman, alpha = 0.75)
cv75 <- perry(fit75, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.25)
# combine results into one object
cv <- perrySelect("0.5" = cv50, "0.75" = cv75)
cv
## recompute the RTMSPE with 10% trimming
reperry(cv50, cost = rtmspe, trim = 0.1)
reperry(cv, cost = rtmspe, trim = 0.1)
Set up a plot of resampling-based prediction error results
Description
Extract and prepare the relevant information for a plot of results of resampling-based prediction error measures.
Usage
setupPerryPlot(object, ...)
## S3 method for class 'perry'
setupPerryPlot(
  object,
  which = c("box", "density", "dot"),
  select = NULL,
  seFactor = NA,
  ...
)
## S3 method for class 'perrySelect'
setupPerryPlot(
  object,
  which = c("box", "density", "dot", "line"),
  subset = NULL,
  select = NULL,
  seFactor = object$seFactor,
  ...
)
## S3 method for class 'perryTuning'
setupPerryPlot(object, ...)
Arguments
| object | an object inheriting from class  | 
| ... | for the  | 
| which | a character string specifying the type of plot to
prepare.  Possible values are  | 
| select | a character, integer or logical vector indicating the columns of prediction error results to be prepared for plotting. | 
| seFactor | a numeric value giving the multiplication factor of the
standard error for displaying error bars in dot plots or line plots.  Error
bars in those plots can be suppressed by setting this to  | 
| subset | a character, integer or logical vector indicating the subset of models to be prepared for plotting. | 
Value
An object of class "setupPerryPlot" with the following
components:
- data
- a data frame containing the following columns: - Fit
- a vector or factor containing the identifiers of the models. 
- Name
- a factor containing the names of the predictor error results (not returned in case of only one column of prediction error results with the default name). 
- PE
- the estimated prediction errors. 
- Lower
- the lower end points of the error bars (only returned if possible to compute). 
- Upper
- the upper end points of the error bars (only returned if possible to compute). 
 
- which
- a character string specifying the type of plot. 
- grouped
- a logical indicating whether density plots should be grouped due to multiple model fits (only returned in case of density plots for the - "perrySelect"and- "perryTuning"methods).
- includeSE
- a logical indicating whether error bars based on standard errors are available (only returned in case of dot plots or line plots). 
- mapping
- default aesthetic mapping for the plots. 
- facets
- default faceting formula for the plots (not returned in case of only one column of prediction error results with the default name). 
- tuning
- a data frame containing the grid of tuning parameter values for which the prediction error was estimated (only returned for the - "perryTuning"method).
Note
Duplicate indices in subset or select are removed such
that all models and prediction error results are unique.
Author(s)
Andreas Alfons
See Also
perryFit, perrySelect,
perryTuning,
Examples
library("perryExamples")
data("coleman")
set.seed(1234)  # set seed for reproducibility
## set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
## compare LS, MM and LTS regression
# perform cross-validation for an LS regression model
fitLm <- lm(Y ~ ., data = coleman)
cvLm <- perry(fitLm, splits = folds,
              cost = rtmspe, trim = 0.1)
# perform cross-validation for an MM regression model
fitLmrob <- lmrob(Y ~ ., data = coleman, maxit.scale = 500)
cvLmrob <- perry(fitLmrob, splits = folds,
                 cost = rtmspe, trim = 0.1)
# perform cross-validation for an LTS regression model
fitLts <- ltsReg(Y ~ ., data = coleman)
cvLts <- perry(fitLts, splits = folds,
               cost = rtmspe, trim = 0.1)
# combine results into one object
cv <- perrySelect(LS = cvLm, MM = cvLmrob, LTS = cvLts,
                  .selectBest = "min")
cv
## convert MM regression results to data frame for plotting
# all replications for box plot
cvLmrobBox <- setupPerryPlot(cvLmrob, which = "box")
perryPlot(cvLmrobBox)
# aggregated results for dot plot
cvLmrobDot <- setupPerryPlot(cvLmrob, which = "dot", seFactor = 2)
perryPlot(cvLmrobDot)
## convert combined results to data frame for plotting
# all replications for box plot
cvBox <- setupPerryPlot(cv, which = "box")
perryPlot(cvBox)
# aggregated results for dot plot
cvDot <- setupPerryPlot(cv, which = "dot", seFactor = 2)
perryPlot(cvDot)
Control object for random data splits
Description
Generate an object that controls how to split n observations or
groups of observations into training and test data to be used for (repeated)
random splitting (also known as random subsampling or Monte Carlo
cross-validation).
Usage
splitControl(m, R = 1, grouping = NULL)
Arguments
| m | an integer giving the number of observations or groups of observations to be used as test data. | 
| R | an integer giving the number of random data splits. | 
| grouping | a factor specifying groups of observations. | 
Value
An object of class "splitControl" with the following
components:
- m
- an integer giving the number of observations or groups of observations to be used as test data. 
- R
- an integer giving the number of random data splits. 
- grouping
- if supplied, a factor specifying groups of observations. The data will then be split according to the groups rather than individual observations such that all observations within a group belong either to the training or test data. 
Author(s)
Andreas Alfons
See Also
perrySplits, randomSplits,
foldControl, bootControl
Examples
set.seed(1234)  # set seed for reproducibility
perrySplits(20, splitControl(m = 5))
perrySplits(20, splitControl(m = 5, R = 10))
Subsetting resampling-based prediction error results
Description
Extract subsets of resampling-based prediction error results.
Usage
## S3 method for class 'perry'
subset(x, select = NULL, ...)
## S3 method for class 'perrySelect'
subset(x, subset = NULL, select = NULL, ...)
Arguments
| x | an object inheriting from class  | 
| select | a character, integer or logical vector indicating the prediction error results to be extracted. | 
| ... | currently ignored. | 
| subset | a character, integer or logical vector indicating the subset of models for which to keep the prediction error results. | 
Value
An object similar to x containing just the selected results.
Note
Duplicate indices in subset or select are removed such 
that all models and prediction error results are unique.
Author(s)
Andreas Alfons
See Also
perryFit, perrySelect, 
perryTuning, subset
Examples
library("perryExamples")
data("coleman")
set.seed(1234)  # set seed for reproducibility
## set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
## compare raw and reweighted LTS estimators for
## 50% and 75% subsets
# 50% subsets
fit50 <- ltsReg(Y ~ ., data = coleman, alpha = 0.5)
cv50 <- perry(fit50, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.1)
# 75% subsets
fit75 <- ltsReg(Y ~ ., data = coleman, alpha = 0.75)
cv75 <- perry(fit75, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.1)
# combine results into one object
cv <- perrySelect("0.5" = cv50, "0.75" = cv75)
cv
# extract reweighted LTS results with 50% subsets
subset(cv50, select = "reweighted")
subset(cv, subset = c(TRUE, FALSE), select = "reweighted")
Summarize resampling-based prediction error results
Description
Produce a summary of resampling-based prediction error results.
Usage
## S3 method for class 'perry'
summary(object, ...)
## S3 method for class 'perrySelect'
summary(object, ...)
## S3 method for class 'perryTuning'
summary(object, ...)
Arguments
| object | an object inheriting from class  | 
| ... | currently ignored. | 
Value
An object of class "summary.perry", "summary.perrySelect" or 
"summary.perryTuning", depending on the class of object.
Author(s)
Andreas Alfons
See Also
perryFit, perrySelect, 
perryTuning, summary
Examples
library("perryExamples")
data("coleman")
set.seed(1234)  # set seed for reproducibility
## set up folds for cross-validation
folds <- cvFolds(nrow(coleman), K = 5, R = 10)
## compare raw and reweighted LTS estimators for
## 50% and 75% subsets
# 50% subsets
fit50 <- ltsReg(Y ~ ., data = coleman, alpha = 0.5)
cv50 <- perry(fit50, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.1)
# 75% subsets
fit75 <- ltsReg(Y ~ ., data = coleman, alpha = 0.75)
cv75 <- perry(fit75, splits = folds, fit = "both",
              cost = rtmspe, trim = 0.1)
# combine results into one object
cv <- perrySelect("0.5" = cv50, "0.75" = cv75)
cv
# summary of the results with the 50% subsets
summary(cv50)
# summary of the combined results
summary(cv)