Type: | Package |
Title: | Time-Dependent ROC Curve Estimation for Correlated Right-Censored Survival Data |
Version: | 1.0.0 |
Description: | This contains functions that can be used to estimate a smoothed and a non-smoothed (empirical) time-dependent receiver operating characteristic curve and the corresponding area under the receiver operating characteristic curve for correlated right-censored time-to-event data. See Beyene and Chen (2024) <doi:10.1177/09622802231220496>. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Depends: | R(≥ 4.2) |
Imports: | frailtypack, stats, survival, xpectr |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.3 |
NeedsCompilation: | no |
Packaged: | 2025-10-08 20:35:38 UTC; m2kas |
Author: | Kassu Mehari Beyene
|
Maintainer: | Kassu Mehari Beyene <m2kassu@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-10-14 18:20:09 UTC |
frailtyROC
Description
frailtyROC is a tool for estimating and visualizing time-dependent receiver operating characteristic (ROC) curves, as well as the corresponding time-dependent area under the curve (AUC), in the context of correlated right-censored time-to-event data. Confidence bands for the ROC curve and confidence intervals for the AUC can be constructed using bootstrap-derived standard errors.
Details
frailtyROC is a comprehensive tool for estimating and visualizing time-dependent receiver operating characteristic (ROC) curves and their corresponding area under the curve (AUC) in the context of correlated right-censored time-to-event data. The ROC curves can be estimated either empirically (non-smoothed) or smoothed with or without boundary correction. For the latter case, the data-driven smoothing parameter selection methods introduced by Beyene and El Ghouch (2020) for smoothed ROC curves are implemented, offering an automatic approach to bandwidth selection during the smoothing process.
The package enables the estimation of time-dependent ROC curves at specific time points to evaluate the discriminatory ability of prognostic models or biomarkers. The time-dependent AUC serves as a summary measure of predictive accuracy over time. Confidence bands for the ROC curves and confidence intervals for AUC estimates are obtained via non-parametric bootstrap sampling based on the percentile method. This approach yields standard error estimates, which serve as the basis for bootstrap-based hypothesis testing of AUC at a given time point.
An essential component of the methodology involves estimating the survival function conditional on the observed data. This is accomplished through the use of shared frailty models specifically developed for correlated censored data. To this end, both semi-parametric and parametric frailty models are estimated using a penalized likelihood estimation framework that implemented in Rondeau et al. (2012). Users may specify either a gamma or log-normal frailty distribution.
Abbreviations
In this package, the following abbreviations are commonly used:
- ROC
Receiver Operating Characteristic curve.
- AUC
Area under the ROC curve at a given time horizon
t
.
Dataset
This package comes with a correlated right-censored marker data sets. For details see kidney
and LungCancer
Installing and using
Ensure that your system has an active internet connection, then execute the following command in the R console to install the package:
install.packages("frailtyROC")
To load the package after installation, use the following command:
library(frailtyROC)
Author(s)
Kassu Mehari Beyene and Ding-Geng Chen
Maintainer: Kassu Mehari Beyene <m2kassu@gmail.com>
References
Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.
Beyene, K.M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373-3396.
Rondeau, V., Marzroui, Y., & Gonzalez, J. R. (2012). frailtypack: an R package for the analysis of correlated survival data with frailty models using penalized likelihood estimation or parametrical estimation. Journal of Statistical Software, 47, 1-28.
The cross-validation bandwidth selection for weighted data
Description
This function computes the data-driven bandwidth for smoothing the ROC (or distribution) function using the CV method of Beyene and El Ghouch (2020). This is an extension of the classical (unweighted) cross-validation bandwith selection method to the case of weighted data.
Usage
CV(X, wt, ktype = "normal")
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
ktype |
A character string giving the type kernel to be used: " |
Details
Bowman et al (1998) proposed the cross-validation bandwidth selection method for unweighted kernal smoothed distribution function. This method is implemented in the R
package kerdiest
.
We adapted this for the case of weighted data by incorporating the weight variable into the cross-validation function of Bowman's method. See Beyene and El Ghouch (2020) for details.
Value
Returns the computed value for the bandwith parameter.
Author(s)
Kassu Mehari Beyene, Catholic University of Louvain. <kasu.beyene@uclouvain.be>
Anouar El Ghouch, Catholic University of Louvain. <anouar.elghouch@uclouvain.be>
References
Beyene, K.M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373-3396.
Bowman A., Hall P. and Trvan T.(1998). Bandwidth selection for the smoothing of distribution functions. Biometrika 85:799-808.
Quintela-del-Rio, A. and Estevez-Perez, G. (2015). kerdiest:
Nonparametric kernel estimation of the distribution function, bandwidth selection and estimation of related functions. R
package version 1.2.
Survival probability conditional to the observed data estimation for correlated right censored data.
Description
Survival probability conditional to the observed data estimation for correlated right censored data.
Usage
Csurv(
Y,
M,
censor,
group = NULL,
t = 0,
w,
knots = 10,
kappa = 10000,
method = "marg",
RandDist = "Gamma",
hazard = "Splines",
maxit = 300
)
Arguments
Y |
a numeric vector of event-times or observed times. |
M |
a numeric vector of (bio)marker or risk score values. |
censor |
a vector of censoring indicator, |
group |
a categorical vector of group/cluster. |
t |
a scalar time for prediction. The default value is |
w |
a scalar window for prediction. |
knots |
a scalar for specifying the number of knots to use. Value required in the penalized likelihood estimation. It corresponds to the (knots+2) splines functions for the approximation of the hazard or the survival functions. Rondeau, et al. (2012) suggested that the number of knots must be between 4 and 20. The default is |
kappa |
a positive smoothing parameter value for the penalized likelihood estimation. The defaults is " |
method |
a character string specifying prediction method applied on model. The possible options are " |
RandDist |
a character string to state the distribution of random effect: " |
hazard |
types of hazard functions: " |
maxit |
maximum number of iterations. The default is |
Value
Return vector of estimated event status and its complement.
References
Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.
Beyene, K.M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373-3396.
Rondeau, V., Marzroui, Y., & Gonzalez, J. R. (2012). frailtypack: an R package for the analysis of correlated survival data with frailty models using penalized likelihood estimation or parametrical estimation. Journal of Statistical Software, 47, 1-28.
NCCTG Lung Cancer Marker Data
Description
This dataset contains four columns: id
, the identifier for health institutions (clusters); marker
, the risk score; time
, the observed follow-up time; and status
, the event indicator for subjects in the NCCTG lung cancer marker dataset.
Usage
data(LungCancer)
Format
This is a data frame with 238 observations and the following 4 variables.
- id
health institutions code
- time
time to death in days
- status
censoring indicator; 1=censored, 2=dead
- marker
risk score derived from the observed data using frailty model
Details
The NCCTG lung cancer dataset was collected from 228 patients across 18 different healthcare institutions. The number of subjects per institution ranged from 2 to 36. For the final analysis, only 226 patients with complete records were included. The dataset contains survival times along with several important predictor variables, including: sex (coded as Male = 1, Female = 2), age (in years), ph.ecog (Eastern Cooperative Oncology Group performance status, assessed by a physician on a scale from 0 [asymptomatic] to 5 [dead]), and pat.karno (Karnofsky performance status, assessed by the patient).
The marker (risk score) was derived from three predictor variables: sex, age, and ph.ecog. To this end, a frailty model with gamma-distributed frailty was fitted. As in the previous example, the prognostic marker is defined as:
\hat{\nu} \exp(\hat{\beta}_1 \cdot \text{sex} + \hat{\beta}_2 \cdot \text{age} + \hat{\beta}_3 \cdot \text{ph.ecog})
where \hat{\nu}
is the estimated frailty term, and \hat{\beta}_i
(for i=
1,2,3) are
the estimated regression coefficients from the frailty model.
References
Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.
The normal reference bandwidth selection for weighted data
Description
This function computes the data-driven bandwidth for smoothing the ROC (or distribution) function using the NR method of Beyene and El Ghouch (2020). This is an extension of the classical (unweighted) normal reference bandwith selection method to the case of weighted data.
Usage
NR(X, wt, ktype = "normal")
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
ktype |
A character string giving the type kernel to be used: " |
Details
See Beyene and El Ghouch (2020) for details.
Value
Returns the computed value for the bandwidth parameter.
Author(s)
Kassu Mehari Beyene, Catholic University of Louvain. <kasu.beyene@uclouvain.be>
Anouar El Ghouch, Catholic University of Louvain. <anouar.elghouch@uclouvain.be>
References
Beyene, K.M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373-3396.
The plug-in bandwidth selection for weighted data
Description
This function computes the data-driven bandwidth for smoothing the ROC (or distribution) function using the PI method of Beyene and El Ghouch (2020). This is an extension of the classical (unweighted) direct plug-in bandwith selection method to the case of weighted data.
Usage
PI(X, wt, ktype = "normal")
Arguments
X |
The numeric vector of random variable. |
wt |
The non-negative weight vector. |
ktype |
A character string giving the type kernel to be used: " |
Details
See Beyene and El Ghouch (2020) for details.
Value
Returns the computed value for the bandwith parameter.
Author(s)
Kassu Mehari Beyene, Catholic University of Louvain. <kasu.beyene@uclouvain.be>
Anouar El Ghouch, Catholic University of Louvain. <anouar.elghouch@uclouvain.be>
References
Beyene, K.M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373-3396.
ROC estimation function
Description
ROC estimation function
Usage
RocFun(U, D, M, bw = "NR", method, ktype)
Arguments
U |
The vector of grid points where the ROC curve is estimated. |
D |
The event indicator. |
M |
The numeric vector of marker values for which the time-dependent ROC curves is computed. |
bw |
The bandwidth parameter for smoothing the ROC function. The possible options are |
method |
is the method of ROC curve estimation. The possible options are |
ktype |
A character string giving the type kernel to be used: " |
Derivative of normal distribution
Description
Derivative of normal distribution
Usage
dnorkernel(ord, X)
Arguments
ord |
The order of derivative. |
X |
The numeric data vector. |
Estimation of the time-dependent ROC curve for correlated right-censored survival data
Description
This function computes a time-dependent ROC curve for correlated right censored survival data using the cumulative sensitivity and dynamic specificity definitions. The ROC curves can be either empirical (non-smoothed) or smoothed with/wtihout boundary correction. It also calculates the time-dependent area under the ROC curve (AUC).
Usage
frailtyROC(
Y,
M,
censor,
group = NULL,
w,
t = 1e-04,
U = NULL,
bw = "NR",
len = 151,
method = "tra",
method1 = "marg",
ktype = "normal",
knots = 10,
kappa = 10000,
RandDist = "Gamma",
hazard = "Splines",
maxit = 300,
B = 0,
alpha = 0.05,
plot = "TRUE"
)
Arguments
Y |
a numeric vector of event-times or observed times. |
M |
a numeric vector of (bio)marker or risk score values. |
censor |
a vector of censoring indicator, |
group |
a categorical vector of group/cluster. |
w |
a scalar window for prediction. |
t |
a scalar time for prediction. The default value is |
U |
a vector of grid points where the ROC curve is estimated. The default is a sequence of |
bw |
a character string specifying the bandwidth estimation method for the ROC itself. The possible options are " |
len |
a scalar value specifying the length of vector |
method |
a character string specifying the method of ROC curve estimation. The possible options are " |
method1 |
a character string specifying prediction method applied on model. The possible options are " |
ktype |
a character string specifying the type kernel distribution to be used for smoothing the ROC curve: " |
knots |
a scalar for specifying the number of knots to use. Value required in the penalized likelihood estimation. It corresponds to the (knots+2) splines functions for the approximation of the hazard or the survival functions. Rondeau, et al. (2012) suggested that the number of knots must be between 4 and 20. The default is |
kappa |
a positive smoothing parameter value for the penalized likelihood estimation. The defaults is " |
RandDist |
a character string to state the distribution of random effect: " |
hazard |
types of hazard functions: " |
maxit |
maximum number of iterations. The default is |
B |
a number of bootstrap samples to be used for variance estimation. The default is |
alpha |
the significance level. The default is |
plot |
a logical parameter to see the ROC curve plot. The default is |
Details
This function takes correlated right-censored survival data and returns an empirical (non-smoothed) ROC estimate and the smoothed time-dependent ROC estimate with/without boundary correction and the corresponding time-dependent AUC estimates.
For the smoothing parameter estimation, three data-driven methods: the normal reference "NR
", the plug-in "PI
" and the cross-validation "CV
" introduced in Beyene and El Ghouch (2020) were implemented. See Beyene and El Ghouch (2020) for details.
The conditional survival function estimation can done by using semi-parametric or fully parametric shared frailty models.
Value
Returns the following items:
ROC |
vector of estimated ROC values. These will be numeric values between zero and one. |
AUC |
data frame of dimension |
U |
vector of grid points used. |
bw |
computed value of bandwidth parameter. For the empirical method this is always |
Dt |
vector of estimated event status. |
M |
vector of (bio)marker vlaues. |
References
Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.
Beyene, K.M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373-3396.
Rondeau, V., Marzroui, Y., & Gonzalez, J. R. (2012). frailtypack: an R package for the analysis of correlated survival data with frailty models using penalized likelihood estimation or parametrical estimation. Journal of Statistical Software, 47, 1-28.
Examples
library(frailtyROC)
data(kidney)
out1 <- frailtyROC(Y=kidney$time, M=kidney$Marker, censor=kidney$status, group = kidney$id, w=120,
method = "emp", method1 = "cond", hazard = "Weibull")
out1$AUC
Numerical Integral function using Simpson's rule
Description
Numerical Integral function using Simpson's rule
Usage
integ(x, fx, method, n.pts = 256)
Arguments
x |
The numeric data vector. |
fx |
The function. |
method |
The character string specifying method of numerical integration. The possible options are |
n.pts |
Number of points. |
Distribution function without the ith observation
Description
Distribution function without the ith observation
Usage
ker_dis_i(X, y, wt, ktype, bw)
Arguments
X |
The numeric data vector. |
y |
The vector where the kernel estimation is computed. |
wt |
The non-negative weight vector. |
ktype |
A character string giving the type kernel to be used: " |
bw |
A numeric bandwidth value. |
Value
Returns the estimated value for the bandwidth parameter.
Function to evaluate the matrix of data vector minus the grid points divided by the bandwidth value.
Description
Function to evaluate the matrix of data vector minus the grid points divided by the bandwidth value.
Usage
kfunc(ktype = "normal", difmat)
Arguments
ktype |
A character string giving the type kernel to be used: " |
difmat |
A numeric matrix of sample data (X) minus evaluation points (x0) divided by bandwidth value (bw). |
Value
Returns the matrix resulting from evaluating difmat
.
Kernel distribution function
Description
Kernel distribution function
Usage
kfunction(ktype, X)
Arguments
ktype |
A character string giving the type kernel to be used: " |
X |
A numeric vector of sample data. |
Value
Returns a vector resulting from evaluating X.
Kidney Marker Data
Description
This dataset contains four columns: id
, patient code; marker
, the risk score; time
, the observed follow-up time; and status
, the event indicator for subjects in the kidney marker dataset.
Usage
data(kidney)
Format
This is a data frame with 76 observations and the following 4 variables.
- id
patient code
- time
time to recurrence of infection
- status
censoring indicator; 0=censored, 1=recurrence
- marker
risk score derived from observed data using frailty model
Details
This dataset pertains to the recurrence of infection in patients with kidney disease who use portable dialysis equipment. Recurrent infection is the primary complication in these patients, typically occurring at the catheter insertion site. When an infection occurs, the catheter is removed and reinserted after successful treatment. In some cases, the catheter is removed for reasons unrelated to infection; these observations are treated as censored. A total of 38 patients were followed to assess the time to recurrence of infection. Each patient contributes exactly two observations.
The dataset includes the following covariates: sex
(1 = Male, 2 = Female), age
, and disease
(Disease type; a factor with four levels: "GN"
, "AN"
, "PKD"
, and "Other"
). The marker is derived using a gamma frailty model as follows:
\hat{\nu} \exp(\hat{\beta}_1 \, \mathrm{sex} + \hat{\beta}_2 \, \mathrm{age} + \hat{\beta}_3 \, \mathrm{GN} + \hat{\beta}_4 \, \mathrm{AN} + \hat{\beta}_5 \, \mathrm{PKD})
where \hat{\nu}
is the estimated frailty term, and \hat{\beta}_i
(for i=
1,2,3) are
the estimated regression coefficients from the frailty model.
References
Beyene, K. M., and Chen, D. G. (2024). Time-dependent receiver operating characteristic curve estimator for correlated right-censored time-to-event data. Statistical Methods in Medical Research, 33(1), 162-181.
The value of squared integral x^2 k(x) dx and integral x k(x) K(x) dx
Description
The value of squared integral x^2 k(x) dx and integral x k(x) K(x) dx
Usage
muro(ktype)
Arguments
ktype |
A character string giving the type kernel to be used: " |
Function for Weighted inter-quartile range estimation
Description
Function for Weighted inter-quartile range estimation
Usage
wIQR(X, wt)
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
Function to select the bandwidth parameter needed for smoothing the time-dependent ROC curve.
Description
This function computes a data-driven bandwidth for smoothing the ROC curve, supporting three methods: the normal reference method, the plug-in method, and the cross-validation method introduced in Beyene and El Ghouch (2020). It is particularly important for estimating the bandwidth in the presence of weighted data.
Usage
wbw(X, wt, bw = "NR", ktype = "normal")
Arguments
X |
numeric data vector. |
wt |
non-negative weight vector. |
bw |
a character string specifying the bandwidth selection method. The possible options are " |
ktype |
a character string indicating the type of kernel function: " |
Value
Returns the estimated value for the bandwidth parameter.
References
Beyene, K.M. and El Ghouch A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373-3396.
Examples
library(frailtyROC)
X <- rnorm(100) # random data vector
wt <- runif(100) # weight vector
# Normal reference bandwidth selection
wbw(X = X, wt = wt)$bw
Function for Weighted quartile estimation
Description
Function for Weighted quartile estimation
Usage
wquantile(X, wt, p = 0.5)
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
p |
The percentile value. The defult is 0.5. |
Function for Weighted variance estimation
Description
Function for Weighted variance estimation
Usage
wvar(X, wt, na.rm = FALSE)
Arguments
X |
The numeric data vector. |
wt |
The non-negative weight vector. |
na.rm |
The character indicator whether to consider missing value(s) or not. The default is FALSE. |