Introduction
R-package VFP is based on the ideas of the standalone software Variance Function Program by William A. Sadler [https://www.aacb.asn.au/AACB/Resources/Variance-Function-Program]. This is also the reason why we chose to name it VFP instead of e.g. RPP for R Precision Profiles or similar.
Precision profiles are a comprehensive and intuitive way to visualize variability over a part of or the entire measuring range of a measuring method. Manufacturers of IVD-assays need to state the precision of their products in the method sheet (package insert). This is usually done in tabular format presenting total (im)precision and individual variance components at given concentrations of samples used in the respective precision experiment. The units of variability reported are either standard deviation (SD) and/ or coefficient of variation (CV) which is a percentage in regard to the sample concentration. The drawback of using tables is obvious. Variability of any concentration in between two reported samples and not close to one of these is not immediately clear.
Mean | Reproducibility %CV | Between-Site %CV | Between-Day %CV | Repeatability %CV | |
---|---|---|---|---|---|
sample.P1 | 12.08 | 8.6 | 5.1 | 3.5 | 6.0 |
sample.P2 | 41.58 | 4.4 | 3.1 | 0.8 | 3.1 |
sample.Q3 | 55.75 | 4.1 | 3.2 | 1.3 | 2.2 |
sample.Q4 | 165.70 | 3.8 | 3.3 | 0.8 | 1.7 |
sample.P5 | 379.10 | 2.4 | 1.3 | 0.5 | 2.0 |
sample.Q6 | 414.30 | 3.7 | 3.1 | 0.4 | 2.1 |
Additionally, any variability estimated is carrying more or less error in the same sense as data points in a regression analysis are not all located on the regression line. Precision profiles model the non-linear relationship between sample mean and variance taking into account the information provided by multiple precision-experiments. Below, an example is given, showing the precision profile on CV-scale for reproducibility variability of the CA19_9 data set from R-package VCA (2nd column in table above):
##
## Model 4 ... finished.
Each point in this plot corresponds to the reproducibility on CV[%] scale of one sample, i.e. there were six samples measured at three sites, on 5 days per site, and 5 replicated measurements per day. This data set corresponds to the numerical example of the CLSI EP05-A3 guideline showing a typical form of a precision profile with larger relative variability for lower concentrations and an almost constant %CV for larger concentrations. The precision profile for this data is based on model 4, i.e. for this non-linear function the Akaike Information Criterion (\(-2 \times logLik + 2 \times p\), p being the number of parameters) takes the lowest number. These are provided by R-package gnm which we chose for performing the actual model fitting.
How To Generate a Precision Profile
Any precision profile is based on the outcome of a precision experiment. The required input for fitting one of the models provided is Mean, Variance and the Degrees of Freedom for multiple samples used in the precision experiment. Let’s start with performing the analysis of such a precision experiment.
Now perform the variance component analysis (VCA) for each sample using batch-processing. Here, we the anovaVCA function from R-package VCA is used.
The result will be a list of VCA-objects.
## [1] "list"
## sample.P1 sample.P2 sample.Q3 sample.Q4 sample.P5 sample.Q6
## "VCA" "VCA" "VCA" "VCA" "VCA" "VCA"
##
##
## Result Variance Component Analysis:
## -----------------------------------
##
## Name DF SS MS VC %Total SD CV[%]
## 1 total 11.3181 1.0869 100 1.0425 8.6292
## 2 site 2 22.0419 11.0209 0.3843 35.3578 0.6199 5.1312
## 3 site:day 12 16.964 1.4137 0.1778 16.3565 0.4216 3.4899
## 4 error 60 31.488 0.5248 0.5248 48.2857 0.7244 5.9963
##
## Mean: 12.0813 (N = 75)
##
## Experimental Design: balanced | Method: ANOVA
Each VCA-object stores the information for one sample and there are multiple types of variability which can be used to construct a precision profile. Total variability corresponds to reproducibility in this example. One can also use repeatability (error) which is the pure assay imprecision. If one decides to use the sum of repeatability and day-to-day (aka between-day, here “site:day”) one uses intermediate precision or within-lab variability. The next code snippet shows how this information can be extracted from the list of VCA-objects.
Function get_mat does the job. Without selecting a specific element, i.e. variance component or sequence of these, total variability will be used by default.
## Mean DF VC SD CV
## sample.P1 12.08133 11.318142 1.086864 1.042528 8.629244
## sample.P2 41.58400 7.604586 3.376848 1.837620 4.419056
## sample.Q3 55.74667 4.896189 5.257296 2.292879 4.113034
## sample.Q4 165.65600 3.331477 39.752635 6.304969 3.806061
## sample.P5 379.09067 16.709246 85.059893 9.222792 2.432872
## sample.Q6 414.28667 4.112871 241.089499 15.527057 3.747902
Note:
This input does not need to be derived from a variance component
analysis performed with R-package VCA. One can can
easily compile the required input from externally performed analyses as
well.
To extract repeatability one can either specify “error” or the index of this variance component which will be 4 here.
## Mean DF VC SD CV
## sample.P1 12.08133 60 0.524800 0.7244308 5.996282
## sample.P2 41.58400 60 1.634800 1.2785930 3.074723
## sample.Q3 55.74667 60 1.559933 1.2489729 2.240444
## sample.Q4 165.65600 60 7.812800 2.7951386 1.687315
## sample.P5 379.09067 60 56.966933 7.5476442 1.990987
## sample.Q6 414.28667 60 73.959000 8.5999419 2.075843
In this example Intermediate Precision consists of two variance components, repeatability and day-to-day. To select these specify the sequence of indexes. Note, that this will only be possible for models that were fitted using ANOVA-estimation, thus, any model fitted using REML will result in an error.
## Mean DF VC SD CV
## sample.P1 12.08133 51.42153 0.7025733 0.8381965 6.937947
## sample.P2 41.58400 68.08386 1.7579600 1.3258808 3.188440
## sample.Q3 55.74667 51.60925 2.0831067 1.4432972 2.589029
## sample.Q4 165.65600 57.44560 9.6790933 3.1111241 1.878063
## sample.P5 379.09067 69.14653 60.1530533 7.7558400 2.045906
## sample.Q6 414.28667 69.89176 76.9797867 8.7738126 2.117812
The next step towards a precision profile consists of fitting a VFP-model. This can be done for a single model or in batch-mode. The latter means instead of fitting one model after another, one can select a set of models. There are 10 models implemented in R-package VFP:
- \(\sigma^2\), constant
variance
- \(\sigma^2=\beta_1 \times u^2\),
constant CV
- \(\sigma^2=\beta_1 + \beta_2 \times
u^2\), mixed constant, proportional variance
- \(\sigma^2=(\beta_1 + \beta_2 \times
u)^K\), constrained power model, constant exponent
- \(\sigma^2=\beta_1+\beta_2 \times
u^K\), alternative constrained power model
- \(\sigma^2=\beta_1 + \beta_2 \times u +
\beta_3 \times u^J\), unconstrained power model for variance
functions with a minimum
- \(\sigma^2=\beta_1 + \beta_2 \times
u^J\), alternative unconstrained power model
- \(\sigma^2=(\beta_1 + \beta_2 \times
u)^J\), unconstrained power model (default model of W.
Sadler)
- \(\sigma^2=\beta_1 \times u^J\),
similar to CLSI EP17 model
- \(CV=\beta_1 \times u^J\), exact CLSI EP17 model (fitted by linear regression on logarithmic scale)
Fitting all ten models is somehow redundant if constant K is chosen to be equal to 2, since models 3 and 5 are equivalent and these are constrained versions of model 7 where the exponent is also estimated. The latter also applies to model 4 which is a constrained version of model 8. Nevertheless, since computation time is not critical here for typical precision-profiles (of in-vitro diagnostics precision experiments) we chose to offer batch-processing as well.
During computation, all models are internally re-parameterized so as to guarantee that the variance function is positive in the range of u from 0 to \(u_{max}\). In models 7 and 8, J is restricted to \(0.1 < J < 10\) to avoid the appearance of sharp hooks. Occasionally, these restrictions may lead to a failure of convergence. This is then a sign that the model parameters are on the boundary and that the model fits the data very badly. This should not be taken as reason for concern. It occurs frequently for model 6 when the variance function has no minimum, which is normally the case.
If batch-processing has been performed, one can select one of the models or use the best fitting model automatically if none is specified (see below) in all subsequent function calls on the returned VFP-object.
Below model 1 is fitted to the total variability data.
##
## Model 1 ... finished.
The standard print-method for VCA-objects is identical to the summary-method yielding:
##
##
## (VFP) Variance-Function
## -----------------------
##
## Model 1: sigma^2=beta1
##
## Coefficients:
## beta1
## 54.39
##
## AIC = 232.6 RSS = 43870 Deviance = 64.31 GoF P-value= 0.04715
Now, all ten available models are fitted to the total variability data in batch mode. Note, model 5 is skipped since it is identical to model 3 without changing the default setting K=2 to something different.
##
## Model 1 ... finished.
## Model 2 ... finished.
## Model 3 ... finished.
## Model 4 ... finished.
## Model 5 ... skipped!
## Model 6 ... finished.
## Model 7 ... finished.
## Model 8 ... finished.
## Model 9 ... finished.
##
##
## +++ Summary Fitted VFP-Model(s) +++
## -------------------------------------
##
## Call:
## fit_vfp(Data = mat.total, model.no = 1:9)
##
## [+++ 8 Models Fitted +++]
##
## Model_1: sigma^2=beta1
## Model_2: sigma^2=beta1*u^2
## Model_3: sigma^2=beta1+beta2*u^2
## Model_4: sigma^2=(beta1+beta2*u)^2
## Model_6: sigma^2=beta1+beta2*u+beta3*u^J
## Model_7: sigma^2=beta1+beta2*u^J
## Model_8: sigma^2=(beta1+beta2*u)^J
## Model_9: sigma^2=beta1*u^J
##
## [+++ RSS (unweighted) +++]
##
## Model_1 Model_2 Model_3 Model_4 Model_6 Model_7 Model_8 Model_9
## 43870 136876 9334 11637 12833 12824 12677 15284
##
## [+++ AIC +++]
##
## Model_1 Model_2 Model_3 Model_4 Model_6 Model_7 Model_8 Model_9
## 232.6 201.9 149.6 140.6 143.1 141.1 141.6 149.5
##
## [+++ Deviance +++]
##
## Model_1 Model_2 Model_3 Model_4 Model_6 Model_7 Model_8 Model_9
## 64.310 22.790 2.712 1.875 1.757 1.758 1.799 2.699
##
## [+++ Best Model +++]
##
## Model 4: sigma^2=(beta1+beta2*u)^2
##
## Coefficients:
## beta1 beta2
## 0.73290 0.02671
##
## AIC = 140.6 RSS = 11637 Deviance = 1.875 GoF P-value= 1.000
##
## [+++ Note +++]
##
## Model 5 will not be fitted because it is identical to model 3 with 'K = 2'!
Plotting Precision Profiles
Our recommendation is to always fit all models and using the best one. Of course, if two fitted models are very similar in their AIC one can use the less complex model. The fitted model(s) is/are stored in a VFP-object. Calling the plot-method for these objects will always generate a precision profile on variance scale (VC), which is the scale model fitting takes place on.
One can plot the precision profile on variance (type=“vc”, default), SD (type=“sd”) or CV (type=“cv”) scale.
There are plenty of options to customize the visual appearance of a precision profile. One can add multiple precision profiles to one plot as done in the CLSI EP05-A3 guideline from which this example is borrowed. Below, we start with plotting the fitted precision profile for reproducibility, where concentrations on the X-axis are plotted on log-scale. Here, we also leave some space in the right margin to add a legend later on.
plot( tot.all, model.no = 8, mar = c(5.1, 5, 4.1, 8),
type = "cv", ci.type = "none", Model = FALSE,
Line = list(col = "blue", lwd = 3),
Points = list(pch = 15, col = "blue", cex = 1.5),
xlim = c(10, 450), ylim = c(0,10),
Xlabel = list(
text = "CA19-9, kU/L (LogScale) - 3 Patient Pools, 3 QC Materials",
cex = 1.25),
Title = NULL,
Ylabel = list(text = "% CV", cex = 1.25, line = 3),
Grid = NULL, Crit = NULL, log = "x")
# We now add the precision profile for intermediate precision
# to the existing plot (add=TRUE).
mod8.ip <- fit_vfp(mat.ip, 8)
##
## Model 8 ... finished.
plot(mod8.ip, type = "cv", add = TRUE, ci.type = "none",
Points = list(pch = 16, col = "deepskyblue", cex = 1.5),
Line = list(col = "deepskyblue", lwd = 3), log = "x")
# Now, add the precision profile of repeatability.
mod8.rep <- fit_vfp(mat.rep, 8)
##
## Model 8 ... finished.
plot(mod8.rep, type = "cv", add = TRUE, ci.type = "none",
Points = list(pch = 17, col = "darkorchid3", cex = 1.5),
Line = list(col = "darkorchid3", lwd = 3), log = "x")
# Finally, add a legend in the right margin.
legend_rm( x = "center", pch = 15:17, col = c("blue", "deepskyblue", "darkorchid3"),
cex = 1, legend = c("Reproducibility", "Within-Lab", "Repeatability"),
box.lty = 0)
We now conclude with an example showing many of the different features of the standard plot-method for VFP-objects.
plot(mod8.rep, BG = "darkgray",
Points = list(pch = 17, cex = 1.5, col = "blue"), Line = list(col = "blue"),
Grid = list(x = seq(0, 450, 50), y = seq(0, 110, 10), col = "white"),
Xlabel = list(cex = 1.5, text = "CA19-9 [U/mL]", col = "blue"),
Ylabel = list(cex = 1.5, text = "Repeatability on Variance-Scale", col = "blue"),
Crit = NULL)
Functional Sensitivity and More
Beyond beautification options available in the plot-method
for VFP-objects there is more useful functionality integrated.
Actually, the functionality of functions predict.VFP and
predict_mean are integrated via argument Prediction in
plot.VFP().
In the introduction, we talked about some drawbacks of presenting
precision data in tables. Usually, one is interested in the
(im)precision at a specific concentration, e.g. at a medical decision
point. This can easily be inferred from a precision profile and
visualized in one step using the Prediction argument. This can
be used in multiple ways as shown below.
If a number or a vector of numbers is assigned to argument
Prediction this will be interpreted as having specified
Prediction=list(y=x), where x is the number. This will
internally trigger calling function predict_mean to derive that
concentration at which the specified variability is reached. Note, that
the scale of the Y-axis is here taken into account, i.e. if on CV-scale
specifications mean percent CV which differs from SD- or
variance-scales.
The main use-case for this is determining functional
sensitivity of an assay, i.e. a concentrations at which e.g. a
pre-specified CV is not exceeded for all concentrations larger than this
value.
To add confidence intervals, one can set argument Pred.CI=TRUE.
Calling function predict.VFP within the plotting function can be achieved as shown below.
plot(mod8.rep, type = "cv", ylim = c(0, 8), xlim = c(0, 100),
Prediction = list(x = 50), Pred.CI = TRUE)
Once a precision profile for an IVD-assay is established one can infer variability at any concentration of interest from it. It nicely summarizes the precision performance of an assay in a simple plot providing much higher information density than e.g. a simple table. One can also use it to derive functional sensitivity, which is specific to an assay, being that concentration at which a pre-defined variability is undercut for all concentrations larger than this value.
C5 and C95 for Qualitative Tests
Another extremely useful property of precision profiles, predominantly required for qualitative tests, is to derive C5 and C95 concentrations. For qualitative tests, where an internal continuous response (ICR) is available from which the qualitative result is derived, e.g. negative or positive, non-reactive or. reactive, etc., precision profiles should be used to establish C5 and C95 according to the CLSI EP12 guideline. These concentrations correspond to samples which, when measured many times, are expected to yield proportions of measurements found having the condition of interest. In case of C5 this proportion is 5%, in case of C95 95% of the replicates are expected to have it. There is a function available in R-package VFP designed to provide this information based on a precision-profile stored as a VFP-object.
Assume values on the X-axis to be measurements of the ICR, which is usually available for manufacturers of qualitative IVD-assays.
## Mean SD Actual.p
## 38.22302704 1.08032254 0.04999995
The actual nice feature of this function is to visualize concentration Cx if requested via plot=TRUE.
## Mean SD Actual.p
## 38.22302704 1.08032254 0.04999995
The same can be done for C95.
## Mean SD Actual.p
## 41.8503153 1.1249122 0.9499999
Now assume there are two cutoffs defining an equivocal zone in between, i.e. a re-test zone, gray zone or borderline results. Using function precision_plot helps to nicely summarize all information linked to C5 and C95.
Note, that this function invisibly returns computed results, which might be of interest.
## List of 2
## $ cutoff1: num [1:41, 1:5] 0.005 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:5] "Cx" "Mean" "SD" "Actual.p" ...
## $ cutoff2: num [1:41, 1:4] 0.005 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:4] "Cx" "Mean" "SD" "Actual.p"
## NULL
## Cx Mean SD Actual.p Actual.p.rounded
## [1,] 0.005 32.39818 1.010092 0.00500001 0.5
## [2,] 0.010 32.64338 1.013014 0.01000002 1.0
## [3,] 0.020 32.91292 1.016229 0.02000003 2.0
## [4,] 0.030 33.08482 1.018282 0.03000004 3.0
## [5,] 0.040 33.21459 1.019832 0.04000005 4.0
## [6,] 0.050 33.32044 1.021098 0.05000005 5.0