‘MRgrowth’ provides tools for estimating body-size growth rates from mark-recapture data. It is intended for situations where individuals do not have known ages, a common scenario in field studies of reptiles and other long-lived animals, and it has been designed to be as simple and easy to use as possible.
‘MRgrowth’ supports three growth models:

- von Bertalanffy ("vb"; using the Fabens reformulation)
- Gompertz ("gompertz")
- logistic ("logistic")

This vignette walks through a complete workflow using a turtle mark-recapture dataset.
‘MRgrowth’ includes a built-in example data set MRdata,
which contains mark-recapture data for a turtle population. Each row
represents a single capture event for an individual.
library(MRgrowth)
data(MRdata)
head(MRdata)
#> id size date
#> 1 t1 124 2019-06-06
#> 2 t1 256 2025-06-13
#> 3 t10 108 2020-07-28
#> 4 t10 268 2025-06-14
#> 5 t11 139 2019-06-08
#> 6 t11 166 2020-07-24
str(MRdata)
#> 'data.frame': 43 obs. of 3 variables:
#> $ id : chr "t1" "t1" "t10" "t10" ...
#> $ size: int 124 256 108 268 139 166 NA 193 200 200 ...
#> $ date: Date, format: "2019-06-06" "2025-06-13" ...

The data set contains three columns:

- id — a unique identifier for each individual
- date — the date of the capture event (in Date format; POSIX formats with associated times are also supported)
- size — the body size of the individual at capture (the example is in mm, but any units are acceptable)

Some individuals were captured more than twice, and there are some missing values. ‘MRgrowth’ handles both situations through the format.data() function described below.
Raw mark-recapture data are typically in long format (one row per
capture event), but ‘MRgrowth’ requires a wide format (one row per pair
of capture events). The format.data() function handles this
conversion. Filtering for NAs and individuals that were only captured
once is handled internally and you do not need to filter the data
beforehand (see below).
formatted.data <- format.data(
x = MRdata,
id.col = "id",
date.col = "date",
size.col = "size",
units = "years",
retain = "max"
)
head(formatted.data)
#> id size1 size2 date1 date2 td
#> 1 t1 124 256 2019-06-06 2025-06-13 6.024657534
#> 2 t10 108 268 2020-07-28 2025-06-14 4.882191781
#> 3 t11 139 166 2019-06-08 2020-07-24 1.128767123
#> 4 t12 193 197 2021-06-14 2021-08-29 0.208219178
#> 5 t13 234 258 2020-07-29 2021-02-23 0.572602740
#> 6 t14 412 413 2022-06-24 2022-06-27 0.008219178

The output contains one row per pair of capture events, with columns for:

- id — individual identifier
- size1 — size at first capture
- size2 — size at recapture
- date1 / date2 — dates of each capture
- td — time elapsed between captures in the units you specified

Only the size1, size2, and td columns are needed for subsequent functions. The date1 and date2 columns are provided in case users need to match this output with other aspects of their data.
Note: Sizes can be in any units you like, and those
units will be carried throughout all subsequent functions. Thus, if you
enter your data in mm, all results are for mm. Likewise, whatever time
unit you set in format.data() will be the time unit for all
subsequent functions and results.
The retain argument controls what happens when an individual was captured more than twice. As a general rule, individuals should only be included once to avoid pseudoreplication unless you are running a model with an error structure that handles replication per individual (those models are not currently supported in ‘MRgrowth’).

- "max" (default) — uses the first and last capture, maximizing the time interval and therefore the information available for estimating growth. This is recommended in most situations.
- "random" — randomly selects two captures. Use the seed argument for reproducibility.
- "all" — retains all consecutive pairs of captures (e.g., captures 1&2, 2&3, 3&4). Note that this introduces pseudoreplication because measurements from the same individual are not independent, and should be used with caution.

Any capture events with missing data are removed before the retain argument is applied. This means that if an individual was captured three times but had missing size data at the first capture, the function will use the second and third captures even if retain = "max".
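Conceptually, the "max" rule amounts to dropping events with missing sizes and then keeping the earliest and latest remaining captures for each individual. A minimal base-R sketch of that logic for a single individual (illustrative only; format.data() does all of this internally):

```r
# One individual's captures, with a missing size at the first event
caps <- data.frame(
  date = as.Date(c("2019-06-06", "2020-07-28", "2025-06-14")),
  size = c(NA, 108, 268)
)

caps <- caps[!is.na(caps$size), ]   # remove events with missing sizes first
caps <- caps[order(caps$date), ]    # sort chronologically
pair <- caps[c(1, nrow(caps)), ]    # "max": earliest and latest captures
```

Because the first size is missing, the retained pair is the 2020 and 2025 captures, matching the behavior described above.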
Additionally, any individuals that were measured only once are filtered out of the data set.
The calc.growth() function is the primary function in
‘MRgrowth’. It fits one or more growth models to the formatted data,
returns the fitted values of asymptotic size (A) and growth rate (k),
calculates bootstrapped confidence intervals, and optionally produces a
plot of growth curves.
size0 — the body size at birth or
hatching. This does not affect the estimates of A or k, but it does
affect the predicted sizes at each age in age.vect. Use a
species-typical birth or hatchling size. If you do not provide a value,
the function will use half the smallest observed size, which may be
unrealistic.
age.vect — a vector of ages for which
predicted sizes and confidence intervals will be calculated. Choose a
range that covers the full lifespan of your species. If not provided,
the function defaults to ages 1 through 4 times the largest observed
time interval, which may not cover the full age range. The number of
values in this vector has very little impact on run time, so it is best
to use fine-scale intervals (e.g., every year) to produce smooth
curves.
models — a vector of models to run and
evaluate (any combination of “vb”,“logistic”,“gompertz”). The default is
to run and compare all three.
runs — 95% confidence intervals are
calculated by bootstrapping, and the runs argument determines the number
of bootstrap replicates (higher values = better estimates but longer run
times). The default of 5000 is generally appropriate for publication.
Lower values (e.g., 100) can be used during exploratory analysis to
reduce run time. This argument only affects the confidence intervals,
not the actual estimates of A, k, or model diagnostics (e.g., AIC).
Therefore, for large data sets, you may want to do an initial run with
bootstrapping essentially turned off by setting it to 1. Use that run to
determine which model is the best fit, then run a second analysis using
a high number of runs (e.g., 5000) and only the best fit model. Note
that the seed argument is passed internally to set.seed(),
so results are repeatable.
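To see what the runs argument is doing, here is a generic percentile bootstrap on a simple statistic (a sketch of the general technique with made-up values; calc.growth() refits the growth model to each resampled data set rather than computing a mean):

```r
set.seed(1)                   # analogous to the seed argument
td <- c(6.02, 4.88, 1.13, 0.21, 0.57, 0.01, 2.30, 1.40)  # made-up intervals

runs <- 5000
boot.stats <- replicate(runs, mean(sample(td, replace = TRUE)))

# 95% percentile confidence interval from the bootstrap distribution
ci <- quantile(boot.stats, probs = c(0.025, 0.975))
```

Higher runs values smooth the tails of the bootstrap distribution, which is why the quantile-based interval stabilizes as replicates increase.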
type — by default, ‘MRgrowth’ uses
maximum likelihood (“ML”) estimation, but it can be changed to least
squares (“OLS”). Usually, ML and OLS will produce identical results;
however, OLS will not return AIC values. Occasionally, ML may result in
convergence issues which are not present in OLS. Therefore, if an ML
model does not converge, you may want to try it with OLS.
fixed.A — in a limited number of cases, you may want to evaluate only the k parameter and supply a fixed asymptotic value (A) based on existing knowledge (e.g., a different study of a different population). As a general rule, this is not advisable, but it can be used in some cases, such as small sample sizes.

Interpreting comparisons of models with and without a fixed A value is complex. AICs are inherently penalized by the number of parameters involved; thus, models with fixed.A have fewer parameters, which will result in lower AICs unless the fit is substantially worse. As a result, if you run models with and without a fixed A, and the fixed values are close to the values that would have been estimated, you will get lower AICs even if the other parameters (k) do not actually fit as well. This is simply because you had one fewer parameter penalizing the AIC score.

Further, for those AICs to be valid, the value of A must have been derived entirely independently (e.g., from a different population). If it was based on a previous run or some aspect of the population being studied (e.g., max observed body size), then it is not independent and the AIC score has not been accurately penalized for the number of parameters.

To help with this, two sets of AIC scores are returned when a value is supplied for fixed.A. The “fixed” scores were calculated based on two parameters being estimated (k and sigma). These are the “correct” AICs for models with a fixed A, but they come with the caveats described above. The “AIC” and “AICc” columns contain the AICs based on three parameters being estimated (A, k, and sigma), and may be more appropriate if the selection of A was in any way biased. Other model diagnostics like convergence issues and NLL should also be consulted.
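The arithmetic behind these two sets of scores is easy to check by hand: for ML fits, AIC = 2p + 2·NLL, where p is the number of estimated parameters. Fixing A drops p from 3 to 2, lowering the AIC by exactly 2 for the same NLL. A sketch using the vb NLL reported later in this vignette:

```r
# AIC = 2p + 2*NLL for maximum likelihood fits
aic <- function(nll, p) 2 * p + 2 * nll

nll <- 73.97743        # NLL of the vb fit shown in this vignette
aic(nll, p = 3)        # A, k, and sigma estimated: 153.9549
aic(nll, p = 2)        # "fixed" score (k and sigma only): 151.9549
```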
other arguments — The other inputs are
set to reasonable defaults and, in most cases, they do not need to be
altered. Indeed, in most cases, only the data set, size0, and age.vect
need to be supplied for a useful run (assuming your data is the output
of format.data()).
Adjusting initial.A and initial.k can result in slight improvements
in run time, but they are generally not noticeable. In concept,
initial.A and initial.k values could affect convergence in the ML
models, but to avoid this calc.growth() internally uses the
initial values in an OLS model to produce suitable inputs for an ML
model, and the final results are based on the ML model. So, in practice,
initial values have no effect on the final result.
Let’s run a simple comparison of all three models, looking at the expected size at ages 1-50 (years), with a hatchling size of 50 mm. For this example, we will keep runs low (100) to reduce run time, but for actual analyses, you want several thousand runs. Most values will be left on their default settings.
# Using low runs for vignette build speed
results <- calc.growth(
x = formatted.data,
size0 = 50,
age.vect = 1:50,
models = c("vb", "gompertz", "logistic"),
type = "ML",
runs = 100,
return.plot = TRUE
)

calc.growth() returns a list with two elements.
results$parameters
#> model A A.lower A.upper k k.lower k.upper
#> 1 vb 403.7077 377.8327 455.7548 0.1365507 0.09945302 0.1816731
#> 2 gompertz 396.6738 375.5141 430.7656 0.2029850 0.15379676 0.2518096
#> 3 logistic 393.0270 373.9270 419.7161 0.2835795 0.22354195 0.3506872
#> convergence NLL AIC AICc
#> 1 0 73.97743 153.9549 155.6692
#> 2 0 74.37535 154.7507 156.4650
#> 3 0 75.65250 157.3050 159.0193

Each row contains the fitted parameters and diagnostics for one model:

- A — the estimated asymptotic size. Note that A is an average asymptotic size for the population. Thus, you will probably have some individuals larger than A.
- k — the growth rate coefficient. Higher values indicate faster growth toward the asymptote.
- A.lower / A.upper and k.lower / k.upper — the 95% bootstrapped confidence intervals for each parameter.
- convergence — a code from optim(). A value of 0 indicates the optimizer converged successfully. Any other value indicates a convergence issue and the results should be treated cautiously.
- NLL — the negative log-likelihood of the fitted model (only returned if type = "ML" [default]).
- SS — the sum of squares of the fitted model (only returned if type = "OLS").
- AIC / AICc — information criteria for model comparison (only returned if type = "ML" [default]; see the fixed.A section in step 2).

head(results$sizes.at.ages)
#> model age size upper lower
#> 1 vb 1 95.14652 104.5690 84.46875
#> 2 vb 2 134.53064 149.9991 115.61696
#> 3 vb 3 168.88785 188.3299 143.76454
#> 4 vb 4 198.85977 220.5695 169.38923
#> 5 vb 5 225.00614 248.1579 192.65128
#> 6 vb 6 247.81524 270.9825 213.75608

This data frame contains the predicted size at each age in
age.vect for each model, along with the upper and lower
bootstrapped 95% confidence intervals. These are the data behind the
growth curve plot and are available for more refined plotting in ggplot2
or other graphing software.
When multiple models are fitted, the AIC and AICc values in
results$parameters can be used to compare them. Lower
values indicate a better fit relative to model complexity.
NLL or SS values can also be useful (lower values = better fits).
Make sure to check the convergence column. Values other than 0 indicate convergence issues and generally should not be trusted even if they have low AICs.
In this case, no models had convergence issues and the ‘vb’ (von Bertalanffy) appears to be the best fit, so we will use it in all subsequent steps.
#look at model diagnostics
print(results$parameters)
#> model A A.lower A.upper k k.lower k.upper
#> 1 vb 403.7077 377.8327 455.7548 0.1365507 0.09945302 0.1816731
#> 2 gompertz 396.6738 375.5141 430.7656 0.2029850 0.15379676 0.2518096
#> 3 logistic 393.0270 373.9270 419.7161 0.2835795 0.22354195 0.3506872
#> convergence NLL AIC AICc
#> 1 0 73.97743 153.9549 155.6692
#> 2 0 74.37535 154.7507 156.4650
#> 3 0 75.65250 157.3050 159.0193
# Extract parameters from the best-fitting model
best.model <- results$parameters[which(results$parameters$model=="vb"), ]

Additional checks can be performed by using the resulting model parameters and the age.at.size() and size.at.age() functions (see Step 5).
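If you prefer to pick the best model programmatically rather than by eye, you can filter on convergence and take the minimum AICc. The snippet below uses a small stand-in for results$parameters so it is self-contained:

```r
# Stand-in for results$parameters (values copied from the run above)
params <- data.frame(
  model       = c("vb", "gompertz", "logistic"),
  convergence = c(0, 0, 0),
  AICc        = c(155.6692, 156.4650, 159.0193)
)

ok   <- params[params$convergence == 0, ]   # drop non-converged fits
best <- ok[which.min(ok$AICc), ]
best$model
#> [1] "vb"
```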
Once you have determined the best fit model, it is a good idea to interrogate it further to see how well it actually fits the data. Simply being the “best” model out of the three we tried does not mean that the model is actually a good fit (it may simply be not as bad as the others). This section first introduces some additional functions, then explains how to use them to interrogate your model.
Once you have estimates of A and k, you can use
size.at.age() to predict the size of an individual at any
age.
#calculate expected sizes for a 10, 20 and 30 year old turtle
#note that the model argument specifies that we are using a vb model
size.at.age(
A = best.model$A,
k = best.model$k,
age = c(10, 20, 30),
size0 = 50,
model = "vb")
#> [1] 313.4234 380.6626 397.8254

This will give the same results as the $sizes.at.ages output from
calc.growth(), but you can use it to examine different
sizes, different birth/hatching sizes, or slight variations in A and
k.
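For the von Bertalanffy case, the curve being evaluated is size(age) = A − (A − size0)·exp(−k·age). This self-contained re-implementation (an illustrative sketch, not the package’s internal code) reproduces the values above:

```r
# von Bertalanffy size at age, starting from size0 at age 0
size.vb <- function(A, k, age, size0) {
  A - (A - size0) * exp(-k * age)
}

size.vb(A = 403.7077, k = 0.1365507, age = c(10, 20, 30), size0 = 50)
#> [1] 313.4234 380.6626 397.8254
```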
Once you have estimates of A and k, you can use
age.at.size() to estimate the age of an individual based on
its size.
#Estimate the age (years) of a 150, 250, and 350 mm turtle
#note that the model argument specifies that we are using a vb model
age.at.size(
A = best.model$A,
k = best.model$k,
size = c(150, 250, 350),
size0 = 50,
model = "vb")
#> [1] 2.433440 6.103358 13.803764

Note that ages cannot be calculated for sizes greater than A; these will return NaN.
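For the von Bertalanffy model, this is just the algebraic inverse of the size-at-age curve: age = −log((A − size) / (A − size0)) / k. A minimal sketch (again illustrative, not the package’s code) showing both the normal case and the NaN behavior:

```r
# Inverse of the vb curve; returns NaN (with a warning) when size > A
age.vb <- function(A, k, size, size0) {
  -log((A - size) / (A - size0)) / k
}

age.vb(A = 403.7077, k = 0.1365507, size = c(150, 250, 350), size0 = 50)
#> [1] 2.433440 6.103358 13.803764
```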
The three growth functions (vb(),
gompertz(), logistic()) can also be used
directly to predict the size of an individual after a given time
interval, starting from a known size. Note again that all units
(including time) should be the same as the initial units used in
format.data().
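For vb(), the underlying equation is the Fabens recapture form, size2 = A − (A − size1)·exp(−k·td): the same curve as size.at.age(), but anchored at size at first capture instead of size at birth. A self-contained sketch (the packaged vb() should be preferred in practice):

```r
# Fabens form: predicted size after growing for td time units from size1
vb.fabens <- function(A, k, size1, td) {
  A - (A - size1) * exp(-k * td)
}

# Growing from size0 = 50 for 10 years matches size.at.age() at age 10
vb.fabens(A = 403.7077, k = 0.1365507, size1 = 50, td = 10)
#> [1] 313.4234
```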
You can use these functions to perform further checks on your model.
We can use the size1 and time difference in our formatted input data to predict what size turtles should be at recapture. We can then compare those sizes to the size2 data to see how well the model predicted them.
predicted.size2 <- vb(
A = best.model$A,
k = best.model$k,
size1 = formatted.data$size1, #use size 1 data
td = formatted.data$td #use time difference in the data
)
plot(x = formatted.data$size2,
y = predicted.size2,
xlab = "Observed size2",
ylab = "Estimated size2",
pch = 16,
col = "blue",
bg = "white",
panel.first = rect(par("usr")[1], par("usr")[3],
par("usr")[2], par("usr")[4],
col = "white"))
abline(lm(predicted.size2 ~ formatted.data$size2),
col = "black",
lwd = 1)
r2 <- summary(lm(predicted.size2 ~ formatted.data$size2))$r.squared
legend("topleft",
legend = bquote(R^2 == .(round(r2, 4))),
bty = "n")

Here you can see that the values match up pretty well with a high R2, so the model did a good job. If the R2 was low, it would indicate that even the “best” model is doing a poor job of fitting this data and should be treated with skepticism.
Another important check is the spacing of the residuals along the line. If, for example, there were large deviations only on the left (small side), it would suggest that the model is doing a good job of estimating growth of large individuals, but struggling with small individuals.
Note: Because of the models’ math, for vb(), gompertz(), and logistic(), any time that size1 is greater than A, the estimated size will be less than size1. Therefore, you may want to either filter those individuals out (the best option for R2 values) or replace their predicted values with size1 in the output.
#remove any individuals where size1 > A
observed.size2.b <- formatted.data$size2[which(formatted.data$size1 < best.model$A)]
predicted.size2.b <- predicted.size2[which(formatted.data$size1 < best.model$A)]
plot(x = observed.size2.b,
y = predicted.size2.b,
xlab = "Observed size2",
ylab = "Estimated size2",
pch = 16,
col = "blue",
bg = "white",
panel.first = rect(par("usr")[1], par("usr")[3],
par("usr")[2], par("usr")[4],
col = "white"))
abline(lm(predicted.size2.b ~ observed.size2.b),
col = "black",
lwd = 1)
r2 <- summary(lm(predicted.size2.b ~ observed.size2.b))$r.squared
legend("topleft",
legend = bquote(R^2 == .(round(r2, 4))),
bty = "n")

If you have a subset of individuals with known ages, you can use the size.at.age() and age.at.size() functions to further check the models.
Let’s say, for example, that in addition to the animals used to generate the models, we have 4 individuals of known ages. They are 5, 10, 12, and 20 years old, and are 200, 320, 330, and 400 mm. Let’s start by using their ages to predict their sizes. We’ll convert the results into percent differences to make them easier to interpret.
predicted.size <- size.at.age(A=best.model$A,
k=best.model$k,
age=c(5,10,12,20),
size0=50,
model="vb")
#expected sizes given their ages
print(predicted.size)
#> [1] 225.0061 313.4234 335.0000 380.6626
#percent difference between observed and predicted sizes
print((c(200,320,330,400)-predicted.size)/c(200,320,330,400)*100)
#> [1] -12.503071 2.055181 -1.515141 4.834361

Mostly, those size estimates line up pretty well. As a point of comparison, let’s do the same but with the logistic model, which did not fit as well.
logistic.model <- results$parameters[which(results$parameters$model=="logistic"), ]
predicted.size.logistic <- size.at.age(A=logistic.model$A,
k=logistic.model$k,
age=c(5,10,12,20),
size0=50,
model="logistic")
#predicted sizes based on the logistic model
print(predicted.size.logistic)
#> [1] 147.6557 280.2290 319.9807 383.9592
#percent difference between observed and predicted sizes for the logistic model
print((c(200,320,330,400)-predicted.size.logistic)/c(200,320,330,400)*100)
#> [1] 26.172125 12.428434 3.036141 4.010198

These estimates are much further off, especially for smaller individuals, which makes sense, given that the logistic model did not fit as well.
Now we can invert things and look at the expected ages for those sizes.
predicted.age <- age.at.size(A=best.model$A,
k=best.model$k,
size=c(200,320,330,400),
size0=50,
model="vb")
print(predicted.age)
#> [1] 4.040877 10.553877 11.485572 33.379912

Again, not too bad, suggesting that the vb model is doing a good job.
Edmonds, D., et al. (2021). Growth and population structure of a long-lived reptile. PLOS ONE. doi:10.1371/journal.pone.0259978
von Bertalanffy, L. (1957). Quantitative laws in metabolism and growth. The Quarterly Review of Biology, 32, 217–231.
Fabens, A.J. (1965). Properties and fitting of the von Bertalanffy growth curve. Growth, 29, 265–289.
Gompertz, B. (1833). On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philosophical Transactions of the Royal Society of London, 123, 252–253.
Schoener, T.W. and Schoener, A. (1978). Estimating and interpreting body-size growth in some Anolis lizards. Copeia, 1978, 390–405.