Though RSTr is designed to stabilize low-population regions, there is
a limit to the amount of information the model can gather, and estimates
from exceedingly low-population regions may be over-smoothed. To address
these issues, we can establish criteria that indicate whether our
estimates are reliable enough to display. In this vignette, we will take
a deep dive into estimate reliability and showcase how we can use
reliability criteria to suppress our estimates with
suppress_estimates().
Let’s quickly demonstrate the importance of reliability metrics by comparing the traceplots of high- and low-population counties:
Here, we have two traceplots showing the sampled values across iterations for a given county-group-year. The traceplot on the left is for the highest-population county, and the one on the right is for the lowest-population county. Each traceplot also includes a red line representing the mean event rate for that county-group-year. Note that in both traceplots, the red line sits close to the samples but is either higher or lower than their general trend, indicating that the rates in both of these counties were attenuated by spatial smoothing.
The traceplot on the left is exactly what we want to see: the samples clearly fluctuate around a single value. The traceplot on the right is less favorable: the samples do not settle and instead jump between values over the course of the model run. However, the rate itself is naturally high, so the intensity of the fluctuation isn’t shocking. Smaller counties like the one shown here demonstrate the limits of RSTr: while the samples do hover around a single value for some iterations, the estimate for the county-group-year on the right will not be as reliable as the estimate on the left because its samples vary so widely. Estimates like these are why we need standard measures of reliability.
Reliability can be assessed in CAR models using two criteria: the relative precision of each estimate, and a minimum population (or event count) for each region.
Let’s get some reliability metrics for our dataset. The
mstcar() function automatically generates credible
intervals and relative precisions based on the perc_ci
argument:
mod_mst <- mstcar(name = "my_test_model", data = miheart, adjacency = miadj, seed = 1234, perc_ci = 0.95)
# For computational reasons, full model fitting is not run during CRAN checks.
# When building on CRAN, this vignette loads a pre-fitted example model included with the package.
# The pkgdown website shows the full model-fitting workflow.
example_dir <- system.file("extdata", package = "RSTr")
mod_mst <- load_model("mstcar_example", example_dir)
Here, we specify perc_ci = 0.95, which means our
relative precision estimates will be based on a 95% credible interval.
If we want to generate our suppressed estimates, we can simply input our
RSTr model object into the
suppress_estimates() function. Since we are using an MSTCAR
model, we have to specify a population threshold to suppress counties
with small population counts:
mod_mst <- suppress_estimates(mod_mst, threshold = 1e3)
mod_mst
#> RSTr object:
#>
#> Model name: mstcar_example
#> Model type: MSTCAR
#> Data likelihood: binomial
#> Estimate Credible Interval: 95%
#> Number of geographic units: 83
#> Number of samples: 3000
#> Estimates age-standardized: No
#> Estimates suppressed: Yes
#> Number of reliable rates: 3724 / 4980 (74.8%)
estimates <- get_estimates(mod_mst)
head(estimates)
#> county group year medians medians_suppressed ci_lower ci_upper rel_prec
#> 1 26001 35-44 1979 41.09875 NA 31.76009 47.11024 2.677417
#> 2 26003 35-44 1979 61.79864 61.79864 50.74198 80.73919 2.060146
#> 3 26005 35-44 1979 23.44843 23.44843 18.67969 28.12923 2.481436
#> 4 26007 35-44 1979 38.04293 38.04293 26.32669 48.99415 1.678306
#> 5 26009 35-44 1979 36.87313 36.87313 31.41068 44.70636 2.773316
#> 6 26011 35-44 1979 36.02715 36.02715 32.00151 47.83611 2.275216
#> events population
#> 1 1 964
#> 2 1 1011
#> 3 0 9110
#> 4 0 3650
#> 5 0 1763
#> 6 0 1470
By default, suppress_estimates() suppresses based on
population counts, but by specifying type = "event", you
can use an event count threshold instead. This is helpful for
maintaining consistency with datasets that suppress by events instead of
population, such as CDC WONDER. Notice that, of the six regions shown for
the 35-44 age group, the first has been suppressed: its population (964)
falls below the threshold of 1,000, so its medians_suppressed value is NA.
The second region, with a population of 1,011 and a relative precision of
2.06, keeps its estimate. Now, we can map out our county estimates with suppression:
library(ggplot2)
est_3544 <- estimates$medians_suppressed[estimates$group == "35-44" & estimates$year == "1988"]
ggplot(mishp) +
  geom_sf(aes(fill = est_3544)) +
  labs(
    title = "Smoothed Myocardial Infarction Death Rates in MI, Ages 35-44, 1988",
    fill = "Deaths per 100,000"
  ) +
  scale_fill_viridis_c() +
  theme_void()
#> NULL
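As a quick check on the table printed by head(estimates) above, the sketch below retypes those six rows by hand and recomputes two columns in base R. It assumes that relative precision is the median divided by the width of the credible interval, which is an inference from the printed numbers rather than a documented definition. It applies only the population rule and does not reproduce the package's relative precision cut-off. The rel_prec_check and medians_check columns are hypothetical names used for illustration.

```r
# First six rows as printed by head(estimates) in this vignette
est <- data.frame(
  county     = c(26001, 26003, 26005, 26007, 26009, 26011),
  medians    = c(41.09875, 61.79864, 23.44843, 38.04293, 36.87313, 36.02715),
  ci_lower   = c(31.76009, 50.74198, 18.67969, 26.32669, 31.41068, 32.00151),
  ci_upper   = c(47.11024, 80.73919, 28.12923, 48.99415, 44.70636, 47.83611),
  rel_prec   = c(2.677417, 2.060146, 2.481436, 1.678306, 2.773316, 2.275216),
  population = c(964, 1011, 9110, 3650, 1763, 1470)
)

# Assumed definition: median divided by credible interval width
est$rel_prec_check <- est$medians / (est$ci_upper - est$ci_lower)

# Population rule only: blank out medians below the threshold used above
threshold <- 1e3
est$medians_check <- ifelse(est$population < threshold, NA, est$medians)

est[, c("county", "rel_prec", "rel_prec_check", "population", "medians_check")]
```

Under that assumption the recomputed relative precisions agree with the printed rel_prec column up to rounding, and only the first county falls below the population threshold.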
These suppressed maps highlight an important benefit of age-standardizing: by combining age groups, we both bolster our relative precisions and increase the total population in our groups, improving both suppression criteria. Let’s age-standardize to 35-64 and see which counties are suppressed:
std_pop <- c(113154, 100640, 95799)
mod_mst <- age_standardize(mod_mst, std_pop, new_name = "35-64", groups = c("35-44", "45-54", "55-64"))
estimates <- get_estimates(mod_mst)
The RSTr model object remembers that we’ve suppressed
our estimates and automatically performs suppression on our
age-standardized estimates. Let’s map our age-standardized rates:
est_3564 <- estimates$medians_suppressed[estimates$group == "35-64" & estimates$year == "1988"]
ggplot(mishp) +
  geom_sf(aes(fill = est_3564)) +
  labs(
    title = "Smoothed Myocardial Infarction Death Rates in MI, Ages 35-64, 1988",
    fill = "Deaths per 100,000"
  ) +
  scale_fill_viridis_c() +
  theme_void()
#> NULL
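To make the age-standardization step more concrete, here is a minimal sketch of direct standardization in base R: each age-specific rate is weighted by its group's share of the standard population. The std_pop values are the ones used above. The rates vector is made up for illustration, and how age_standardize() applies the weights internally (for example, to each posterior sample) is not shown in this vignette, so treat this as arithmetic only.

```r
# Standard population weights for the three age groups (from this vignette)
std_pop <- c("35-44" = 113154, "45-54" = 100640, "55-64" = 95799)

# Hypothetical age-specific rates per 100,000 for one county-year
rates <- c("35-44" = 36, "45-54" = 120, "55-64" = 310)

# Direct standardization: weight each rate by its share of the standard population
weights <- std_pop / sum(std_pop)
rate_3564 <- sum(rates * weights)

# Equivalent one-liner using the unnormalized weights
rate_3564_alt <- weighted.mean(rates, std_pop)
```

The combined 35-64 rate always lies between the smallest and largest age-specific rates, and the pooled group has a larger population than any single age group.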
As we increase our credible interval level from 0.50 to 0.95, 0.99, and 0.995, our reliability criteria become more stringent and more counties become grayed out. It is important to balance the choice of credible interval against the number of estimates displayed; the traditionally used 95% credible interval provides a happy medium between the two.
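The effect of the credible interval level on relative precision can be sketched with simulated samples. The example below assumes an equal-tailed interval and defines relative precision as the median divided by the interval width. Both are assumptions for illustration, and rel_prec_at() is a hypothetical helper, not an RSTr function. The gamma draws stand in for posterior samples and are not model output.

```r
set.seed(1234)

# Hypothetical posterior samples of a rate per 100,000 (not RSTr output)
samples <- rgamma(3000, shape = 40, rate = 1)

# Assumed definition: posterior median divided by the width of the
# equal-tailed credible interval at level perc_ci
rel_prec_at <- function(samples, perc_ci) {
  alpha <- 1 - perc_ci
  ci <- quantile(samples, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  median(samples) / (ci[2] - ci[1])
}

ci_levels <- c(0.50, 0.95, 0.99, 0.995)
rel_prec <- sapply(ci_levels, rel_prec_at, samples = samples)

data.frame(perc_ci = ci_levels, rel_prec = round(rel_prec, 2))
```

Because the interval widens as the level rises while the median stays fixed, relative precision falls at each step, so fewer estimates clear a fixed cut-off.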
In this vignette, we investigated measures of reliability and
observed how reliability measures change which data are suppressed. This
vignette concludes the main sections on using the functions in the RSTr
package. After reading these, you should be able to prepare your event
and adjacency data, configure your model as necessary, age-standardize
estimates, and determine which estimates are reliable. If you are
interested in more advanced features of RSTr involving sample
processing, read vignette("RSTr-samples"); if you’d like
more information on defining custom inits and
priors, check vignette("RSTr-initialvalues")
and vignette("RSTr-priors"), respectively; if you are
interested in learning more about how the MSTCAR model itself works,
read vignette("RSTr-models").