For the most part, this document will present the functionalities of the function surveysd::calc.stError(), which generates point estimates and standard errors for user-supplied estimation functions.
In order to use a dataset with calc.stError(), several weight columns have to be present. Each weight column corresponds to a bootstrap sample. In the following examples, we will use the data from demo.eusilc() and attach the bootstrap weights using draw.bootstrap() and recalib(). Please refer to the documentation of those functions for more detail.
library(surveysd)
set.seed(1234)
eusilc <- demo.eusilc(prettyNames = TRUE)
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
strata = "region", period = "year")
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[, onePerson := nrow(.SD) == 1, by = .(year, hid)]
## print part of the dataset
dat_boot_calib[1:5, .(year, povertyRisk, eqIncome, onePerson, pWeight, w1, w2, w3, w4, w5)]
year | povertyRisk | eqIncome | onePerson | pWeight | w1 | w2 | w3 | w4 | w5 |
---|---|---|---|---|---|---|---|---|---|
2010 | FALSE | 16090.69 | FALSE | 504.5696 | 1013.1805463 | 0.4502254 | 1001.5595 | 1015.8425 | 0.4456781 |
2010 | FALSE | 16090.69 | FALSE | 504.5696 | 1013.1805463 | 0.4502254 | 1001.5595 | 1015.8425 | 0.4456781 |
2010 | FALSE | 16090.69 | FALSE | 504.5696 | 1013.1805463 | 0.4502254 | 1001.5595 | 1015.8425 | 0.4456781 |
2010 | FALSE | 27076.24 | FALSE | 493.3824 | 0.4413742 | 0.4409086 | 975.1408 | 994.4018 | 979.7081838 |
2010 | FALSE | 27076.24 | FALSE | 493.3824 | 0.4413742 | 0.4409086 | 975.1408 | 994.4018 | 979.7081838 |
The parameters fun and var in calc.stError() define the estimator to be used in the error analysis. There are two built-in estimator functions, weightedSum() and weightedRatio(), which can be used as follows.
povertyRate <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio)
totalIncome <- calc.stError(dat_boot_calib, var = "eqIncome", fun = weightedSum)
Those functions calculate the ratio of persons at risk of poverty (in percent) and the total income. By default, the results are calculated separately for each reference period.
year | n | N | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|
2010 | 14827 | 8182222 | direct | 14.44422 | 0.3755538 |
2011 | 14827 | 8182222 | direct | 14.77393 | 0.2298196 |
2012 | 14827 | 8182222 | direct | 15.04515 | 0.2056325 |
2013 | 14827 | 8182222 | direct | 14.89013 | 0.4515894 |
2014 | 14827 | 8182222 | direct | 15.14556 | 0.4954098 |
2015 | 14827 | 8182222 | direct | 15.53640 | 0.5456595 |
2016 | 14827 | 8182222 | direct | 15.08315 | 0.5211549 |
2017 | 14827 | 8182222 | direct | 15.42019 | 0.3757101 |
year | n | N | estimate_type | val_eqIncome | stE_eqIncome |
---|---|---|---|---|---|
2010 | 14827 | 8182222 | direct | 162750998071 | 904175758 |
2011 | 14827 | 8182222 | direct | 161926931417 | 1229058265 |
2012 | 14827 | 8182222 | direct | 162576509628 | 1903487229 |
2013 | 14827 | 8182222 | direct | 163199507862 | 1624805090 |
2014 | 14827 | 8182222 | direct | 163986275009 | 1464839665 |
2015 | 14827 | 8182222 | direct | 163416275447 | 1665569708 |
2016 | 14827 | 8182222 | direct | 162706205137 | 2073914048 |
2017 | 14827 | 8182222 | direct | 164314959107 | 2030896610 |
Columns that use the val_ prefix denote the point estimate belonging to the “main weight” of the dataset, which is pWeight in case of the dataset used here.
Columns with the stE_ prefix denote standard errors calculated with bootstrap replicates. For the replicates, w1, w2, …, w10 are used instead of pWeight when applying the estimator.
n denotes the number of observations for the year and N denotes the total weight of those persons.
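As a rough illustration of how the val_ and stE_ columns relate to the weight columns, the following base R sketch uses made-up toy data. It assumes that the standard error is taken as the standard deviation of the estimates obtained with the replicate weights; the helper ratio_pct() and all numbers are hypothetical and not part of surveysd.

```r
# Toy data (made up): 6 persons, one main weight and 3 replicate weights
toy <- data.frame(
  at_risk = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE),
  pWeight = c(500, 500, 480, 480, 520, 520),
  w1 = c(1000, 0.5, 960, 0.5, 520, 520),
  w2 = c(0.5, 1000, 0.5, 960, 1040, 0.5),
  w3 = c(1000, 1000, 0.5, 0.5, 520, 520)
)

# Hypothetical stand-in for a weighted ratio estimator (in percent)
ratio_pct <- function(x, w) sum(w * x) / sum(w) * 100

# Point estimate: estimator applied with the main weight
val <- ratio_pct(toy$at_risk, toy$pWeight)

# Replicate estimates: the same estimator applied with each replicate weight
rep_cols <- c("w1", "w2", "w3")
reps <- vapply(rep_cols, function(col) ratio_pct(toy$at_risk, toy[[col]]), numeric(1))

# Assumed standard error: standard deviation of the replicate estimates
stE <- sd(reps)

c(val = val, stE = stE)
```

The exact formula used by calc.stError() should be taken from the package documentation; the sketch only shows the general pattern of one point estimate plus one estimate per replicate weight.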
In order to define a custom estimator function to be used in fun, the function needs to have at least two arguments, as in the example below.
## define custom estimator
myWeightedSum <- function(x, w) {
sum(x*w)
}
## check if results are equal to the one using `surveysd::weightedSum()`
totalIncome2 <- calc.stError(dat_boot_calib, var = "eqIncome", fun = myWeightedSum)
all.equal(totalIncome$Estimates, totalIncome2$Estimates)
## [1] TRUE
The parameters x and w can be assumed to be vectors of equal length, with w being a numeric weight vector and x being the column defined in the var argument. The function will be called once for each period (in this case year) and for each weight column (in this case pWeight, w1, w2, …, w10).
Custom estimators using additional parameters can also be supplied; the parameter add.arg can be used to set the additional arguments for the custom estimator.
## use add.arg-argument
fun <- function(x, w, b) {
sum(x*w*b)
}
add.arg = list(b="onePerson")
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = fun,
period.mean = 0, add.arg=add.arg)
err.est$Estimates
year | n | N | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|
2010 | 14827 | 8182222 | direct | 273683.9 | 14449.17 |
2011 | 14827 | 8182222 | direct | 261883.6 | 12029.86 |
2012 | 14827 | 8182222 | direct | 243083.9 | 13071.31 |
2013 | 14827 | 8182222 | direct | 238004.4 | 15764.40 |
2014 | 14827 | 8182222 | direct | 218572.1 | 16665.11 |
2015 | 14827 | 8182222 | direct | 219984.1 | 18322.78 |
2016 | 14827 | 8182222 | direct | 201753.9 | 14075.25 |
2017 | 14827 | 8182222 | direct | 196881.2 | 13604.54 |
# compare with direct computation
compare.value <- dat_boot_calib[,fun(povertyRisk,pWeight,b=onePerson),
by=c("year")]
all((compare.value$V1-err.est$Estimates$val_povertyRisk)==0)
## [1] TRUE
The above chunk computes the weighted number of persons at risk of poverty who live in single-person households.
In our example the variable povertyRisk is a boolean and is TRUE if the income is less than 60% of the weighted median income. Thus it directly depends on the original weight vector pWeight. To further reduce the estimated error one should calculate for each bootstrap replicate weight \(w\) the weighted median income \(medIncome_{w}\) and then define \(povertyRisk_w\) as
\[ povertyRisk_w = \begin{cases} 1 & \text{if } Income < 0.6\cdot medIncome_{w}\\ 0 & \text{else} \end{cases} \]
The estimator can then be applied to the new variable \(povertyRisk_w\). This can be realized using a custom estimator function.
# custom estimator to first derive poverty threshold
# and then estimate a weighted ratio
povmd <- function(x, w) {
md <- laeken::weightedMedian(x, w)*0.6
pmd60 <- x < md
# weighted ratio is directly estimated inside the function
return(sum(w[pmd60])/sum(w)*100)
}
err.est <- calc.stError(
dat_boot_calib, var = "povertyRisk", fun = weightedRatio,
fun.adjust.var = povmd, adjust.var = "eqIncome")
err.est$Estimates
year | n | N | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|
2010 | 14827 | 8182222 | direct | 14.44422 | 0 |
2011 | 14827 | 8182222 | direct | 14.77393 | 0 |
2012 | 14827 | 8182222 | direct | 15.04515 | 0 |
2013 | 14827 | 8182222 | direct | 14.89013 | 0 |
2014 | 14827 | 8182222 | direct | 15.14556 | 0 |
2015 | 14827 | 8182222 | direct | 15.53640 | 0 |
2016 | 14827 | 8182222 | direct | 15.08315 | 0 |
2017 | 14827 | 8182222 | direct | 15.42019 | 0 |
The approach shown above is only valid if no grouping variables are supplied (parameter group = NULL). If grouping variables are supplied, one should use the parameters fun.adjust.var and adjust.var such that \(povertyRisk_w\) is first calculated for each period and then used for each grouping in group.
# using fun.adjust.var and adjust.var to estimate povmd60 indicator
# for each period and bootstrap weight before applying the weightedRatio
povmd2 <- function(x, w) {
md <- laeken::weightedMedian(x, w)*0.6
pmd60 <- x < md
return(as.integer(pmd60))
}
# set adjust.var="eqIncome" so the income vector is used to estimate
# the povmd60 indicator for each bootstrap weight
# and the resulting indicators are passed to function weightedRatio
group <- "gender"
err.est <- calc.stError(
dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = "gender",
fun.adjust.var = povmd2, adjust.var = "eqIncome")
err.est$Estimates
year | n | N | gender | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|
2010 | 7267 | 3979572 | male | direct | 12.02660 | 0.4858507 |
2010 | 7560 | 4202650 | female | direct | 16.73351 | 0.6959347 |
2010 | 14827 | 8182222 | NA | direct | 14.44422 | 0.5756880 |
2011 | 7267 | 3979572 | male | direct | 12.81921 | 0.2873416 |
2011 | 7560 | 4202650 | female | direct | 16.62488 | 0.3743578 |
2011 | 14827 | 8182222 | NA | direct | 14.77393 | 0.2694827 |
2012 | 7267 | 3979572 | male | direct | 13.76065 | 0.2865017 |
2012 | 7560 | 4202650 | female | direct | 16.26147 | 0.2689458 |
2012 | 14827 | 8182222 | NA | direct | 15.04515 | 0.1903772 |
2013 | 7267 | 3979572 | male | direct | 13.88962 | 0.4730442 |
2013 | 7560 | 4202650 | female | direct | 15.83754 | 0.1908739 |
2013 | 14827 | 8182222 | NA | direct | 14.89013 | 0.3074631 |
2014 | 7267 | 3979572 | male | direct | 14.50351 | 0.5042843 |
2014 | 7560 | 4202650 | female | direct | 15.75353 | 0.3463626 |
2014 | 14827 | 8182222 | NA | direct | 15.14556 | 0.3709321 |
2015 | 7267 | 3979572 | male | direct | 15.12289 | 0.6285688 |
2015 | 7560 | 4202650 | female | direct | 15.92796 | 0.4200607 |
2015 | 14827 | 8182222 | NA | direct | 15.53640 | 0.4914012 |
2016 | 7267 | 3979572 | male | direct | 14.57968 | 0.5546359 |
2016 | 7560 | 4202650 | female | direct | 15.55989 | 0.3072535 |
2016 | 14827 | 8182222 | NA | direct | 15.08315 | 0.4023717 |
2017 | 7267 | 3979572 | male | direct | 14.94816 | 0.4973673 |
2017 | 7560 | 4202650 | female | direct | 15.86717 | 0.6738396 |
2017 | 14827 | 8182222 | NA | direct | 15.42019 | 0.5689435 |
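The reason for deriving the poverty indicator per period before splitting into groups can be seen in a small base R sketch with made-up incomes. The helper w_median() is a hypothetical, simplified weighted median and may differ in detail from laeken::weightedMedian().

```r
# Hypothetical weighted median: smallest value whose cumulative
# weight share reaches 50%
w_median <- function(x, w) {
  o <- order(x)
  cw <- cumsum(w[o]) / sum(w)
  x[o][which(cw >= 0.5)[1]]
}

# Toy data (made up): one period, two groups
income <- c(10, 12, 30, 40, 50, 60)
w      <- rep(1, 6)
gender <- c("female", "female", "female", "male", "male", "male")
idx    <- seq_along(income)

# Variant 1: threshold from the whole period, ratio within each group
thr_period  <- 0.6 * w_median(income, w)
at_risk     <- income < thr_period
rate_period <- tapply(idx, gender, function(i) sum(w[i] * at_risk[i]) / sum(w[i]) * 100)

# Variant 2: threshold re-derived inside each group (not the intended indicator)
rate_group <- tapply(idx, gender, function(i) {
  thr <- 0.6 * w_median(income[i], w[i])
  sum(w[i] * (income[i] < thr)) / sum(w[i]) * 100
})

rate_period
rate_group
```

In this toy case the two variants disagree for the female group, because a threshold computed on a subset refers to the subset's median rather than to the median of the whole period.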
In case an estimator should be applied to several columns of the dataset, var can be set to a vector containing all necessary columns.
multipleRates <- calc.stError(dat_boot_calib, var = c("povertyRisk", "onePerson"), fun = weightedRatio)
multipleRates$Estimates
year | n | N | estimate_type | val_povertyRisk | stE_povertyRisk | val_onePerson | stE_onePerson |
---|---|---|---|---|---|---|---|
2010 | 14827 | 8182222 | direct | 14.44422 | 0.3942534 | 14.85737 | 0.3942534 |
2011 | 14827 | 8182222 | direct | 14.77393 | 0.3043969 | 14.85737 | 0.3043969 |
2012 | 14827 | 8182222 | direct | 15.04515 | 0.2895304 | 14.85737 | 0.2895304 |
2013 | 14827 | 8182222 | direct | 14.89013 | 0.3950952 | 14.85737 | 0.3950952 |
2014 | 14827 | 8182222 | direct | 15.14556 | 0.4561354 | 14.85737 | 0.4561354 |
2015 | 14827 | 8182222 | direct | 15.53640 | 0.6039997 | 14.85737 | 0.6039997 |
2016 | 14827 | 8182222 | direct | 15.08315 | 0.5295194 | 14.85737 | 0.5295194 |
2017 | 14827 | 8182222 | direct | 15.42019 | 0.6276176 | 14.85737 | 0.6276176 |
Here we see the share of persons at risk of poverty and the share of persons living in one-person households.
The group argument can be used to calculate estimators for different subsets of the data. This argument can take the grouping variable as a string that refers to a column name (usually a factor) in dat. If set, all estimators are split not only by the reference period but also by the grouping variable. For simplicity, only one reference period of the above data is used.
dat2 <- subset(dat_boot_calib, year == 2010)
for (att in c("period", "weights", "b.rep"))
attr(dat2, att) <- attr(dat_boot_calib, att)
To calculate the ratio of persons at risk of poverty for each federal state of Austria, group = "region" can be used.
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio, group = "region")
povertyRates$Estimates
year | n | N | region | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|
2010 | 549 | 260564 | Burgenland | direct | 19.53984 | 3.8201446 |
2010 | 733 | 377355 | Vorarlberg | direct | 16.53731 | 3.4361247 |
2010 | 924 | 535451 | Salzburg | direct | 13.78734 | 1.8458914 |
2010 | 1078 | 563648 | Carinthia | direct | 13.08627 | 2.0096038 |
2010 | 1317 | 701899 | Tyrol | direct | 15.30819 | 1.8293976 |
2010 | 2295 | 1167045 | Styria | direct | 14.37464 | 1.0559605 |
2010 | 2322 | 1598931 | Vienna | direct | 17.23468 | 1.1871171 |
2010 | 2804 | 1555709 | Lower Austria | direct | 13.84362 | 1.1256995 |
2010 | 2805 | 1421620 | Upper Austria | direct | 10.88977 | 0.9377872 |
2010 | 14827 | 8182222 | NA | direct | 14.44422 | 0.3755538 |
The last row with region = NA denotes the aggregate over all regions. Note that the columns N and n now show the weighted and unweighted number of persons in each region.
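The meaning of n and N per group can be reproduced on a toy dataset in base R. The data below are made up, and the column names only mimic those used in this document.

```r
# Toy data (made up): five persons in two regions
toy <- data.frame(
  region  = c("A", "A", "B", "B", "B"),
  pWeight = c(500, 480, 520, 510, 490)
)

# n: unweighted number of persons per region
n <- tapply(toy$pWeight, toy$region, length)

# N: weighted number of persons per region (sum of the main weight)
N <- tapply(toy$pWeight, toy$region, sum)

data.frame(region = names(n), n = as.vector(n), N = as.vector(N))
```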
In case more than one grouping variable is used, there are several options of calling calc.stError(), depending on whether combinations of grouping levels should be considered or not. We will consider the variables gender and region as our grouping variables and show three options on how calc.stError() can be called.
Calculate the point estimate and standard error for each region and each gender. The number of rows in the output is therefore
\[n_\text{periods}\cdot(n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9 + 2 + 1) = 12.\]
The last row is again the estimate for the whole period.
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio,
group = c("gender", "region"))
povertyRates$Estimates
year | n | N | gender | region | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|---|
2010 | 549 | 260564 | NA | Burgenland | direct | 19.53984 | 3.8201446 |
2010 | 733 | 377355 | NA | Vorarlberg | direct | 16.53731 | 3.4361247 |
2010 | 924 | 535451 | NA | Salzburg | direct | 13.78734 | 1.8458914 |
2010 | 1078 | 563648 | NA | Carinthia | direct | 13.08627 | 2.0096038 |
2010 | 1317 | 701899 | NA | Tyrol | direct | 15.30819 | 1.8293976 |
2010 | 2295 | 1167045 | NA | Styria | direct | 14.37464 | 1.0559605 |
2010 | 2322 | 1598931 | NA | Vienna | direct | 17.23468 | 1.1871171 |
2010 | 2804 | 1555709 | NA | Lower Austria | direct | 13.84362 | 1.1256995 |
2010 | 2805 | 1421620 | NA | Upper Austria | direct | 10.88977 | 0.9377872 |
2010 | 7267 | 3979572 | male | NA | direct | 12.02660 | 0.3524528 |
2010 | 7560 | 4202650 | female | NA | direct | 16.73351 | 0.4706546 |
2010 | 14827 | 8182222 | NA | NA | direct | 14.44422 | 0.3755538 |
Split the data by all combinations of the two grouping variables region and gender. This will result in a larger output table of the size
\[n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + 1) = 1\cdot(9\cdot2 + 1)= 19.\]
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio,
group = list(c("gender", "region")))
povertyRates$Estimates
year | n | N | gender | region | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|---|
2010 | 261 | 122741.8 | male | Burgenland | direct | 17.414524 | 3.7608324 |
2010 | 288 | 137822.2 | female | Burgenland | direct | 21.432598 | 4.1870089 |
2010 | 359 | 182732.9 | male | Vorarlberg | direct | 12.973259 | 3.7597884 |
2010 | 374 | 194622.1 | female | Vorarlberg | direct | 19.883637 | 3.8858420 |
2010 | 440 | 253143.7 | male | Salzburg | direct | 9.156964 | 1.9057526 |
2010 | 484 | 282307.3 | female | Salzburg | direct | 17.939382 | 2.1700685 |
2010 | 517 | 268581.4 | male | Carinthia | direct | 10.552148 | 1.6181769 |
2010 | 561 | 295066.6 | female | Carinthia | direct | 15.392924 | 2.6174006 |
2010 | 650 | 339566.5 | male | Tyrol | direct | 12.857542 | 2.2710099 |
2010 | 667 | 362332.5 | female | Tyrol | direct | 17.604861 | 1.6905978 |
2010 | 1128 | 571011.7 | male | Styria | direct | 11.671247 | 1.2206931 |
2010 | 1132 | 774405.4 | male | Vienna | direct | 15.590616 | 1.0566260 |
2010 | 1167 | 596033.3 | female | Styria | direct | 16.964539 | 1.2797304 |
2010 | 1190 | 824525.6 | female | Vienna | direct | 18.778813 | 1.4814944 |
2010 | 1363 | 684272.5 | male | Upper Austria | direct | 9.074690 | 0.9212612 |
2010 | 1387 | 772593.2 | female | Lower Austria | direct | 16.372949 | 1.2166483 |
2010 | 1417 | 783115.8 | male | Lower Austria | direct | 11.348283 | 1.1851814 |
2010 | 1442 | 737347.5 | female | Upper Austria | direct | 12.574205 | 1.1522839 |
2010 | 14827 | 8182222.0 | NA | NA | direct | 14.444218 | 0.3755538 |
In this case, the estimates and standard errors are calculated for every gender, every region, and every combination of gender and region. The number of rows in the output is therefore
\[n_\text{periods}\cdot(n_\text{regions} \cdot n_\text{genders} + n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9\cdot2 + 9 + 2 + 1) = 30.\]
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio,
group = list("gender", "region", c("gender", "region")))
povertyRates$Estimates
year | n | N | gender | region | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|---|
2010 | 261 | 122741.8 | male | Burgenland | direct | 17.414524 | 3.7608324 |
2010 | 288 | 137822.2 | female | Burgenland | direct | 21.432598 | 4.1870089 |
2010 | 359 | 182732.9 | male | Vorarlberg | direct | 12.973259 | 3.7597884 |
2010 | 374 | 194622.1 | female | Vorarlberg | direct | 19.883637 | 3.8858420 |
2010 | 440 | 253143.7 | male | Salzburg | direct | 9.156964 | 1.9057526 |
2010 | 484 | 282307.3 | female | Salzburg | direct | 17.939382 | 2.1700685 |
2010 | 517 | 268581.4 | male | Carinthia | direct | 10.552148 | 1.6181769 |
2010 | 549 | 260564.0 | NA | Burgenland | direct | 19.539836 | 3.8201446 |
2010 | 561 | 295066.6 | female | Carinthia | direct | 15.392924 | 2.6174006 |
2010 | 650 | 339566.5 | male | Tyrol | direct | 12.857542 | 2.2710099 |
2010 | 667 | 362332.5 | female | Tyrol | direct | 17.604861 | 1.6905978 |
2010 | 733 | 377355.0 | NA | Vorarlberg | direct | 16.537310 | 3.4361247 |
2010 | 924 | 535451.0 | NA | Salzburg | direct | 13.787343 | 1.8458914 |
2010 | 1078 | 563648.0 | NA | Carinthia | direct | 13.086268 | 2.0096038 |
2010 | 1128 | 571011.7 | male | Styria | direct | 11.671247 | 1.2206931 |
2010 | 1132 | 774405.4 | male | Vienna | direct | 15.590616 | 1.0566260 |
2010 | 1167 | 596033.3 | female | Styria | direct | 16.964539 | 1.2797304 |
2010 | 1190 | 824525.6 | female | Vienna | direct | 18.778813 | 1.4814944 |
2010 | 1317 | 701899.0 | NA | Tyrol | direct | 15.308191 | 1.8293976 |
2010 | 1363 | 684272.5 | male | Upper Austria | direct | 9.074690 | 0.9212612 |
2010 | 1387 | 772593.2 | female | Lower Austria | direct | 16.372949 | 1.2166483 |
2010 | 1417 | 783115.8 | male | Lower Austria | direct | 11.348283 | 1.1851814 |
2010 | 1442 | 737347.5 | female | Upper Austria | direct | 12.574205 | 1.1522839 |
2010 | 2295 | 1167045.0 | NA | Styria | direct | 14.374637 | 1.0559605 |
2010 | 2322 | 1598931.0 | NA | Vienna | direct | 17.234683 | 1.1871171 |
2010 | 2804 | 1555709.0 | NA | Lower Austria | direct | 13.843623 | 1.1256995 |
2010 | 2805 | 1421620.0 | NA | Upper Austria | direct | 10.889773 | 0.9377872 |
2010 | 7267 | 3979571.7 | male | NA | direct | 12.026600 | 0.3524528 |
2010 | 7560 | 4202650.3 | female | NA | direct | 16.733508 | 0.4706546 |
2010 | 14827 | 8182222.0 | NA | NA | direct | 14.444218 | 0.3755538 |
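The row counts of the three options can be checked with a few lines of base R. The helper expected_rows() is hypothetical and only mirrors the counting formulas given in the text; it does not call surveysd.

```r
# Number of levels per grouping variable in the 2010 subset
n_levels <- c(gender = 2, region = 9)

# Hypothetical helper: expected number of output rows.
# Each list element of `group` is one grouping specification; a
# specification with several variables contributes the product of
# their level counts. One extra row per period is the overall total.
expected_rows <- function(group, n_levels, n_periods = 1) {
  group <- as.list(group)
  per_spec <- vapply(group, function(g) prod(n_levels[g]), numeric(1))
  n_periods * (sum(per_spec) + 1)
}

opt1 <- expected_rows(c("gender", "region"), n_levels)
opt2 <- expected_rows(list(c("gender", "region")), n_levels)
opt3 <- expected_rows(list("gender", "region", c("gender", "region")), n_levels)

c(option1 = opt1, option2 = opt2, option3 = opt3)
```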
If differences between groups need to be calculated, e.g. the difference of poverty rates between gender = "male" and gender = "female", the parameter group.diff can be utilised. When setting group.diff = TRUE, the differences and the standard errors of these differences will be calculated for all variables defined in group.
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio,
group = c("gender", "region"),
group.diff = TRUE)
povertyRates$Estimates
year | n | N | gender | region | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|---|
2010 | 549.0 | 260564.0 | NA | Burgenland | direct | 19.5398365 | 3.8201446 |
2010 | 641.0 | 318959.5 | NA | Burgenland - Vorarlberg | group difference | 3.0025263 | 4.3512326 |
2010 | 733.0 | 377355.0 | NA | Vorarlberg | direct | 16.5373102 | 3.4361247 |
2010 | 736.5 | 398007.5 | NA | Burgenland - Salzburg | group difference | 5.7524933 | 4.0727197 |
2010 | 813.5 | 412106.0 | NA | Burgenland - Carinthia | group difference | 6.4535688 | 4.2042780 |
2010 | 828.5 | 456403.0 | NA | Salzburg - Vorarlberg | group difference | -2.7499670 | 3.5203199 |
2010 | 905.5 | 470501.5 | NA | Carinthia - Vorarlberg | group difference | -3.4510424 | 3.7663942 |
2010 | 924.0 | 535451.0 | NA | Salzburg | direct | 13.7873432 | 1.8458914 |
2010 | 933.0 | 481231.5 | NA | Burgenland - Tyrol | group difference | 4.2316460 | 4.2130768 |
2010 | 1001.0 | 549549.5 | NA | Carinthia - Salzburg | group difference | -0.7010755 | 2.6242885 |
2010 | 1025.0 | 539627.0 | NA | Tyrol - Vorarlberg | group difference | -1.2291197 | 4.6030925 |
2010 | 1078.0 | 563648.0 | NA | Carinthia | direct | 13.0862677 | 2.0096038 |
2010 | 1120.5 | 618675.0 | NA | Salzburg - Tyrol | group difference | -1.5208473 | 2.3646678 |
2010 | 1197.5 | 632773.5 | NA | Carinthia - Tyrol | group difference | -2.2219227 | 3.0161373 |
2010 | 1317.0 | 701899.0 | NA | Tyrol | direct | 15.3081905 | 1.8293976 |
2010 | 1422.0 | 713804.5 | NA | Burgenland - Styria | group difference | 5.1651992 | 4.6852127 |
2010 | 1435.5 | 929747.5 | NA | Burgenland - Vienna | group difference | 2.3051533 | 3.8296985 |
2010 | 1514.0 | 772200.0 | NA | Styria - Vorarlberg | group difference | -2.1626729 | 4.1234841 |
2010 | 1527.5 | 988143.0 | NA | Vienna - Vorarlberg | group difference | 0.6973730 | 3.8746662 |
2010 | 1609.5 | 851248.0 | NA | Salzburg - Styria | group difference | -0.5872941 | 2.1351288 |
2010 | 1623.0 | 1067191.0 | NA | Salzburg - Vienna | group difference | -3.4473400 | 1.9229290 |
2010 | 1676.5 | 908136.5 | NA | Burgenland - Lower Austria | group difference | 5.6962137 | 4.4195868 |
2010 | 1677.0 | 841092.0 | NA | Burgenland - Upper Austria | group difference | 8.6500631 | 3.8010158 |
2010 | 1686.5 | 865346.5 | NA | Carinthia - Styria | group difference | -1.2883695 | 2.2730489 |
2010 | 1700.0 | 1081289.5 | NA | Carinthia - Vienna | group difference | -4.1484155 | 2.5062449 |
2010 | 1768.5 | 966532.0 | NA | Lower Austria - Vorarlberg | group difference | -2.6936874 | 3.4123823 |
2010 | 1769.0 | 899487.5 | NA | Upper Austria - Vorarlberg | group difference | -5.6475368 | 3.5482126 |
2010 | 1806.0 | 934472.0 | NA | Styria - Tyrol | group difference | -0.9335532 | 1.8749174 |
2010 | 1819.5 | 1150415.0 | NA | Tyrol - Vienna | group difference | -1.9264927 | 1.7045409 |
2010 | 1864.0 | 1045580.0 | NA | Lower Austria - Salzburg | group difference | 0.0562796 | 1.8470515 |
2010 | 1864.5 | 978535.5 | NA | Salzburg - Upper Austria | group difference | 2.8975698 | 2.2161464 |
2010 | 1941.0 | 1059678.5 | NA | Carinthia - Lower Austria | group difference | -0.7573551 | 2.0484059 |
2010 | 1941.5 | 992634.0 | NA | Carinthia - Upper Austria | group difference | 2.1964944 | 2.4497219 |
2010 | 2060.5 | 1128804.0 | NA | Lower Austria - Tyrol | group difference | -1.4645677 | 2.5759089 |
2010 | 2061.0 | 1061759.5 | NA | Tyrol - Upper Austria | group difference | 4.4184171 | 1.6771042 |
2010 | 2295.0 | 1167045.0 | NA | Styria | direct | 14.3746373 | 1.0559605 |
2010 | 2308.5 | 1382988.0 | NA | Styria - Vienna | group difference | -2.8600459 | 1.6846330 |
2010 | 2322.0 | 1598931.0 | NA | Vienna | direct | 17.2346832 | 1.1871171 |
2010 | 2549.5 | 1361377.0 | NA | Lower Austria - Styria | group difference | -0.5310145 | 1.2602200 |
2010 | 2550.0 | 1294332.5 | NA | Styria - Upper Austria | group difference | 3.4848639 | 1.6332932 |
2010 | 2563.0 | 1577320.0 | NA | Lower Austria - Vienna | group difference | -3.3910604 | 1.9532508 |
2010 | 2563.5 | 1510275.5 | NA | Upper Austria - Vienna | group difference | -6.3449098 | 1.4178877 |
2010 | 2804.0 | 1555709.0 | NA | Lower Austria | direct | 13.8436228 | 1.1256995 |
2010 | 2804.5 | 1488664.5 | NA | Lower Austria - Upper Austria | group difference | 2.9538494 | 1.9538722 |
2010 | 2805.0 | 1421620.0 | NA | Upper Austria | direct | 10.8897734 | 0.9377872 |
2010 | 7267.0 | 3979571.7 | male | NA | direct | 12.0266000 | 0.3524528 |
2010 | 7413.5 | 4091111.0 | male - female | NA | group difference | -4.7069081 | 0.3539490 |
2010 | 7560.0 | 4202650.3 | female | NA | direct | 16.7335081 | 0.4706546 |
2010 | 14827.0 | 8182222.0 | NA | NA | direct | 14.4442182 | 0.3755538 |
The resulting output table contains 49 rows: 12 rows for all the direct estimators
\[n_\text{periods}\cdot(n_\text{regions} + n_\text{genders} + 1) = 1\cdot(9 + 2 + 1) = 12,\]
and another 37 for all the differences within the variables "gender" and "region" separately. Variable "gender" has 2 unique values (unique(dat2$gender)), resulting in 1 difference, gender = "male" - gender = "female", and variable "region" has 9 unique values (unique(dat2$region)), resulting in
\[8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = \sum\limits_{i=1}^{9-1}i = 36\]
estimates. Thus the output contains 1 + 36 = 37 estimates with respect to group differences.
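The counts above are the numbers of unordered pairs within each grouping variable, which can be verified in base R as a quick sanity check. The variable names in the snippet are chosen for illustration only.

```r
n_region <- 9
n_gender <- 2

# Pairwise differences within each grouping variable
diff_region <- choose(n_region, 2)   # same as sum(1:(n_region - 1))
diff_gender <- choose(n_gender, 2)

# Direct estimates: one per region, one per gender, one overall
n_direct <- n_region + n_gender + 1

total_rows <- n_direct + diff_region + diff_gender
c(direct = n_direct, differences = diff_region + diff_gender, total = total_rows)
```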
If a combination of grouping variables is used in group and group.diff = TRUE, then differences between combinations will only be calculated if exactly one of the grouping variables differs. For example, the differences between the following groups would be calculated:
gender = "female" & region = "Vienna" - gender = "male" & region = "Vienna"
gender = "female" & region = "Vienna" - gender = "female" & region = "Salzburg"
gender = "male" & region = "Salzburg" - gender = "female" & region = "Salzburg"
The difference between gender = "female" & region = "Vienna" and gender = "male" & region = "Salzburg", however, would not be calculated.
This leads to
\[2\cdot\left(\sum\limits_{i=1}^{9-1}i\right) + 9\cdot1 = 81\]
results with respect to the differences. The column estimate_type in the output distinguishes direct estimates from group differences.
povertyRates <- calc.stError(dat2, var = "povertyRisk", fun = weightedRatio,
group = list(c("gender", "region")),
group.diff = TRUE)
povertyRates$Estimates[,.N,by=.(estimate_type)]
estimate_type | N |
---|---|
direct | 19 |
group difference | 81 |
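The number 81 can also be obtained by brute force: enumerate all gender and region combinations and count the pairs in which exactly one of the two variables differs. This is a base R sketch of the counting rule described in the text, not of the internal implementation of calc.stError().

```r
regions <- c("Burgenland", "Vorarlberg", "Salzburg", "Carinthia", "Tyrol",
             "Styria", "Vienna", "Lower Austria", "Upper Austria")
genders <- c("male", "female")

# All 18 combinations of gender and region
combos <- expand.grid(gender = genders, region = regions,
                      stringsAsFactors = FALSE)

# All unordered pairs of combinations (one pair per column)
pairs <- combn(nrow(combos), 2)

# Number of grouping variables that differ within each pair
n_differing <-
  (combos$gender[pairs[1, ]] != combos$gender[pairs[2, ]]) +
  (combos$region[pairs[1, ]] != combos$region[pairs[2, ]])

# Only pairs differing in exactly one variable are counted
n_group_diff <- sum(n_differing == 1)
n_group_diff
```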
Differences of estimates between periods can be calculated using the parameter period.diff. period.diff expects a character vector (if not NULL) specifying for which periods the differences should be calculated. The inputs should be specified in the form "period2 - period1", e.g. "2017 - 2016".
povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio,
period.diff = c("2017 - 2016", "2016 - 2015", "2015 - 2014"))
povertyRates$Estimates
year | n | N | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|
2014 | 14827 | 8182222 | direct | 15.1455601 | 0.4954098 |
2015 | 14827 | 8182222 | direct | 15.5364014 | 0.5456595 |
2015-2014 | 14827 | 8182222 | period difference | 0.3908413 | 0.3505833 |
2016 | 14827 | 8182222 | direct | 15.0831502 | 0.5211549 |
2016-2015 | 14827 | 8182222 | period difference | -0.4532512 | 0.3818030 |
2017 | 14827 | 8182222 | direct | 15.4201916 | 0.3757101 |
2017-2016 | 14827 | 8182222 | period difference | 0.3370414 | 0.4140711 |
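Note that the standard errors of the period differences in the table are smaller than one would obtain by combining the yearly standard errors as if the years were independent. A possible explanation, sketched below in base R, is that the difference is formed for each bootstrap replicate before the spread is measured. This is an assumption about the method, and the replicate estimates in the snippet are made up.

```r
# Made-up replicate estimates for two consecutive years; the same
# replicate tends to be high (or low) in both years
est_2014 <- c(15.0, 15.6, 14.8, 15.3, 15.1)
est_2015 <- c(15.4, 16.1, 15.1, 15.8, 15.4)

# Assumed approach: difference per replicate, then standard deviation
se_diff <- sd(est_2015 - est_2014)

# Naive alternative that treats both years as independent
se_indep <- sqrt(sd(est_2014)^2 + sd(est_2015)^2)

c(se_diff = se_diff, se_indep = se_indep)
```

With positively correlated replicate estimates, the replicate-wise difference yields the smaller standard error.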
If additional grouping variables are supplied to calc.stError(), the differences across periods are also carried out for all variables in group.
povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio,
group = "gender",
period.diff = c("2017 - 2016", "2016 - 2015", "2015 - 2014"))
povertyRates$Estimates
year | n | N | gender | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|
2014 | 7267 | 3979572 | male | direct | 14.5035068 | 0.6086880 |
2014 | 7560 | 4202650 | female | direct | 15.7535328 | 0.4769638 |
2014 | 14827 | 8182222 | NA | direct | 15.1455601 | 0.4954098 |
2015 | 7267 | 3979572 | male | direct | 15.1228904 | 0.6391846 |
2015 | 7560 | 4202650 | female | direct | 15.9279630 | 0.5250614 |
2015 | 14827 | 8182222 | NA | direct | 15.5364014 | 0.5456595 |
2015-2014 | 7267 | 3979572 | male | period difference | 0.6193836 | 0.3561700 |
2015-2014 | 7560 | 4202650 | female | period difference | 0.1744301 | 0.3658346 |
2015-2014 | 14827 | 8182222 | NA | period difference | 0.3908413 | 0.3505833 |
2016 | 7267 | 3979572 | male | direct | 14.5796824 | 0.5975064 |
2016 | 7560 | 4202650 | female | direct | 15.5598937 | 0.5005551 |
2016 | 14827 | 8182222 | NA | direct | 15.0831502 | 0.5211549 |
2016-2015 | 7267 | 3979572 | male | period difference | -0.5432080 | 0.3532349 |
2016-2015 | 7560 | 4202650 | female | period difference | -0.3680693 | 0.4613847 |
2016-2015 | 14827 | 8182222 | NA | period difference | -0.4532512 | 0.3818030 |
2017 | 7267 | 3979572 | male | direct | 14.9481591 | 0.3562568 |
2017 | 7560 | 4202650 | female | direct | 15.8671684 | 0.4535615 |
2017 | 14827 | 8182222 | NA | direct | 15.4201916 | 0.3757101 |
2017-2016 | 7267 | 3979572 | male | period difference | 0.3684767 | 0.4827706 |
2017-2016 | 7560 | 4202650 | female | period difference | 0.3072748 | 0.4544777 |
2017-2016 | 14827 | 8182222 | NA | period difference | 0.3370414 | 0.4140711 |
With the parameter period.mean, averages across periods are calculated additionally. The parameter accepts only odd integer values. The resulting table will contain the direct estimates as well as rolling averages of length period.mean.
povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio,
period.mean = 3)
povertyRates$Estimates
year | n | N | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|
2014 | 14827 | 8182222 | direct | 15.14556 | 0.4954098 |
2014_2015_2016 | 14827 | 8182222 | period average | 15.25504 | 0.4615078 |
2015 | 14827 | 8182222 | direct | 15.53640 | 0.5456595 |
2015_2016_2017 | 14827 | 8182222 | period average | 15.34658 | 0.4211127 |
2016 | 14827 | 8182222 | direct | 15.08315 | 0.5211549 |
2017 | 14827 | 8182222 | direct | 15.42019 | 0.3757101 |
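The point estimates of the period averages in this table are consistent with simple means of the direct estimates over a window of three consecutive periods. The following base R sketch reproduces them from the printed values; it says nothing about how the corresponding standard errors are obtained.

```r
# Direct estimates as printed in the table above
direct <- c("2014" = 15.14556, "2015" = 15.53640,
            "2016" = 15.08315, "2017" = 15.42019)
k <- 3  # window length, corresponds to period.mean = 3

# Rolling mean over all windows of k consecutive periods
starts <- seq_len(length(direct) - k + 1)
roll <- vapply(starts, function(i) mean(direct[i:(i + k - 1)]), numeric(1))
names(roll) <- vapply(
  starts,
  function(i) paste(names(direct)[i:(i + k - 1)], collapse = "_"),
  character(1)
)

round(roll, 5)
```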
If, in addition, the parameters group and/or period.diff are specified, then differences and groupings of averages will be calculated.
povertyRates <- calc.stError(dat_boot_calib[year>2013], var = "povertyRisk", fun = weightedRatio,
period.mean = 3, period.diff = "2016 - 2015",
group = "gender")
povertyRates$Estimates
year | n | N | gender | estimate_type | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|
2014 | 7267 | 3979572 | male | direct | 14.5035068 | 0.6086880 |
2014 | 7560 | 4202650 | female | direct | 15.7535328 | 0.4769638 |
2014 | 14827 | 8182222 | NA | direct | 15.1455601 | 0.4954098 |
2014_2015_2016 | 7267 | 3979572 | male | period average | 14.7353599 | 0.5710507 |
2014_2015_2016 | 7560 | 4202650 | female | period average | 15.7471298 | 0.4185309 |
2014_2015_2016 | 14827 | 8182222 | NA | period average | 15.2550372 | 0.4615078 |
2015 | 7267 | 3979572 | male | direct | 15.1228904 | 0.6391846 |
2015 | 7560 | 4202650 | female | direct | 15.9279630 | 0.5250614 |
2015 | 14827 | 8182222 | NA | direct | 15.5364014 | 0.5456595 |
2015_2016_2017 | 7267 | 3979572 | male | period average | 14.8835773 | 0.4784864 |
2015_2016_2017 | 7560 | 4202650 | female | period average | 15.7850084 | 0.3963737 |
2015_2016_2017 | 14827 | 8182222 | NA | period average | 15.3465811 | 0.4211127 |
2016 | 7267 | 3979572 | male | direct | 14.5796824 | 0.5975064 |
2016 | 7560 | 4202650 | female | direct | 15.5598937 | 0.5005551 |
2016 | 14827 | 8182222 | NA | direct | 15.0831502 | 0.5211549 |
2016-2015 | 7267 | 3979572 | male | period difference | -0.5432080 | 0.3532349 |
2016-2015 | 7560 | 4202650 | female | period difference | -0.3680693 | 0.4613847 |
2016-2015 | 14827 | 8182222 | NA | period difference | -0.4532512 | 0.3818030 |
2016-2015_mean | 7267 | 3979572 | male | difference between period averages | 0.1482174 | 0.1669658 |
2016-2015_mean | 7560 | 4202650 | female | difference between period averages | 0.0378785 | 0.2406335 |
2016-2015_mean | 14827 | 8182222 | NA | difference between period averages | 0.0915438 | 0.1818219 |
2017 | 7267 | 3979572 | male | direct | 14.9481591 | 0.3562568 |
2017 | 7560 | 4202650 | female | direct | 15.8671684 | 0.4535615 |
2017 | 14827 | 8182222 | NA | direct | 15.4201916 | 0.3757101 |
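Judging from the printed values, the rows labelled 2016-2015_mean correspond to the difference between the two period averages 2015_2016_2017 and 2014_2015_2016. The base R sketch below checks this reading for the overall estimate (gender = NA), using only numbers from the table.

```r
# Direct overall estimates (gender = NA) from the table above
direct <- c("2014" = 15.1455601, "2015" = 15.5364014,
            "2016" = 15.0831502, "2017" = 15.4201916)

# Three-period averages containing 2015 and 2016 as middle periods
avg_2015 <- mean(direct[c("2014", "2015", "2016")])
avg_2016 <- mean(direct[c("2015", "2016", "2017")])

# Difference between the two period averages
diff_avg <- avg_2016 - avg_2015
round(diff_avg, 7)
```

Because the two windows share the years 2015 and 2016, the difference reduces to (estimate for 2017 minus estimate for 2014) divided by 3.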