nonprobsvy News and Updates
- control_out(eps=1e-8)
- method_nn and method_pmm
- method_glm
- extract added, which allows extracting results from the nonprob object
- coef added, which allows obtaining the coefficients of the underlying models (if possible)
- cloglog)
- sampling package from suggested package
- plot method
- check_balance error (closes #75)
- pop.size, controlSel, controlOut and controlInf were renamed to pop_size, control_sel, control_out and control_inf, respectively.
- genSimData removed completely, as it is not used anywhere in the package.
- maxLik_method renamed to maxlik_method in the control_sel function.
- control_out function:
  - predictive_match renamed to pmm_match_type to align with the PMM (Predictive Mean Matching) estimator naming convention, where all related parameters start with pmm_
- control_sel function:
  - method removed as it was not used
  - est_method_sel renamed to est_method
  - h renamed to gee_h_fun to make this more readable to the user
  - start_type now accepts only zero and mle (for gee models only).
- control_inf function:
  - bias_inf renamed to vars_combine and type changed to logical. TRUE if variables (its levels) should be combined after the variable selection algorithm for the doubly robust approach.
  - pi_ij – argument removed as it is not used.
- nonprobsvy class renamed to nonprob and all related methods adjusted to this change
- logit_model_nonprobsvy, probit_model_nonprobsvy and cloglog_model_nonprobsvy removed in favour of the more readable method_ps function, which specifies the propensity score model
- control_inference=control_inf(vars_combine=TRUE) which
allows the doubly robust estimator to combine variables prior to estimation, i.e. if selection=~x1+x2 and y~x1+x3 then the following models are fitted: selection=~x1+x2+x3 and y~x1+x2+x3. By default we set control_inference=control_inf(vars_combine=FALSE). Note that this behaviour is assumed independently from variable selection.
- nonprob(weights=NULL) replaced with nonprob(case_weights=NULL) to stress that this refers to case weights, not sampling or other weights, in the non-probability sample
- jvs (Job
Vacancy Survey; a probability sample survey) and admin (Central Job Offers Database; a non-probability sample survey). The units and auxiliary variables have been aligned in a way that allows the data to be integrated using the methods implemented in this package.
- check_balance function was added to check the balance in the totals of the variables based on the weighted weights between the non-probability and probability samples.
- na_action with default na.omit
- weights – returns IPW weights
- update – allows updating the nonprob class object
- method_ps – for modelling the propensity score
- method_glm – for modelling y using the glm function
- method_nn – for the NN method
- method_pmm – for the PMM method
- method_npar – for the non-parametric method
- print.nonprob, summary.nonprob and print.nonprob_summary methods
> result_mi
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.1817
- variable y2: 1.8087
- selected estimators:
- variable y1: 2.9498 (se=0.0420, ci=(2.8674, 3.0322))
- variable y2: 1.5760 (se=0.0326, ci=(1.5122, 1.6399))
The number of digits can be changed using print(x, digits), as shown below:
> print(result_mi,2)
A nonprob object
- estimator type: mass imputation
- method: glm (gaussian)
- auxiliary variables source: survey
- vars selection: false
- variance estimator: analytic
- population size fixed: false
- naive (uncorrected) estimators:
- variable y1: 3.18
- variable y2: 1.81
- selected estimators:
- variable y1: 2.95 (se=0.04, ci=(2.87, 3.03))
- variable y2: 1.58 (se=0.03, ci=(1.51, 1.64))
> summary(result_mi) |> print(digits=2)
A nonprob_summary object
- call: nonprob(data = subset(population, flag_bd1 == 1), outcome = y1 +
y2 ~ x1 + x2, svydesign = sample_prob)
- estimator type: mass imputation
- nonprob sample size: 693011 (69.3%)
- prob sample size: 1000 (0.1%)
- population size: 1000000 (fixed: false)
- detailed information about models are stored in list element(s): "outcome"
----------------------------------------------------------------
- distribution of outcome residuals:
- y1: min: -4.79; mean: 0.00; median: 0.00; max: 4.54
- y2: min: -4.96; mean: -0.00; median: -0.07; max: 12.25
- distribution of outcome predictions (nonprob sample):
- y1: min: -2.72; mean: 3.18; median: 3.04; max: 16.28
- y2: min: -1.55; mean: 1.81; median: 1.58; max: 13.92
- distribution of outcome predictions (prob sample):
- y1: min: -0.46; mean: 2.95; median: 2.84; max: 10.31
- y2: min: -0.58; mean: 1.58; median: 1.39; max: 7.87
----------------------------------------------------------------
- formula.tools
- strata is not supported for the time being.
- maxit argument from controlSel function to internally used nleqslv function
- vector in model_frame when predicting y_hat in mass imputation glm model when X is based on one auxiliary variable only - fix provided by converting it to a data.frame object.
- summary about quality of estimation based on the difference between estimated and known total values of auxiliary variables
- controlOut function by switching values for predictive_match argument. From now on, predictive_match = 1 means \(\hat{y}-\hat{y}\) matching in predictive mean matching imputation and predictive_match = 2 corresponds to \(\hat{y}-y\) matching.
- div option when variable selection (more in
documentation) for doubly robust estimation.
- nonprob output such as gradient, hessian and jacobian derived from IPW estimation for mle and gee methods when an IPW or DR model is executed.
- nonprob output when an IPW or DR model is executed.
- model_frame matrix data from probability sample used for mass imputation to nonprob when an MI or DR model is executed.
- logit, complementary log-log and probit link functions.
- generalized linear models, nearest neighbours and predictive mean matching methods for Mass Imputation
- SCAD, LASSO and MCP penalization equations
- analytic and bootstrap (with parallel computation - doSNOW package) variance for described estimators
- nonprob class such as
  - nobs for sample size
  - pop.size for population size estimation
  - residuals for residuals of the inverse probability weighting model
  - cooks.distance for identifying influential observations that have a significant impact on the parameter estimates
  - hatvalues for measuring the leverage of individual observations
  - logLik for computing the log-likelihood of the model
  - AIC (Akaike Information Criterion) for evaluating the model based on the trade-off between goodness of fit and complexity, helping in model selection
  - BIC (Bayesian Information Criterion) for a similar purpose as AIC but with a stronger penalty for model complexity
  - confint for calculating confidence intervals around parameter estimates
  - vcov for obtaining the variance-covariance matrix of the parameter estimates
  - deviance for assessing the goodness of fit of the model
- R-cmd check
- nonprob function.
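
The two predictive mean matching variants mentioned above (\(\hat{y}-\hat{y}\) and \(\hat{y}-y\) matching) can be illustrated with a small base R sketch. This is not the package's implementation: the simulated data, the helper nearest_donor, and the choice of a single nearest donor are assumptions made only for illustration.

```r
# Illustrative sketch only (base R, no nonprobsvy calls).
# Assumptions: simulated data, one nearest donor per unit,
# hypothetical helper name `nearest_donor`.
set.seed(1)
n_np <- 200  # non-probability sample size (y observed)
n_p  <- 50   # probability sample size (y not observed)

x_np <- rnorm(n_np)
y_np <- 1 + 2 * x_np + rnorm(n_np)
x_p  <- rnorm(n_p)

# Outcome model fitted on the non-probability sample
fit     <- lm(y ~ x, data = data.frame(x = x_np, y = y_np))
yhat_np <- unname(fitted(fit))
yhat_p  <- unname(predict(fit, newdata = data.frame(x = x_p)))

# For each target value, index of the closest value in the donor pool
nearest_donor <- function(target, pool) {
  vapply(target, function(t) which.min(abs(pool - t)), integer(1))
}

# yhat - yhat matching: predicted (prob sample) vs predicted (nonprob sample)
donor_1 <- nearest_donor(yhat_p, yhat_np)
# yhat - y matching: predicted (prob sample) vs observed y (nonprob sample)
donor_2 <- nearest_donor(yhat_p, y_np)

# In both variants the imputed value is the donor's observed y
imputed_1 <- y_np[donor_1]
imputed_2 <- y_np[donor_2]
```

In both cases the probability-sample units receive observed outcome values from the non-probability sample; the variants differ only in which quantity is used to measure closeness.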