Fuzzy difference-in-differences with Rfuzzydid

Purpose

This article is a practical guide to the estimators implemented in Rfuzzydid. It follows de Chaisemartin and D’Haultfoeuille (2018) and the Stata companion by de Chaisemartin, D’Haultfoeuille, and Guyonvarch (2019). Those papers are the primary references for the formal identification results; this vignette keeps the focus on what to run and how to read the output.

A fuzzy DID design has the same comparison structure as a standard DID design, but the treatment group does not necessarily move from untreated to fully treated. Instead, the treatment rate increases more in one group than in the comparison group. The first stage is the DID in treatment:

\[ \Delta_D = E[D\mid G=1,T=1] - E[D\mid G=1,T=0] - E[D\mid G=0,T=1] + E[D\mid G=0,T=0]. \]

The estimators in Rfuzzydid scale outcome changes by this first stage. In the binary-treatment, two-group, two-period case, the target is a local average treatment effect for switchers: units whose treatment status changes because their group-period cell receives the stronger treatment-intensity change.

Choosing an estimator

fuzzydid() exposes the main estimators from the papers through logical options. You can request more than one estimator in the same call.

Option Estimator Use when
did = TRUE Wald-DID You want the familiar DID Wald ratio as a benchmark, and the stronger treatment-effect stability restrictions are credible.
tc = TRUE Wald-TC You want an alternative LATE estimator that uses common-trend restrictions within baseline-treatment subgroups and does not require stable treatment effects.
cic = TRUE Wald-CIC You want the fuzzy extension of changes-in-changes, using distributional restrictions rather than a mean-only common-trends restriction.
lqte = TRUE LQTE You want local quantile treatment effects; this uses the same style of restrictions as Wald-CIC.

The Wald-DID ratio is easy to interpret, but the papers show that it needs extra restrictions on treatment effects to identify the LATE in fuzzy designs. Wald-TC and Wald-CIC are useful alternatives when those restrictions are not plausible. They rely on different identifying assumptions, so disagreement between estimators is substantively informative rather than just a numerical detail.

The Stata companion recommends using equality tests and placebo-style numerator checks when the design allows them. In Rfuzzydid, use eqtest = TRUE when at least two LATE estimators are requested. Use numerator = TRUE for reduced-form placebo checks in periods where the treatment first stage should be zero.

Basic R workflow

The formula is outcome ~ treatment + covariates. The group and time arguments name the group and period variables.

Internally, fuzzydid() parses the formula into one numeric outcome, one numeric treatment, and optional covariates. Numeric covariates are treated as continuous predictors. Factor, character, and logical covariates are treated as qualitative predictors and expanded to indicator columns with the first sorted level omitted. With sieves = TRUE, continuous covariates are expanded to polynomial basis terms; when sieveorder = NULL, the order is selected by a deterministic five-fold cross-validation rule.

Rows with NA or NaN in analysis variables are removed by complete-case filtering. Inf and -Inf are rejected because they are not meaningful support points for the empirical means and quantile maps. Use tagobs = TRUE to return the retained-row mask.

library(Rfuzzydid)

make_cell <- function(g, t, n, p_d) {
  d <- rbinom(n, size = 1, prob = p_d)
  y <- 1 + 0.4 * g + 0.3 * t + 1.5 * d + rnorm(n, sd = 0.2)
  data.frame(y = y, d = d, g = g, t = t)
}

set.seed(4)
df <- rbind(
  make_cell(g = 0, t = 0, n = 80, p_d = 0.20),
  make_cell(g = 0, t = 1, n = 80, p_d = 0.30),
  make_cell(g = 1, t = 0, n = 80, p_d = 0.25),
  make_cell(g = 1, t = 1, n = 80, p_d = 0.70)
)

fit <- fuzzydid(
  data = df,
  formula = y ~ d,
  group = "g",
  time = "t",
  did = TRUE,
  tc = TRUE,
  cic = TRUE,
  eqtest = TRUE,
  breps = 50,
  seed = 1
)

summary(fit)
#> LATE estimators
#> 
#> 
#> |estimator | estimate| std.error| conf.low| conf.high|
#> |:---------|--------:|---------:|--------:|---------:|
#> |W_DID     | 1.548061| 0.1262303| 1.360954|  1.826263|
#> |W_TC      | 1.539913| 0.0907961| 1.381874|  1.714579|
#> |W_CIC     | 1.652100| 0.0805602| 1.537000|  1.832401|
#> 
#> Equality tests
#> 
#> 
#> |contrast    |   estimate| std.error|   conf.low|  conf.high|
#> |:-----------|----------:|---------:|----------:|----------:|
#> |W_DID_W_TC  |  0.0081482| 0.0463553| -0.0383010|  0.1217442|
#> |W_DID_W_CIC | -0.1040394| 0.0803680| -0.2734092|  0.0606554|
#> |W_TC_W_CIC  | -0.1121876| 0.0538657| -0.2443673| -0.0472058|

The late component contains point estimates, bootstrap standard errors, and percentile bootstrap confidence intervals. The matrices component mirrors the matrix-style results returned by the Stata command.

fit$late
#>   estimator estimate  std.error conf.low conf.high
#> 1     W_DID 1.548061 0.12623028 1.360954  1.826263
#> 2      W_TC 1.539913 0.09079608 1.381874  1.714579
#> 3     W_CIC 1.652100 0.08056021 1.537000  1.832401
fit$eqtest
#>      contrast    estimate  std.error    conf.low   conf.high
#> 1  W_DID_W_TC  0.00814816 0.04635526 -0.03830102  0.12174418
#> 2 W_DID_W_CIC -0.10403941 0.08036804 -0.27340920  0.06065538
#> 3  W_TC_W_CIC -0.11218757 0.05386572 -0.24436730 -0.04720579

Reading the main options

Treatment categories

For ordered or multi-valued treatments, newcateg groups treatment values into ordered bins before estimating Wald-TC or Wald-CIC:

fuzzydid(
  data = df,
  formula = wage ~ schooling,
  group = "g",
  time = "t",
  tc = TRUE,
  cic = TRUE,
  newcateg = c(5, 8, 11, 14, 1000)
)

This follows the Stata article’s treatment of applications where the treatment has many support points.

Partial identification

When the comparison group does not have a stable treatment rate, the Wald-TC point estimand may be unavailable. The papers derive bounds for that setting, and partial = TRUE requests the corresponding TC bounds:

fuzzydid(
  data = df,
  formula = y ~ d,
  group = "g",
  time = "t",
  tc = TRUE,
  partial = TRUE,
  breps = 50,
  seed = 1
)

Covariates

When the identifying restrictions are more credible after conditioning on covariates, include those variables on the right-hand side of the formula. modelx controls the parametric first-step models used for Wald-DID and Wald-TC. With binary treatment, supply two entries: one for outcome conditional expectations and one for treatment conditional expectations.

fuzzydid(
  data = df,
  formula = y ~ d + x1 + x2,
  group = "g",
  time = "t",
  did = TRUE,
  tc = TRUE,
  modelx = c("ols", "logit")
)

For continuous controls, sieves = TRUE requests nonparametric sieve adjustment. If sieveorder = NULL, Rfuzzydid selects the order by deterministic five-fold cross-validation.

Inference

By default, fuzzydid() computes bootstrap standard errors and confidence intervals. Use breps to set the number of replications and seed for reproducibility. Use cluster for one-way clustered bootstrap inference.

nose = TRUE skips bootstrap inference and returns point estimates only. This is useful while checking data construction, but final empirical results should usually report uncertainty.

The returned object is a causal estimand summary, not a predictive model. Accordingly, it supports estimate extractors such as coef(), confint(), vcov(), nobs(), formula(), generics::tidy(), and generics::glance(), but it does not define observation-level fitted values, residuals, or predictions.

Runtime is approximately linear in the number of bootstrap replications. Within each replication, the no-covariate estimators are dominated by group-period subsetting and empirical distribution calculations. Covariate-adjusted estimators additionally fit nuisance OLS/logit/probit models, and sieve estimation grows with the number of generated basis terms.

Translating from Stata

The core Stata command

fuzzydid y g t d, did tc cic breps(50)

maps to:

fuzzydid(
  data = df,
  formula = y ~ d,
  group = "g",
  time = "t",
  did = TRUE,
  tc = TRUE,
  cic = TRUE,
  breps = 50
)

For multiple periods, the Stata command uses a forward group variable. In Rfuzzydid, pass the backward group through group and the forward group through group_forward:

fuzzydid(
  data = panel_df,
  formula = y ~ d,
  group = "G_t",
  group_forward = "G_tplus1",
  time = "t",
  did = TRUE,
  tc = TRUE
)

See vignette("stata-parity", package = "Rfuzzydid") for direct R/Stata parity examples, and vignette("paper-replication", package = "Rfuzzydid") for an INPRES replication exercise.

Practical checklist

Before treating an estimate as a causal effect, check the design:

References

de Chaisemartin, C. and D’Haultfoeuille, X. (2018). Fuzzy Differences-in-Differences. Review of Economic Studies, 85(2): 999-1028. doi:10.1093/restud/rdx049.

de Chaisemartin, C., D’Haultfoeuille, X., and Guyonvarch, Y. (2019). Fuzzy Differences-in-Differences with Stata. Stata Journal, 19(2): 435-458. doi:10.1177/1536867X19854019.