| Type: | Package |
| Title: | Intersectional Differential Item Functioning Analysis |
| Version: | 1.0.1 |
| Description: | A toolkit for detecting Differential Item Functioning (DIF) using Logistic Regression (LR) as described in Swaminathan and Rogers (1990) <doi:10.1111/j.1745-3984.1990.tb00754.x>, the IRT Likelihood Ratio Test (LRT) following Thissen, Steinberg & Wainer (1993, ISBN:0-8058-0972-4), and model-based recursive partitioning (MOB) as implemented in 'strucchange' following Strobl, Kopf and Zeileis (2015) <doi:10.1007/s11336-013-9388-3>. Designed for both standard two-group and intersectional multi-group designs, 'iDIFr' prioritises effect size reporting alongside statistical significance, clear guidance on group construction, and interpretable output suitable for applied testing contexts. Built-in Intersectional Contrast Analysis (ICA) classifies items as amplified, pure-intersection, obscured, or none by comparing single-variable and intersectional analyses. |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| Depends: | R (≥ 4.1.0) |
| Imports: | Rcpp (≥ 1.0.0), generics, parallel, stats, cli, dplyr, ggplot2, rlang, strucchange |
| LinkingTo: | Rcpp |
| Suggests: | testthat (≥ 3.0.0), knitr, rmarkdown, openxlsx |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| URL: | https://github.com/thmsrgrs/iDIFr |
| BugReports: | https://github.com/thmsrgrs/iDIFr/issues |
| Config/roxygen2/version: | 8.0.0 |
| NeedsCompilation: | yes |
| Packaged: | 2026-06-02 10:59:57 UTC; TMRog |
| Author: | Thomas Rogers [aut, cre] |
| Maintainer: | Thomas Rogers <thomas.rogers@britishcouncil.org> |
| Repository: | CRAN |
| Date/Publication: | 2026-06-08 17:50:06 UTC |
iDIFr: Intersectional Differential Item Functioning Analysis in R
Description
A user-friendly toolkit for detecting Differential Item Functioning (DIF)
using Logistic Regression (LR), the IRT Likelihood Ratio Test (LRT), and
model-based recursive partitioning (MOB). Designed for both standard
two-group and intersectional multi-group designs, with built-in
Intersectional Contrast Analysis (ICA) via the ica = TRUE argument.
Key functions
- idifr(): Main entry point; runs the DIF analysis (set ica = TRUE for ICA)
- check_groups(): Explore group structure and cell sizes
- merge_groups(): Combine sparse intersectional cells
- tidy.idifr(): Extract results as a flat data frame
Quick start
library(iDIFr)
# Check your group structure first
check_groups(my_data, group = ~ gender * nationality * age_band)
# Run DIF analysis
result <- idifr(
data = my_data,
items = 1:20,
group = ~ gender * nationality * age_band,
method = c("LR", "LRT")
)
print(result) # Flagged items with effect sizes
summary(result) # Full breakdown by method
plot(result) # Effect size heatmap
tidy(result) # Flat data frame for further analysis
Author(s)
Maintainer: Thomas Rogers <thomas.rogers@britishcouncil.org>
Authors:
Thomas Rogers <thomas.rogers@britishcouncil.org>
See Also
Useful links:
- https://github.com/thmsrgrs/iDIFr
- Report bugs at https://github.com/thmsrgrs/iDIFr/issues
Check group structure and cell sizes before running DIF analysis
Description
Provides a concise summary of the group structure defined by your demographic
variables. Reports how many groups meet the recommended minimum cell size,
optionally checks which levels of specified variables are fully crossed, and
points to group_details() and cross_details() for full breakdowns.
Usage
check_groups(data, group, min_cell_size = 50, cross_by = NULL, plot = TRUE)
Arguments
data |
A data frame containing demographic variables. |
group |
A one-sided formula specifying the grouping variable(s),
using the same syntax as |
min_cell_size |
Minimum recommended group size. Default is 50. |
cross_by |
Optional character vector of variable name(s) to check for
complete crossing. For each unique value of the named variable(s), the
function checks whether every intersectional cell containing that value
meets |
plot |
Logical. If |
Value
An object of class idifr_groups (invisibly), which can be passed
to merge_groups(), group_details(), or cross_details().
See Also
group_details(), cross_details(), merge_groups(), idifr()
Examples
dat <- simulate_dif(300, 10,
demo_vars = list(nationality = c("UK", "DE", "FR")), seed = 1)
grp <- check_groups(dat, group = ~ group * nationality,
cross_by = "nationality")
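The kind of cell-size summary described above can be approximated in base R. The following is a minimal sketch, not the package's implementation: the data frame `demo`, its columns, and the simulated values are all hypothetical.

```r
# Hypothetical demographic data; column names are illustrative only.
set.seed(1)
demo <- data.frame(
  group       = sample(c("G1", "G2"), 300, replace = TRUE),
  nationality = sample(c("UK", "DE", "FR"), 300, replace = TRUE)
)

min_cell_size <- 50

# Count respondents in every group x nationality cell.
cells <- as.data.frame(table(demo$group, demo$nationality),
                       stringsAsFactors = FALSE)
names(cells) <- c("group", "nationality", "n")

# Flag cells that fall below the recommended minimum.
cells$adequate <- cells$n >= min_cell_size

print(cells)
cat(sum(cells$adequate), "of", nrow(cells),
    "cells meet the minimum size of", min_cell_size, "\n")
```

Counting cells from a cross-tabulation keeps empty combinations in the result (with n = 0), which matters when some intersectional cells have no respondents at all.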
Full crossing breakdown for a demographic variable
Description
For each unique level of the specified variable, shows whether every intersectional cell containing that level meets the minimum cell size. One row per level, showing how many cells are adequate and the smallest cell size observed.
Usage
cross_details(grp, cross_by, min_cell_size = NULL)
Arguments
grp |
An |
cross_by |
Character vector of variable name(s) to check. Must match variables in the group formula. |
min_cell_size |
Minimum recommended group size. Overrides the stored value if supplied. |
Value
The idifr_groups object, invisibly.
See Also
check_groups(), group_details()
Examples
dat <- simulate_dif(300, 10,
demo_vars = list(nationality = c("UK", "DE", "FR")), seed = 1)
grp <- check_groups(dat, group = ~ group * nationality,
cross_by = "nationality", plot = FALSE)
cross_details(grp, cross_by = "nationality")
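The crossing check described above (one row per level, number of adequate cells, smallest cell) can be sketched in base R as follows. This is an illustration under assumed data, not the code behind cross_details(); the data frame `demo` and the unequal sampling probabilities are hypothetical.

```r
# Hypothetical data with deliberately unbalanced nationality levels.
set.seed(1)
demo <- data.frame(
  group       = sample(c("G1", "G2"), 300, replace = TRUE),
  nationality = sample(c("UK", "DE", "FR"), 300, replace = TRUE,
                       prob = c(0.6, 0.3, 0.1))
)
min_cell_size <- 50

# Cell sizes as a group x nationality table.
cell_n <- table(demo$group, demo$nationality)

# For each nationality level: how many cells contain it, how many of
# those are adequate, and the smallest cell size observed.
crossing <- data.frame(
  level      = colnames(cell_n),
  n_cells    = nrow(cell_n),
  n_adequate = colSums(cell_n >= min_cell_size),
  min_cell   = apply(cell_n, 2, min),
  row.names  = NULL
)

# A level is fully crossed when every cell containing it is adequate.
crossing$fully_crossed <- crossing$n_adequate == crossing$n_cells
print(crossing)
```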
Export iDIFr results to Excel
Description
Writes an idifr result object to a formatted .xlsx workbook. Each
requested sheet is written as an Excel table so that column headers are
bold, filters are enabled, and values are properly typed.
Only columns that are actually present in the result object are written; columns listed in the per-method definitions that were not produced by the current run are silently omitted.
Usage
export_results(x, file, sheets = NULL, overwrite = TRUE)
Arguments
x |
An |
file |
Path to the output |
sheets |
Character vector of sheet keys to include. Valid keys:
|
overwrite |
Logical. If |
Value
x invisibly (so the call can be piped).
Examples
if (requireNamespace("openxlsx", quietly = TRUE)) {
dat <- simulate_dif(300, 10, dif_items = c(3, 7), seed = 1)
result <- idifr(dat, 1:10, ~ group, method = "LR", verbose = FALSE)
export_results(result, tempfile(fileext = ".xlsx"))
}
Fit a 2PL IRT model via marginal maximum likelihood (EM)
Description
Fit a 2PL IRT model via marginal maximum likelihood (EM)
Usage
fit_2pl(
resp,
group = NULL,
constrain = "items",
n_nodes = 15,
max_iter = 200,
tol = 1e-04,
start = NULL,
verbose = FALSE
)
Arguments
resp |
Integer matrix (0/1/NA). Rows=persons, cols=items. |
group |
Character/factor vector of group membership (length=nrow(resp)). NULL for single-group calibration. |
constrain |
Parameter constraint across groups:
|
n_nodes |
Number of quadrature nodes. Default 15. Values of 11-21 are appropriate for DIF detection; use 21 for publication-quality parameter estimates. |
max_iter |
Maximum EM iterations. Default 200. |
tol |
Convergence tolerance on log-likelihood change. Default 1e-4. |
start |
Optional list with elements |
verbose |
Print iteration log. Default FALSE. |
Value
Object of class irt_2pl.
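To make the role of the quadrature nodes concrete, the sketch below evaluates a 2PL marginal likelihood and the posterior node weights for fixed item parameters, which is the core of one E-step. It is not the fit_2pl() implementation: the a * (theta - b) parameterisation, the equally spaced grid on [-4, 4], and all parameter values are assumptions made for illustration.

```r
# 2PL item response function: probability of a correct response.
# The a * (theta - b) parameterisation is an assumption of this sketch.
p_2pl <- function(theta, a, b) plogis(a * (theta - b))

# Quadrature grid approximating a standard normal ability distribution.
n_nodes <- 15
nodes   <- seq(-4, 4, length.out = n_nodes)
weights <- dnorm(nodes) / sum(dnorm(nodes))

# Hypothetical item parameters and a small 0/1/NA response matrix.
a <- c(1.0, 1.5, 0.8)
b <- c(-0.5, 0.0, 1.0)
resp <- rbind(c(1, 1, 0),
              c(0, 0, 0),
              c(1, 1, 1),
              c(1, 0, NA))   # NA = missing response, skipped below

# P[k, j] = probability of a correct answer to item j at node k.
P <- outer(nodes, seq_along(a), function(th, j) p_2pl(th, a[j], b[j]))

# Likelihood of each person's response pattern at each node
# (persons in rows, nodes in columns).
lik <- sapply(seq_len(n_nodes), function(k) {
  apply(resp, 1, function(x) {
    obs <- !is.na(x)
    prod(P[k, obs]^x[obs] * (1 - P[k, obs])^(1 - x[obs]))
  })
})

# Marginal likelihood per person, posterior over nodes, total log-likelihood.
marginal  <- as.vector(lik %*% weights)
posterior <- sweep(lik, 2, weights, `*`) / marginal
loglik    <- sum(log(marginal))
```

A finer grid gives a closer approximation to the integral over ability at the cost of more computation per iteration, which is the trade-off behind the n_nodes guidance above.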
Full per-group cell size breakdown
Description
Prints a detailed table showing the cell size for every intersectional
group, flagging those below the recommended minimum. This is the full
breakdown that check_groups() summarises in a single line.
Usage
group_details(grp, min_cell_size = NULL)
Arguments
grp |
An |
min_cell_size |
Minimum recommended group size. Overrides the stored value if supplied. |
Value
The idifr_groups object, invisibly.
See Also
check_groups(), cross_details()
Examples
dat <- simulate_dif(300, 10,
demo_vars = list(nationality = c("UK", "DE", "FR")), seed = 1)
grp <- check_groups(dat, group = ~ group * nationality, plot = FALSE)
group_details(grp)
Run intersectional DIF analysis
Description
The main entry point for iDIFr. Detects Differential Item Functioning (DIF)
using one or more statistical methods, with full support for intersectional
group structures defined by crossing multiple demographic variables.
Effect sizes are reported alongside significance for all methods. Groups with
small cell sizes trigger a warning. Use exclude_below_min and
fully_crossed to control whether those groups are included in the analysis.
Usage
idifr(
data,
items,
group,
method,
ica = FALSE,
min_cell_size = 50,
exclude_below_min = FALSE,
fully_crossed = NULL,
value_selection = NULL,
anchor = NULL,
alpha = 0.05,
p_adjust = "BH",
nonuniform_es = "MAPPD",
verbose = TRUE
)
Arguments
data |
A data frame containing item responses and demographic variables. |
items |
A numeric vector of column indices, or a character vector of column names, identifying the item response columns. Items must be dichotomously scored (0/1). |
group |
A one-sided formula specifying the grouping variable(s).
Use |
method |
A character vector specifying which DIF method(s) to use.
Must be one or more of |
ica |
Logical. If |
min_cell_size |
Minimum acceptable group size. Groups below this
threshold trigger a warning. Also used as the crossing criterion when
|
exclude_below_min |
Logical. If |
fully_crossed |
A character vector of variable name(s). Only levels of
the named variable(s) that are fully crossed – meaning every intersectional
cell for that level meets |
value_selection |
A named list for filtering specific values of
demographic variables before analysis. Each element should be named after
a grouping variable and contain a character vector of values to keep.
Variables not mentioned are left unchanged (all values included). Default
is |
anchor |
A numeric or character vector identifying anchor items
(items assumed to be DIF-free) for IRT scaling. If |
alpha |
Significance level for DIF flagging. Default is |
p_adjust |
Method for p-value adjustment across items. Passed to
|
nonuniform_es |
Character. The effect size metric to use for
non-uniform DIF detection when |
verbose |
Logical. If |
Value
An object of class idifr containing:
- results
A data frame with one row per item per method, including test statistics, p-values, adjusted p-values, effect sizes, and DIF classification (negligible/moderate/large for all methods).
- groups
An idifr_groups object describing the group structure, cell sizes, and any small-cell warnings.
- method
Character vector of methods used.
- call
The matched call.
- items
Character vector of item names analysed.
- alpha
The significance level used.
- p_adjust
The p-value adjustment method used.
- excluded_groups
Character vector of group labels excluded by exclude_below_min or fully_crossed, or NULL if no exclusions.
- excluded_values
Named list of value_selection filters applied, or NULL if none.
- ica
Data frame of ICA classifications (one row per item per method) when ica = TRUE and the design is intersectional, otherwise NULL. Columns: item, method, ica_class, marginal_vars, intersectional_flag.
See Also
check_groups() for exploring group structure before analysis;
group_details() and cross_details() for full breakdowns;
merge_groups() for combining sparse cells.
Examples
# Basic two-group analysis using synthetic data
dat <- simulate_dif(300, 10, dif_items = c(3, 7), seed = 1)
result <- idifr(dat, 1:10, ~ group, method = "LR")
print(result)
# Intersectional analysis with ICA
dat_ix <- simulate_dif(500, 10,
demo_vars = list(nationality = c("UK", "DE", "FR")),
seed = 2)
result_ix <- idifr(dat_ix, 1:10, ~ group * nationality,
method = "LR", ica = TRUE)
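For readers who want to see the logic of the logistic regression method in isolation, the following base R sketch fits the nested models in the style of Swaminathan and Rogers (1990) and applies a Benjamini-Hochberg adjustment across items. It is an illustration, not the code path used by idifr(): the simulated data, the use of the total score as the matching variable, and the helper `lr_dif()` are all assumptions of this sketch.

```r
set.seed(1)
n     <- 600
group <- rep(c("ref", "focal"), each = n / 2)
theta <- rnorm(n)

# Ten Rasch-type items; item 3 is 0.8 logits harder for the focal group.
b <- seq(-1.5, 1.5, length.out = 10)
resp <- sapply(seq_along(b), function(j) {
  shift <- if (j == 3) 0.8 * (group == "focal") else 0
  rbinom(n, 1, plogis(theta - b[j] - shift))
})
total <- rowSums(resp)   # matching variable (includes the studied item)

# Hypothetical helper: nested logistic regressions for one item,
# compared with likelihood-ratio (deviance difference) tests.
lr_dif <- function(y) {
  m0 <- glm(y ~ total,         family = binomial)  # no DIF
  m1 <- glm(y ~ total + group, family = binomial)  # uniform DIF
  m2 <- glm(y ~ total * group, family = binomial)  # adds non-uniform DIF
  c(p_uniform    = pchisq(deviance(m0) - deviance(m1), df = 1,
                          lower.tail = FALSE),
    p_nonuniform = pchisq(deviance(m1) - deviance(m2), df = 1,
                          lower.tail = FALSE))
}

p_values <- t(apply(resp, 2, lr_dif))

# Benjamini-Hochberg adjustment across items, analogous to p_adjust = "BH".
p_adj   <- p.adjust(p_values[, "p_uniform"], method = "BH")
flagged <- which(p_adj < 0.05)
```

Significance alone says nothing about magnitude, which is why the package reports effect sizes alongside these tests.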
Compute per-item log-likelihood contributions from a fitted irt_2pl model
Description
Uses local independence to decompose total LL into item contributions:
LL_j = sum_i log P(x_ij | posterior_i)
where P(x_ij | posterior_i) = sum_k posterior[i,k] * P(x_ij | theta_k)
Usage
item_loglik(model, resp = NULL, post = NULL, gi = 1)
Arguments
model |
An |
resp |
Response matrix (0/1/NA). Defaults to model$resp. |
post |
Posterior matrix (persons x nodes). Defaults to model$posterior. |
gi |
Group index (integer). Used to select group-specific item params and ability nodes. Default 1. |
Value
Numeric vector of length n_items.
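The decomposition in the Description can be written out directly in base R. The sketch below is a hedged illustration of the formula, not the package's item_loglik(): the item parameters, the a * (theta - b) parameterisation, the node grid, and the randomly generated posterior matrix are all hypothetical.

```r
# Hypothetical inputs: 2PL parameters, quadrature nodes, responses,
# and a posterior matrix (persons x nodes) whose rows sum to 1.
a     <- c(1.0, 1.5, 0.8)
b     <- c(-0.5, 0.0, 1.0)
nodes <- seq(-4, 4, length.out = 15)
resp  <- rbind(c(1, 1, 0), c(0, 0, 0), c(1, 1, 1), c(1, 0, NA))

set.seed(1)
post <- matrix(runif(nrow(resp) * length(nodes)), nrow = nrow(resp))
post <- post / rowSums(post)

item_ll <- sapply(seq_along(a), function(j) {
  p1 <- plogis(a[j] * (nodes - b[j]))          # P(x = 1 | theta_k)
  # P(x_ij | theta_k) for every person i and node k; NA responses stay NA.
  pk <- outer(resp[, j], p1, function(x, p) ifelse(x == 1, p, 1 - p))
  # Weight by each person's posterior, sum over nodes, then sum the logs
  # over persons; persons with a missing response are dropped.
  sum(log(rowSums(post * pk)), na.rm = TRUE)
})
```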
Per-item LL for a multigroup constrained model
Description
For the constrained model, each person uses shared item params but their own group-specific ability nodes.
Usage
item_loglik_mg(model, resp = NULL, post = NULL)
Arguments
model |
An |
resp |
Response matrix. Defaults to model$resp. |
post |
Posterior matrix. Defaults to model$posterior. |
Value
Numeric vector of length n_items.
Merge sparse groups
Description
Combines sparse intersectional cells by collapsing levels of one or more
demographic variables. Returns a modified data frame ready to pass back
to idifr() or check_groups().
Usage
merge_groups(groups, grp_formula = NULL, ..., min_cell_size = 50)
Arguments
groups |
An |
grp_formula |
A formula, required only if |
... |
Named arguments specifying merge rules. Each should be named after a demographic variable, with a named list mapping new level names to vectors of old level names. |
min_cell_size |
Minimum cell size to validate against after merging. |
Value
The original data frame with recoded grouping variable(s).
Examples
dat <- simulate_dif(300, 10,
demo_vars = list(nationality = c("UK", "DE", "FR", "ES")), seed = 1)
grp <- check_groups(dat, group = ~ group * nationality, plot = FALSE)
merged <- merge_groups(grp,
nationality = list("Other" = c("DE", "FR", "ES")))
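The merge rule format (new level name mapped to a vector of old level names) corresponds to a simple recode. The base R sketch below shows that idea on hypothetical data; it is not how merge_groups() is implemented, and the `demo` data frame and `nationality_merged` column are invented for illustration.

```r
# Hypothetical data with three sparse nationality levels.
set.seed(1)
demo <- data.frame(
  nationality = sample(c("UK", "DE", "FR", "ES"), 200, replace = TRUE,
                       prob = c(0.7, 0.1, 0.1, 0.1))
)

# Merge rule: new level name -> old level names.
rule <- list(Other = c("DE", "FR", "ES"))

# Replace every old level with its new name; untouched levels are kept.
recoded <- as.character(demo$nationality)
for (new_level in names(rule)) {
  recoded[recoded %in% rule[[new_level]]] <- new_level
}
demo$nationality_merged <- recoded

# Compare cell sizes before and after merging.
print(table(demo$nationality))
print(table(demo$nationality_merged))
```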
Plot method for idifr objects
Description
Plot method for idifr objects
Usage
## S3 method for class 'idifr'
plot(x, type = "items", ...)
Arguments
x |
An |
type |
Plot type: |
... |
Ignored. |
Value
No return value, called for side effects.
Print method for idifr objects
Description
Print method for idifr objects
Usage
## S3 method for class 'idifr'
print(x, ...)
Arguments
x |
An |
... |
Ignored. |
Value
No return value, called for side effects.
Generate synthetic DIF data for testing and simulation
Description
Generates synthetic dichotomous item response data with a known DIF
structure. Supports three DIF patterns: standard group DIF ("standard"),
DIF confined to a single intersectional cell ("intersection"), and a
mixture of both ("mixed").
Usage
simulate_dif(
n_persons = 500,
n_items = 20,
n_groups = 2,
dif_items = c(3, 7),
dif_effect = 0.8,
dif_type = "uniform",
dif_structure = "standard",
dif_group = NULL,
demo_vars = NULL,
seed = NULL
)
Arguments
n_persons |
Integer. Total number of respondents. |
n_items |
Integer. Number of items. Default 20. |
n_groups |
Integer. Number of groups. Default 2. |
dif_items |
Which items have DIF. For |
dif_effect |
Numeric. DIF shift size in logits. Default |
dif_type |
|
dif_structure |
One of |
dif_group |
Named list identifying the target intersectional cell for
intersection DIF. Variable names must match |
demo_vars |
Named list of additional demographic variables to add, with
their levels. Persons are assigned randomly with uniform probability.
Example: |
seed |
Integer random seed for reproducibility. |
Value
A data frame with item response columns (item_1, item_2, ...),
a group column, and any additional columns specified in demo_vars.
True item parameters and DIF metadata are stored as attributes.
Examples
# Standard DIF
dat <- simulate_dif(500, 20, 2, c(3, 7), 1.0)
# Intersection-only DIF
dat_ix <- simulate_dif(
n_persons = 500,
n_items = 20,
dif_items = c(5, 12),
dif_effect = 1.5,
dif_structure = "intersection",
dif_group = list(group = "G1", nationality = "UK", age_band = "Young"),
demo_vars = list(nationality = c("UK", "DE", "FR"),
age_band = c("Young", "Old")),
seed = 42
)
# Mixed DIF
dat_mix <- simulate_dif(
n_persons = 500,
n_items = 20,
dif_items = list(standard = c(3, 7), intersection = c(12, 15)),
dif_effect = 1.0,
dif_structure = "mixed",
dif_group = list(group = "G1", nationality = "UK", age_band = "Young"),
demo_vars = list(nationality = c("UK", "DE", "FR"),
age_band = c("Young", "Old")),
seed = 42
)
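To show what intersection-only DIF means in generative terms, the sketch below simulates responses in base R with a logit shift applied only to one intersectional cell. It is not the simulate_dif() implementation: the Rasch-type model, the sign convention (a positive shift makes the item harder for the target cell), and all values are assumptions of this sketch.

```r
set.seed(42)
n <- 500
persons <- data.frame(
  group       = sample(c("G1", "G2"), n, replace = TRUE),
  nationality = sample(c("UK", "DE", "FR"), n, replace = TRUE)
)
theta <- rnorm(n)
b     <- seq(-2, 2, length.out = 20)   # item difficulties
dif_items  <- c(5, 12)
dif_effect <- 1.5                      # shift in logits

# Only the G1 x UK cell experiences the shift.
in_cell <- persons$group == "G1" & persons$nationality == "UK"

# shift[i, j] is dif_effect when person i is in the target cell and
# item j is a DIF item, and 0 otherwise.
shift <- outer(as.numeric(in_cell),
               as.numeric(seq_along(b) %in% dif_items)) * dif_effect

# Logit of a correct response: theta - b, minus the cell-specific shift.
logit <- outer(theta, b, "-") - shift
items <- matrix(rbinom(length(logit), 1, plogis(logit)), nrow = n)
colnames(items) <- paste0("item_", seq_along(b))

dat <- data.frame(items, persons)
```

Because neither marginal variable on its own carries the shift, DIF of this kind is diluted when the analysis uses only group or only nationality.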
Summary method for idifr objects
Description
Summary method for idifr objects
Usage
## S3 method for class 'idifr'
summary(object, ...)
Arguments
object |
An |
... |
Ignored. |
Value
No return value, called for side effects.
Tidy an idifr object
Description
Re-exports generics::tidy so that
tidy() is available after library(iDIFr) without loading
broom or generics separately. For the iDIFr-specific
method see tidy.idifr.
Usage
tidy(x, ...)
Arguments
x |
An object to tidy. When |
... |
Additional arguments passed to the method. |
Value
A data frame (exact structure depends on the method dispatched).
Return tidy data frame of DIF results
Description
Returns results as a tidy data frame suitable for use with dplyr,
ggplot2, or for export. Use the table argument to choose which
table to return.
Implements the tidy generic from the generics package so that
tidy() works correctly regardless of whether broom is also loaded.
Usage
## S3 method for class 'idifr'
tidy(x, table = NULL, ...)
Arguments
x |
An |
table |
Which table to return.
|
... |
Ignored. |
Value
A data frame.
Examples
dat <- simulate_dif(300, 10, dif_items = c(3, 7), seed = 1)
result <- idifr(dat, 1:10, ~ group, method = "LR")
# Item-level results (default)
tidy(result)
tidy(result, table = "results")
# Group direction table for flagged items
tidy(result, table = "direction")
# ICA classification table (requires ica = TRUE)
dat_ix <- simulate_dif(500, 10,
demo_vars = list(nationality = c("UK", "DE")), seed = 2)
result_ix <- idifr(dat_ix, 1:10, ~ group * nationality,
method = "LR", ica = TRUE)
tidy(result_ix, table = "ica")