MultiTreatmentIRM
0) Assumptions
- SUTVA / consistency
  - No interference across units.
  - No hidden treatment versions.
  - The observed outcome matches the potential outcome under the realized arm: \(Y_i = \sum_{k=0}^{K} D_{ik} Y_i(k)\).
- Multi-arm unconfoundedness
  \((Y(0), Y(1), \dots, Y(K)) \perp D \mid X\), where \(D\) is one-hot over the \(K+1\) arms (column 0 is baseline/control).
- Positivity / overlap
  \(P(D_k = 1 \mid X = x) \ge \varepsilon > 0\) for every arm \(k\) and almost every \(x\); in implementation this is stabilized by multiclass trimming.
- Fold-level support for cross-fitting
  Each training fold must contain all treatment arms; otherwise nuisance models for missing arms are not identifiable.
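The fold-level support condition is easy to verify before fitting anything. Below is a minimal numpy sketch; `check_fold_support` is a hypothetical helper name, and the one-hot/fold-id layout is an assumption about the data representation:

```python
import numpy as np

def check_fold_support(d_onehot: np.ndarray, fold_ids: np.ndarray) -> list:
    """Return the folds whose TRAINING complement is missing at least one arm.

    d_onehot : (n, K+1) one-hot treatment matrix (column 0 = baseline).
    fold_ids : (n,) integer fold assignment per unit.
    """
    bad = []
    for f in np.unique(fold_ids):
        train = fold_ids != f  # cross-fitting trains on the complement of fold f
        # every arm must appear at least once among the training rows
        if (d_onehot[train].sum(axis=0) == 0).any():
            bad.append(int(f))
    return bad
```

If this returns a non-empty list, the offending folds should be re-drawn (e.g. with arm-stratified splitting) before cross-fitting.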
1) Data and estimand
For i.i.d. units \(i = 1, \dots, n\), observe:
- outcome: \(Y_i \in \mathbb{R}\) (or \(\{0,1\}\) for binary outcomes),
- confounders: \(X_i \in \mathbb{R}^p\),
- one-hot treatment vector: \(D_i \in \{0,1\}^{K+1}\) with \(\sum_{k=0}^{K} D_{ik} = 1\).
The target is a vector of baseline contrasts:
\(\theta_k = E[Y(k)] - E[Y(0)], \quad k = 1, \dots, K.\)
2) Nuisance functions
For each arm \(k = 0, \dots, K\):
- Outcome regression: \(g_k(x) = E[Y \mid X = x, D_k = 1]\).
- Generalized propensity (multiclass): \(m_k(x) = P(D_k = 1 \mid X = x)\).
Cross-fitted predictions are denoted \(\hat g_k(X_i)\) and \(\hat m_k(X_i)\).
3) Cross-fitting
Split the sample into \(F\) folds (n_folds=F). For each fold \(f\) with index set \(I_f\):
- Train the multiclass propensity model on the complement \(I_{-f}\) and predict on \(I_f\).
- For each arm \(k\), train the outcome model only on rows in \(I_{-f}\) with \(D_k = 1\), then predict on \(I_f\).
Binary-outcome edge case used in implementation:
if within a fold+arm the training outcome is single-class (all 0 or all 1), use a deterministic constant predictor for that arm/fold instead of fitting a classifier.
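The cross-fitting loop for the outcome models, including the single-class edge case, can be sketched as follows. This is a minimal stand-in, not the implementation: `cross_fit_outcomes` and its `fit_predict` hook are hypothetical names, and the default predictor ignores \(X\) and just predicts the training mean where a real learner would be fit:

```python
import numpy as np

def cross_fit_outcomes(y, d_onehot, fold_ids, fit_predict=None):
    """Cross-fitted per-arm outcome predictions g_hat[i, k].

    fit_predict(y_train, n_test) stands in for a real learner that would be
    fit on (X, Y) of the arm-k training rows and predict on the held-out fold.
    """
    if fit_predict is None:
        fit_predict = lambda y_tr, n_test: np.full(n_test, y_tr.mean())
    n, n_arms = d_onehot.shape
    g_hat = np.zeros((n, n_arms))
    for f in np.unique(fold_ids):
        train, test = fold_ids != f, fold_ids == f
        for k in range(n_arms):
            y_tr = y[train & (d_onehot[:, k] == 1)]  # arm-k rows outside fold f
            if np.unique(y_tr).size == 1:
                # single-class training outcome: deterministic constant predictor
                g_hat[test, k] = y_tr[0]
            else:
                g_hat[test, k] = fit_predict(y_tr, int(test.sum()))
    return g_hat
```

Note the edge-case branch assumes fold-level support (a missing arm in \(I_{-f}\) would yield an empty training set).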
4) Trimming in the multiclass setup
The implementation uses lower-bound trimming plus row renormalization:
\(\tilde m_k(X_i) = \max\!\big(\hat m_k(X_i), \varepsilon\big), \qquad \hat m_k^{\mathrm{trim}}(X_i) = \frac{\tilde m_k(X_i)}{\sum_{j=0}^{K} \tilde m_j(X_i)},\)
so each row stays a valid probability simplex after trimming.
Constraint: \(0 < \varepsilon < \tfrac{1}{K+1}\) (otherwise every entry is floored and the rows degenerate to the uniform distribution).
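The two-step trim-and-renormalize above is a one-liner in numpy; a minimal sketch (`trim_renormalize` is a hypothetical helper name):

```python
import numpy as np

def trim_renormalize(m_hat: np.ndarray, eps: float = 0.01) -> np.ndarray:
    """Lower-bound trimming + row renormalization of multiclass propensities.

    m_hat : (n, K+1) predicted arm probabilities; requires 0 < eps < 1/(K+1).
    """
    m = np.maximum(m_hat, eps)               # floor every entry at eps
    return m / m.sum(axis=1, keepdims=True)  # renormalize so rows sum to 1
```

One design consequence worth noting: after renormalization an entry can dip slightly below \(\varepsilon\) (to at least \(\varepsilon / (1 + (K+1)\varepsilon)\)), since the row sum grows when other entries are floored.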
5) Orthogonal score for multi-arm ATE
Define residuals for each arm:
\(u_{k,i} = Y_i - \hat g_k(X_i), \quad k = 0, \dots, K.\)
Define IPW representers (using trimmed propensities):
\(w_{k,i} = \frac{D_{ik}}{\hat m_k^{\mathrm{trim}}(X_i)}.\)
If normalize_ipw=True, apply column-wise Hájek normalization:
\(w_{k,i} \leftarrow \frac{w_{k,i}}{\frac{1}{n}\sum_{j=1}^{n} w_{k,j}}.\)
For each contrast \(b \in \{1, \dots, K\}\) (arm \(b\) vs. baseline), the per-contrast AIPW score is
\(\psi_{b,i} = \big(\hat g_b(X_i) + w_{b,i}\, u_{b,i}\big) - \big(\hat g_0(X_i) + w_{0,i}\, u_{0,i}\big).\)
Moment system:
\(E\big[\psi_{b,i} - \theta_b\big] = 0, \quad b = 1, \dots, K.\)
Thus with \(\bar\psi_b = \frac{1}{n}\sum_{i=1}^{n} \psi_{b,i}\):
\(\hat\theta_b = \bar\psi_b.\)
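The score computation is a few vectorized lines once the cross-fitted nuisances are in hand. A minimal numpy sketch (`aipw_contrasts` is a hypothetical name; `m_hat` is assumed already trimmed; it builds the per-arm signals whose differences give the contrast scores \(\psi_{b,i}\)):

```python
import numpy as np

def aipw_contrasts(y, d_onehot, g_hat, m_hat, normalize_ipw=True):
    """Per-arm AIPW signals psi[:, k] and baseline contrasts theta_hat."""
    w = d_onehot / m_hat                       # IPW representers D_k / m_k
    if normalize_ipw:
        w = w / w.mean(axis=0, keepdims=True)  # column-wise Hajek normalization
    psi = g_hat + w * (y[:, None] - g_hat)     # per-arm AIPW signal
    psi_bar = psi.mean(axis=0)
    theta = psi_bar[1:] - psi_bar[0]           # contrasts vs. baseline arm 0
    return psi, theta
```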
6) Influence function and inference
Per-contrast influence function:
\(\varphi_{b,i} = \psi_{b,i} - \hat\theta_b.\)
Variance/SE:
\(\hat\sigma_b^2 = \frac{1}{n}\sum_{i=1}^{n} \varphi_{b,i}^2, \qquad \widehat{SE}_b = \hat\sigma_b / \sqrt{n}.\)
Wald interval:
\(\hat\theta_b \pm z_{1-\alpha/2}\, \widehat{SE}_b.\)
P-values are computed per contrast via the normal approximation; the significance flag uses a Bonferroni correction across the \(K\) contrasts: flag contrast \(b\) if \(p_b < \alpha / K\).
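The inference step can be sketched with the stdlib normal distribution; `inference` is a hypothetical name, and it takes the per-arm signals from the score step:

```python
import numpy as np
from statistics import NormalDist

def inference(psi, theta, alpha=0.05):
    """SEs, Wald CIs, and Bonferroni-flagged two-sided p-values per contrast."""
    n, K = psi.shape[0], psi.shape[1] - 1
    phi = (psi[:, 1:] - psi[:, [0]]) - theta        # per-contrast influence function
    se = np.sqrt((phi ** 2).mean(axis=0)) / np.sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ci = np.stack([theta - z * se, theta + z * se], axis=1)
    # two-sided p-value via normal approximation
    pvals = np.array([2 * (1 - NormalDist().cdf(abs(t) / s))
                      for t, s in zip(theta, se)])
    signif = pvals < alpha / K                       # Bonferroni across K contrasts
    return se, ci, pvals, signif
```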
7) Relative effect
Baseline orthogonal signal:
\(\psi_{0,i} = \hat g_0(X_i) + w_{0,i}\, u_{0,i}, \qquad \hat\mu_0 = \frac{1}{n}\sum_{i=1}^{n} \psi_{0,i}.\)
Relative effect (%):
\(\hat\rho_b = 100 \cdot \hat\theta_b / \hat\mu_0.\)
The implementation uses a delta-style plug-in variance:
\(\varphi^{\mathrm{rel}}_{b,i} = \frac{100}{\hat\mu_0}\Big(\varphi_{b,i} - \frac{\hat\rho_b}{100}\,(\psi_{0,i} - \hat\mu_0)\Big),\)
then
\(\widehat{SE}^{\mathrm{rel}}_b = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n} \big(\varphi^{\mathrm{rel}}_{b,i}\big)^2}\,\big/\sqrt{n}.\)
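The delta-method plug-in above amounts to a few lines over the same per-arm signals; a minimal sketch (`relative_effect` is a hypothetical name):

```python
import numpy as np

def relative_effect(psi, theta):
    """Relative effect (%) vs. baseline with a delta-method plug-in SE."""
    n = psi.shape[0]
    mu0 = psi[:, 0].mean()                      # baseline orthogonal signal mean
    rho = 100.0 * theta / mu0                   # relative effect in percent
    phi = (psi[:, 1:] - psi[:, [0]]) - theta    # contrast influence function
    phi0 = psi[:, 0] - mu0                      # baseline influence function
    # delta method for the ratio: IF = (100/mu0) * (phi - (rho/100) * phi0)
    phi_rel = (100.0 / mu0) * (phi - (rho / 100.0) * phi0[:, None])
    se_rel = np.sqrt((phi_rel ** 2).mean(axis=0)) / np.sqrt(n)
    return rho, se_rel
```

The sketch assumes \(\hat\mu_0\) is bounded away from zero; the ratio (and its delta-method variance) is not meaningful otherwise.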
8) Math pseudocode
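A compact end-to-end sketch of the estimator, assembling sections 2–7 above (step numbering and names are illustrative, not the implementation):

```
Input: (Y_i, X_i, D_i) for i = 1..n, folds F, trim level eps, normalize_ipw, alpha
1. Split units into F folds; verify every training complement contains all arms.
2. For each fold f:
     fit the multiclass propensity model on I_{-f}; predict m_hat on I_f
     for each arm k: fit the outcome model on {i in I_{-f} : D_ik = 1};
       predict g_hat_k on I_f
       (if the training outcome is single-class, use a constant predictor)
3. Trim: m_tilde_k = max(m_hat_k, eps); renormalize each row to sum to 1.
4. Scores: w_k = D_k / m_tilde_k (Hajek-normalize columns if normalize_ipw);
   psi_k = g_hat_k + w_k * (Y - g_hat_k);  psi_b = psi_b_arm - psi_0
5. theta_b = mean_i(psi_b,i);  phi_b,i = psi_b,i - theta_b
6. SE_b = sd(phi_b)/sqrt(n); Wald CI; normal p-values; Bonferroni at alpha/K.
7. Relative effect: rho_b = 100 * theta_b / mean(psi_0), delta-method SE.
```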
References
Foundations: propensity score + unconfoundedness
- Rosenbaum & Rubin (1983) — introduces the propensity score as a balancing score under ignorability (your whole “(D \perp (Y(d)) \mid X)” setup). (OUP Academic)
- Hahn (1998) — semiparametric efficiency / influence-function view of ATE estimators using propensity scores and outcome regression (conceptual basis for IF-based SEs). (JSTOR)
Doubly-robust / AIPW (your score psi_b)
- Robins, Rotnitzky & Zhao (1994) — classic IPW estimating equations paper; widely cited as the origin of AIPW-style augmentation logic. (Taylor & Francis Online)
- Bang & Robins (2005) — formal “doubly robust” theory: consistency if either outcome model or propensity model is correct (your estimator’s key robustness property). (PubMed)
- Glynn & Quinn (2010) — practitioner-friendly AIPW exposition (good to cite in docs as an accessible reference). (UC Berkeley Law)
Multi-valued / multi-treatment propensity score & effects (your (K\ge2) one-hot setting)
- Imbens (2000) — extends propensity score ideas to multi-valued treatments / dose-response; canonical reference for generalized propensity score logic. (JSTOR)
- Lechner (1999/IZA DP 91) — identification and estimation under CIA with multiple mutually exclusive treatments (balancing scores beyond binary). (Econstor)
- Lopez & Gutman (2017, Stat Sci) — review + methods for categorical multiple treatments (matching/weighting/regression variants; good for positioning your approach). (arXiv)
Cross-fitting + Neyman-orthogonal / DoubleML framing (your “DoubleML-style cross-fitting” claim)
- Chernozhukov et al. (2018, Econometrics Journal) — the modern reference for Neyman-orthogonal scores + cross-fitting delivering (\sqrt{n}) inference with ML nuisances. (OUP Academic)
- Chernozhukov et al. (2016, arXiv:1608.00060) — earlier/longer technical version focused on treatment/causal parameters (often cited in implementations). (arXiv)
- (Optional efficiency detail) Hirano, Imbens & Ridder (2003) — shows efficiency gains/conditions when using an estimated propensity score; useful background for the weighting component. (Wiley Online Library)
Sensitivity analysis (your cf_y, r2_d, rho style)
- Cinelli & Hazlett (2020, JRSSB) — modern sensitivity analysis framed via omitted-variable bias with partial (R^2) style parameters (matches your r2_d / confounding-strength parameterization). (carloscinelli.com)
- Cinelli, Ferwerda & Hazlett (sensemakr paper) — practical companion describing the implemented sensitivity summaries/statistics. (carloscinelli.com)
- (Alternative tradition) Oster (2019, J Business & Econ Stats) — coefficient-stability approach (different parameterization, but often cited alongside Cinelli–Hazlett in “how sensitive is this?” docs). (IDEAS/RePEc)