Math notation for EDA, IRM, and refutation diagnostics (Causalis)
This note summarizes the core notation and formulas used by Causalis’s EDA helpers, the IRM estimator, and refutation diagnostics. The estimator and scores follow the Double/Debiased Machine Learning (DoubleML) formulation; we implement them on top of CausalData. Throughout, $\mathbb{E}_n[\cdot] = \frac{1}{n}\sum_{i=1}^n(\cdot)_i$ denotes the sample mean.
1. Variables and parameters
- Observed data: $(Y_i, D_i, X_i)_{i=1}^n$, i.i.d.
- $Y$ (or $Y_i$) is the outcome.
- $D \in \{0, 1\}$ is a binary treatment.
- $X$ are observed confounders.
- Potential outcomes: $Y(0), Y(1)$.
- Targets:
  - ATE: $\theta_{\mathrm{ATE}} = \mathbb{E}[Y(1) - Y(0)]$.
  - ATT (a.k.a. ATET/ATTE): $\theta_{\mathrm{ATT}} = \mathbb{E}[Y(1) - Y(0) \mid D = 1]$.
Assumptions (standard): unconfoundedness $(Y(0), Y(1)) \perp\!\!\!\perp D \mid X$; positivity $0 < m_0(X) < 1$ a.s.; SUTVA; and regularity conditions for cross-fitting and ML learners.
2. Nuisance functions (IRM)
- Propensity: $m_0(X) = \Pr(D = 1 \mid X)$.
- Outcome regressions: $g_0(d, X) = \mathbb{E}[Y \mid D = d, X]$; we write $\hat g_0$ and $\hat g_1$ for the fitted $d = 0$ and $d = 1$ arms.
- Cross-fitted predictions are denoted $\hat m_i, \hat g_{0,i}, \hat g_{1,i}$ (length $n$).
- Clipping: $\hat m \leftarrow \min\{\max\{\hat m, \varepsilon\}, 1 - \varepsilon\}$ with user trimming threshold $\varepsilon \in (0, 1/2)$.
Binary outcomes. If $Y$ is binary and the outcome learner is a classifier with predict_proba, Causalis uses the class-1 probability as $\hat g_d$. For numerical stability you may also clip $\hat g_d$ into $[\delta, 1 - \delta]$ with a tiny $\delta$ (e.g., $\delta = 10^{-6}$).
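A minimal numpy sketch of the clipping step described above (the function name `clip_propensity` is illustrative, not the Causalis API):

```python
import numpy as np

def clip_propensity(m_hat, eps=1e-2):
    """Clip estimated propensities into [eps, 1 - eps] to stabilize IPW terms."""
    return np.clip(m_hat, eps, 1.0 - eps)

# Extreme predictions get pulled to the trimming boundary.
m_clipped = clip_propensity(np.array([0.001, 0.30, 0.999]))  # -> 0.01, 0.30, 0.99
```

The same helper can be reused with a smaller `eps` for clipping classifier-based $\hat g_d$ on binary outcomes.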
3. Scores, EIFs, and estimators
Let $\hat m = \hat m(X)$, $\hat g_d = \hat g_d(X)$, and define the IPW terms

$$u_1 = \frac{D}{\hat m}, \qquad u_0 = \frac{1 - D}{1 - \hat m}.$$

(Optionally, Causalis can normalize $u_1$ and $u_0$ to have sample mean 1; this is a Hájek-style, second-order variance tweak. If you normalize in estimation, also use the normalized influence values in the variance formula.)
3.1 ATE (AIPW/DR)
Score decomposition as $\psi(W; \theta, \eta) = \psi_a \theta + \psi_b$:

$$\psi_a = -1, \qquad \psi_b = \hat g_1 - \hat g_0 + u_1 (Y - \hat g_1) - u_0 (Y - \hat g_0).$$
Estimator and influence function:

$$\hat\theta_{\mathrm{ATE}} = \mathbb{E}_n[\psi_b], \qquad \hat\psi_i = \psi_{b,i} - \hat\theta_{\mathrm{ATE}}.$$
Efficient influence function (truth $\eta_0 = (m_0, g_0)$):

$$\psi^{\mathrm{EIF}}(W) = g_0(1, X) - g_0(0, X) + \frac{D\,(Y - g_0(1, X))}{m_0(X)} - \frac{(1 - D)(Y - g_0(0, X))}{1 - m_0(X)} - \theta_0.$$
Compact identity (useful in code). With $h_1 = Y - \hat g_1$ and $h_0 = Y - \hat g_0$,

$$\psi_b = \hat g_1 - \hat g_0 + u_1 h_1 - u_0 h_0.$$
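The compact identity can be sketched directly in numpy. This is an illustration of the AIPW/DR score, not the Causalis implementation; the toy data is chosen so the residual terms vanish and $\psi_b$ reduces to $\hat g_1 - \hat g_0$:

```python
import numpy as np

def aipw_ate_score(y, d, m_hat, g0_hat, g1_hat):
    """psi_b for the ATE: g1 - g0 + u1*(y - g1) - u0*(y - g0)."""
    u1 = d / m_hat
    u0 = (1 - d) / (1 - m_hat)
    return g1_hat - g0_hat + u1 * (y - g1_hat) - u0 * (y - g0_hat)

# Toy check: with y equal to the fitted g_d exactly, residuals are zero
# and the estimator is just the mean of g1 - g0.
y  = np.array([1.0, 0.0])
d  = np.array([1.0, 0.0])
m  = np.array([0.5, 0.5])
g0 = np.array([0.0, 0.0])
g1 = np.array([1.0, 1.0])
theta_hat = aipw_ate_score(y, d, m, g0, g1).mean()
print(theta_hat)  # 1.0
```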
3.2 ATT (a.k.a. ATET/ATTE)
Let $p_1 = \Pr(D = 1)$ (estimated by the sample mean $\hat p_1 = \mathbb{E}_n[D]$). Define the control reweighting factor $\hat m / (1 - \hat m)$. Then

$$\psi_a = -\frac{D}{\hat p_1}, \qquad \psi_b = \frac{D}{\hat p_1}\,(Y - \hat g_0) - \frac{1 - D}{\hat p_1}\,\frac{\hat m}{1 - \hat m}\,(Y - \hat g_0).$$
Estimator and influence function:

$$\hat\theta_{\mathrm{ATT}} = -\frac{\mathbb{E}_n[\psi_b]}{\mathbb{E}_n[\psi_a]} = \mathbb{E}_n[\psi_b], \qquad \hat\psi_i = \psi_{a,i}\,\hat\theta_{\mathrm{ATT}} + \psi_{b,i}.$$
Because $\mathbb{E}_n[D / \hat p_1] = 1$, this choice centers $\hat\psi$ at zero in-sample. If you use fold-specific $\hat p_1^{(k)}$, either center $\hat\psi$ per fold or re-express everything with the global $\hat p_1$.
> Equivalent residual-weight form. With $\bar w = \hat m / p_1$, $h_1 = Y - \hat g_1$, and $h_0 = Y - \hat g_0$:
>
> $$\bar w\,(u_1 h_1 - u_0 h_0) = \frac{D}{p_1}(Y - \hat g_1) - \frac{1 - D}{p_1}\,\frac{\hat m}{1 - \hat m}\,(Y - \hat g_0).$$
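The residual-weight equivalence is easy to verify numerically. A numpy-only sketch (array names are illustrative, not Causalis API), checking both sides agree elementwise on random inputs:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
y  = rng.normal(size=n)
d  = rng.integers(0, 2, size=n).astype(float)
m  = rng.uniform(0.2, 0.8, size=n)        # already-clipped propensities
g0 = rng.normal(size=n)
g1 = rng.normal(size=n)
p1 = d.mean() if d.mean() > 0 else 0.5    # guard the toy draw against p1 = 0

u1, u0 = d / m, (1 - d) / (1 - m)
h1, h0 = y - g1, y - g0
w_bar = m / p1

lhs = w_bar * (u1 * h1 - u0 * h0)
rhs = d / p1 * (y - g1) - (1 - d) / p1 * m / (1 - m) * (y - g0)
forms_agree = bool(np.allclose(lhs, rhs))
print(forms_agree)  # True
```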
Efficient influence function (truth $\eta_0$, $p_1 = \Pr(D = 1)$):

$$\psi^{\mathrm{EIF}}(W) = \frac{D}{p_1}\bigl(Y - g_0(0, X) - \theta_0\bigr) - \frac{1 - D}{p_1}\,\frac{m_0(X)}{1 - m_0(X)}\bigl(Y - g_0(0, X)\bigr).$$
Weight-sum identity (diagnostic). At the true propensity,

$$\mathbb{E}\!\left[\frac{1 - D}{p_1}\,\frac{m_0(X)}{1 - m_0(X)}\right] = 1 = \mathbb{E}\!\left[\frac{D}{p_1}\right].$$

Equivalently, without the $1/p_1$ factor:

$$\mathbb{E}\!\left[(1 - D)\,\frac{m_0(X)}{1 - m_0(X)}\right] = \mathbb{E}[m_0(X)] = p_1.$$

So in-sample diagnostics can be phrased as either:
- $\mathbb{E}_n\!\left[(1 - D)\,\frac{\hat m}{1 - \hat m}\right] \approx \mathbb{E}_n[D]$ (raw factors), or
- $\sum_{i: D_i = 0} \frac{\hat m_i}{1 - \hat m_i} \approx n_1$ (ATT control weights match treated weights $n_1 = \sum_i D_i$).
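The raw-factor form of the diagnostic can be sketched as follows (`att_weight_sum_diagnostic` is a hypothetical helper name; the toy propensities are chosen so the two masses match exactly in-sample):

```python
import numpy as np

def att_weight_sum_diagnostic(d, m_hat):
    """Compare control reweighting mass against treated mass.
    At the true propensity, E[(1-D) m/(1-m)] = E[D]."""
    control_mass = ((1 - d) * m_hat / (1 - m_hat)).mean()
    treated_mass = d.mean()
    return control_mass, treated_mass

# Two units at m = 0.5, one treated and one control: masses balance exactly.
d = np.array([1.0, 0.0])
m = np.array([0.5, 0.5])
control_mass, treated_mass = att_weight_sum_diagnostic(d, m)
print(control_mass, treated_mass)  # 0.5 0.5
```

A large gap between the two masses signals propensity misfit or poor overlap in the control arm.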
3.3 Orthogonality (Neyman)
For $\eta = (m, g_0, g_1)$, Neyman orthogonality requires the Gateaux derivative of the moment condition to vanish at the truth:

$$\partial_r\, \mathbb{E}\bigl[\psi\bigl(W; \theta_0, \eta_0 + r(\eta - \eta_0)\bigr)\bigr]\Big|_{r = 0} = 0$$

for rich directions $\eta - \eta_0$. Causalis provides OOS moment checks and numerical-derivative diagnostics to assess this.
Useful partial derivatives (ATE):

$$\frac{\partial \psi}{\partial g_1} = 1 - \frac{D}{m}, \qquad \frac{\partial \psi}{\partial g_0} = -1 + \frac{1 - D}{1 - m}, \qquad \frac{\partial \psi}{\partial m} = -\frac{D (Y - g_1)}{m^2} - \frac{(1 - D)(Y - g_0)}{(1 - m)^2},$$

whose conditional expectations given $X$ vanish at truth.
Useful partial derivatives (ATT):

$$\frac{\partial \psi}{\partial g_0} = -\frac{D}{p_1} + \frac{1 - D}{p_1}\,\frac{m}{1 - m}, \qquad \frac{\partial \psi}{\partial m} = -\frac{1 - D}{p_1}\,\frac{Y - g_0}{(1 - m)^2},$$

and each has zero conditional mean given $X$ at truth.
4. Estimation (cross-fitting)
- Split into $K$ folds (stratified by $D$).
- On each train fold, fit learners for $(m, g_0, g_1)$; predict on the held-out fold to build cross-fitted $\hat m_i, \hat g_{0,i}, \hat g_{1,i}$.
- Compute $\hat\theta$ as above. Let $\hat\psi_i$ be the estimated influence function values (computed OOS per fold).
Variance and CI (single parameter):

$$\hat\sigma^2 = \mathbb{E}_n[\hat\psi_i^2], \qquad \widehat{\mathrm{SE}} = \frac{\hat\sigma}{\sqrt{n}}, \qquad \hat\theta \pm z_{1 - \alpha/2}\,\widehat{\mathrm{SE}}.$$
Hájek normalization (optional, ATE). Replace $u_1$ with $u_1 / \mathbb{E}_n[u_1]$ and similarly $u_0$ with $u_0 / \mathbb{E}_n[u_0]$. This preserves asymptotics (orthogonality) and can reduce finite-sample variance; it slightly alters the finite-sample IF, so use the normalized $\hat\psi_i$ in variance calculations. For ATT, it’s common to normalize control weights so their sum matches the count of treated (the diagnostic above already ensures this in expectation).
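The cross-fitting recipe above can be sketched end-to-end in numpy. This is not the Causalis implementation: a deliberately naive learner (fold-specific arm means, valid for a randomized treatment with no use of $X$) stands in for the ML fits, and the fold split is a crude permutation rather than stratified:

```python
import numpy as np

def cross_fit_ate(y, d, n_folds=2, eps=1e-2, seed=0):
    """Cross-fitted AIPW ATE with fold-specific sample means as 'learners'."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    m_hat, g0_hat, g1_hat = (np.empty(n) for _ in range(3))
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        m_hat[te]  = np.clip(d[tr].mean(), eps, 1 - eps)   # clipped propensity
        g0_hat[te] = y[tr][d[tr] == 0].mean()              # control-arm mean
        g1_hat[te] = y[tr][d[tr] == 1].mean()              # treated-arm mean
    u1, u0 = d / m_hat, (1 - d) / (1 - m_hat)
    psi_b = g1_hat - g0_hat + u1 * (y - g1_hat) - u0 * (y - g0_hat)
    theta = psi_b.mean()
    psi = psi_b - theta                                    # OOS influence values
    se = np.sqrt((psi ** 2).mean() / n)
    return theta, (theta - 1.96 * se, theta + 1.96 * se)

# Simulated RCT with true ATE = 1.
rng = np.random.default_rng(42)
n = 2000
d = (rng.uniform(size=n) < 0.5).astype(float)
y = d * 1.0 + rng.normal(scale=0.5, size=n)
theta, ci = cross_fit_ate(y, d)
```

With real confounding, the arm means would be replaced by ML regressions of $Y$ on $X$ per arm and a classifier for $m$, predicted out-of-fold exactly as here.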
5. Positivity (overlap) & trimming
- EDA reports the distribution of $\hat m(X)$ and the share of units with $\hat m$ near 0 or 1.
- Clipping stabilizes the IPW terms. Under positivity (true $m_0$ bounded away from 0 and 1) it is asymptotically innocuous for ATE/ATT but may introduce small finite-sample bias; hard trimming (dropping units by a propensity threshold) changes the target population and should be interpreted accordingly.
6. Refutation diagnostics
6.1 OOS moment checks
Compute $\hat\psi_i$ on each test fold using fold-specific nuisances. Verify that the empirical mean of $\hat\psi$ (and conditional moments against simple basis functions of $X$) is close to zero.
APIs: refute_irm_orthogonality, oos_moment_check, influence_summary.
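A minimal version of the moment check is a t-statistic on the OOS influence values (`oos_moment_tstat` is a hypothetical name; the listed Causalis APIs wrap richer per-fold variants):

```python
import numpy as np

def oos_moment_tstat(psi):
    """t-statistic for H0: E[psi] = 0, from OOS influence values."""
    n = len(psi)
    return psi.mean() / (psi.std(ddof=1) / np.sqrt(n))

# Influence values roughly centered at zero: small |t|, moment check passes.
psi = np.array([0.1, -0.2, 0.05, 0.02, -0.03])
t = oos_moment_tstat(psi)
print(abs(t) < 2)  # True
```

The same statistic can be computed against $\hat\psi$ multiplied by basis functions of $X$ to probe conditional moments.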
6.2 Orthogonality derivatives
Numerically evaluate derivatives of the moment condition w.r.t. perturbations in $\eta$ at $\hat\eta$. Small magnitudes support Neyman orthogonality in practice.
APIs: orthogonality_derivatives (ATE) and ATT variants. (Derivatives summarized in §3.3.)
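A numerical-derivative check can be sketched with a central finite difference of the empirical ATE moment, here perturbing only $\hat g_1$ as an illustration (function names are hypothetical, not the Causalis API). The toy inputs are chosen so the in-sample IPW weights balance exactly and the derivative is zero:

```python
import numpy as np

def ate_moment(y, d, m, g0, g1, theta):
    """Empirical ATE moment E_n[psi] at the given nuisances and theta."""
    u1, u0 = d / m, (1 - d) / (1 - m)
    return (g1 - g0 + u1 * (y - g1) - u0 * (y - g0) - theta).mean()

def numerical_gateaux_g1(y, d, m, g0, g1, theta, direction, h=1e-4):
    """Central finite difference of the moment along a perturbation of g1."""
    up = ate_moment(y, d, m, g0, g1 + h * direction, theta)
    dn = ate_moment(y, d, m, g0, g1 - h * direction, theta)
    return (up - dn) / (2 * h)

y  = np.array([1.0, 0.0]); d = np.array([1.0, 0.0])
m  = np.array([0.5, 0.5]); g0 = np.zeros(2); g1 = np.ones(2)
deriv = numerical_gateaux_g1(y, d, m, g0, g1, theta=1.0, direction=np.ones(2))
print(deriv)  # 0.0 -- E_n[(1 - D/m) * direction] balances exactly here
```

In practice the derivative is evaluated along several random directions; small magnitudes across all of them support orthogonality.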
6.3 Sensitivity (heuristic bias bounds)
Following the DoubleML structure, Causalis exposes a Cauchy–Schwarz–style worst-case bias bound aligned with the score:
- Outcome noise scale (pooled): $\hat\sigma^2 = \mathbb{E}_n\!\left[(Y - \hat g_D)^2\right]$, where $\hat g_D = D\,\hat g_1 + (1 - D)\,\hat g_0$.
- Score-weight norm (ATE): $\hat\nu^2 = \mathbb{E}_n\!\left[(u_1 + u_0)^2\right]$.
(Equivalently $\hat\nu^2 = \mathbb{E}_n[u_1^2 + u_0^2]$ since $u_1 u_0 = 0$ pointwise.)
- Score-weight norm (ATT): $\hat\nu_{\mathrm{ATT}}^2 = \mathbb{E}_n\!\left[\left(\frac{D}{\hat p_1} - \frac{1 - D}{\hat p_1}\,\frac{\hat m}{1 - \hat m}\right)^2\right]$.
For user-chosen sensitivity multipliers $c_Y, c_D \ge 0$ and correlation cap $\rho \in [0, 1]$,

$$|\widehat{\mathrm{bias}}| \le \rho\, c_Y\, c_D\, \hat\sigma\, \hat\nu.$$
Optional refinement (tighter but still heuristic). Split outcome variance by arm:

$$\hat\sigma_1^2 = \frac{\mathbb{E}_n[D\,(Y - \hat g_1)^2]}{\mathbb{E}_n[D]}, \qquad \hat\sigma_0^2 = \frac{\mathbb{E}_n[(1 - D)(Y - \hat g_0)^2]}{\mathbb{E}_n[1 - D]},$$

and use

$$\sqrt{\hat\sigma_1^2\,\mathbb{E}_n[u_1^2] + \hat\sigma_0^2\,\mathbb{E}_n[u_0^2]}$$

in place of $\hat\sigma\,\hat\nu$ for ATE.
APIs: IRM.sensitivity_analysis, refutation/sensitivity.py. Bounds remain heuristic (not identified).
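The pooled bound can be sketched as follows. This is a heuristic illustration, not `IRM.sensitivity_analysis` itself; the multiplier names `cf_y`, `cf_d` and default values are assumptions:

```python
import numpy as np

def ate_sensitivity_bound(y, d, m, g0, g1, cf_y=0.03, cf_d=0.03, rho=1.0):
    """Heuristic Cauchy-Schwarz bias bound: rho * cf_y * cf_d * sigma * nu.
    cf_y, cf_d, rho are user-chosen; the bound is not identified from data."""
    g_d = np.where(d == 1, g1, g0)
    sigma = np.sqrt(((y - g_d) ** 2).mean())      # pooled outcome noise scale
    u1, u0 = d / m, (1 - d) / (1 - m)
    nu = np.sqrt((u1 ** 2 + u0 ** 2).mean())      # uses u1 * u0 = 0 pointwise
    return rho * cf_y * cf_d * sigma * nu

# Tiny worked example: sigma = 0.5, nu = 2, so bound = 0.03 * 0.03 * 1.0.
y  = np.array([1.0, 0.0]); d = np.array([1.0, 0.0])
m  = np.array([0.5, 0.5])
g0 = np.full(2, 0.5); g1 = np.full(2, 0.5)
bound = ate_sensitivity_bound(y, d, m, g0, g1)
print(bound)  # 0.0009
```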
7. Implementation guardrails
- Abort/warn if $\mathbb{E}_n[D] = 0$ (ATT undefined / division by zero).
- Enforce small clipping on $\hat m$ and (for classifiers) on $\hat g_d$ to prevent exploding residual-weight products.
- Stratify folds by $D$ when cross-fitting.
- If using fold-specific denominators (e.g., $\hat p_1^{(k)}$), ensure fold-wise centering of $\hat\psi$ or re-express with the global $\hat p_1$.