CUPEDModel
0) Assumptions
- SUTVA / consistency
  - No interference: unit $i$’s outcome does not depend on other units’ treatment assignments.
  - No hidden versions: “treatment” and “control” correspond to well-defined interventions.
  - Consistency: the observed outcome equals the potential outcome under the realized treatment: if $D_i = 1$, then $Y_i = Y_i(1)$; if $D_i = 0$, then $Y_i = Y_i(0)$.
- Random assignment or unconfoundedness (ATE/ITT identification): In an RCT, treatment is independent of potential outcomes (and baseline covariates):
  $$(Y_i(1),\ Y_i(0),\ X_i) \perp\!\!\!\perp D_i.$$
  This is the key condition under which the estimand is identified as an average causal effect.
- Overlap / positivity: Both arms occur with nonzero probability:
  $$0 < \Pr(D_i = 1) < 1$$
  (and within any randomization strata if stratified), ensuring the ATE/ITT is estimable from the observed design.
- Regression is a working model (robust inference): The Lin specification is used as an adjustment; it need not be exactly correct. HC2 standard errors target valid large-sample inference under heteroskedasticity and potential misspecification of the conditional mean.
- Finite second moments: Outcomes and regressors have finite second moments (so variances exist), which supports HC-type variance estimation and the delta-method approximation used later.
- Design matrix regularity: The constructed design matrix
  $$Z = [\mathbf{1}_n,\ D,\ X^c,\ D \odot X^c]$$
  has full column rank (no perfect multicollinearity), so $(Z^\top Z)^{-1}$ exists. In practice this also means you avoid near-duplicate covariates and handle zero-variance / constant columns.
- Leverage not degenerate (HC2 well-defined): For HC2 you require $h_{ii} < 1$ for all $i$ (true when $Z$ has full rank and no observation is perfectly fit), so the HC2 weights are finite.
- Relative-effect CI (delta method, “nocov”): For the relative CI you additionally assume a first-order Taylor approximation is adequate, $\hat\mu_c$ is not near zero, and you ignore $\mathrm{Cov}(\hat\tau, \hat\mu_c)$ (your deliberate “nocov” rule).
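The two design-matrix conditions (full column rank and leverages strictly below 1) can be checked numerically before fitting. The following is a minimal sketch, assuming NumPy; `check_design` is a hypothetical helper name, and the simulated `D` and `X` are illustrative data only.

```python
import numpy as np

def check_design(Z):
    """Hypothetical helper: return (full_rank, max_leverage) for a design matrix Z."""
    Z = np.asarray(Z, dtype=float)
    n, k = Z.shape
    full_rank = bool(np.linalg.matrix_rank(Z) == k)
    # For full-rank Z, the leverages h_ii are the squared row norms of Q
    # from a thin QR decomposition Z = QR.
    Q, _ = np.linalg.qr(Z)
    h = np.sum(Q**2, axis=1)
    return full_rank, float(h.max())

# Illustrative data (assumption): 200 units, 2 covariates, Bernoulli assignment.
rng = np.random.default_rng(0)
n = 200
D = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, 2))
Xc = X - X.mean(axis=0)
Z = np.column_stack([np.ones(n), D, Xc, D[:, None] * Xc])

full_rank, max_h = check_design(Z)
```

If `full_rank` is false (for example after adding a duplicated covariate column), the leverage value is not meaningful and the covariates should be filtered first.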
1) Data + target estimand
You observe $n$ i.i.d. units $i = 1, \dots, n$ with:
- outcome $Y_i$ (post-period),
- treatment $D_i \in \{0, 1\}$,
- pre-treatment covariates $X_i$ (chosen subset).
Target (ATE/ITT):
$$\tau = \mathbb{E}[\,Y_i(1) - Y_i(0)\,].$$
2) Global centering of covariates (full-sample centering)
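Let $p$ be the number of covariates you actually keep (after any variance / quality filtering). For each kept covariate $j = 1, \dots, p$, center over the entire sample:
$$\bar X_j = \frac{1}{n} \sum_{i=1}^{n} X_{ij}, \qquad X^c_{ij} = X_{ij} - \bar X_j.$$
In matrix form, with $X \in \mathbb{R}^{n \times p}$ and $\bar X = (\bar X_1, \dots, \bar X_p)^\top$:
$$X^c = X - \mathbf{1}_n \bar X^\top.$$
Key property (by construction):
$$\frac{1}{n} \sum_{i=1}^{n} X^c_i = 0.$$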
Why this matters for interpretation (with interactions):
With global centering, the regression intercepts line up at the sample mean covariate level. In particular, the “main” treatment coefficient $\tau$ becomes the difference in intercepts at $X^c = 0$ (that is, at $X = \bar X$), which is exactly what you want when you interpret $\tau$ as the ATE/ITT under Lin’s fully-interacted adjustment.
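If you don’t center and estimate:
$$Y_i = \alpha + \tau D_i + X_i^\top \beta + D_i X_i^\top \gamma + \varepsilon_i,$$
then the interaction term vanishes at $X_i = 0$, so
$$\tau = \mathbb{E}[Y_i \mid D_i = 1, X_i = 0] - \mathbb{E}[Y_i \mid D_i = 0, X_i = 0],$$
which is often not a meaningful reference point (age $= 0$, revenue $= 0$, etc.). The regression can still be statistically valid, but $\tau$ no longer directly matches the “average-over-the-covariate-distribution” effect unless the covariates happen to have mean zero.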
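A small simulation can make the centering point concrete. The sketch below, assuming NumPy and an illustrative data-generating process (an "age"-like covariate with mean 40), fits the fully-interacted regression with raw and with globally centered covariates. `fit_interacted` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
D = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(loc=40.0, scale=10.0, size=(n, 1))  # illustrative "age"-like covariate
# Assumed DGP: effect is 1.0 at X = 0 and 1.0 + 0.03 * 40 = 2.2 at X = 40.
Y = 2.0 + 1.0 * D + 0.05 * X[:, 0] + 0.03 * D * X[:, 0] + rng.normal(size=n)

def fit_interacted(D, X, Y):
    """Hypothetical helper: OLS of Y on [1, D, X, D*X]; returns (coefficients, fitted values)."""
    Z = np.column_stack([np.ones(len(D)), D, X, D[:, None] * X])
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return theta, Z @ theta

Xc = X - X.mean(axis=0)                             # global (full-sample) centering
theta_raw, fitted_raw = fit_interacted(D, X, Y)     # theta_raw[1]: effect at X = 0
theta_cen, fitted_cen = fit_interacted(D, Xc, Y)    # theta_cen[1]: effect at X = mean(X)

# Centering only reparameterizes the model: fitted values agree, and the
# treatment coefficient shifts by mean(X) times the interaction coefficient.
shifted = theta_raw[1] + X.mean() * theta_raw[3]
```

Here `theta_cen[1]` equals `shifted` up to floating-point error, which is the algebraic content of "centering moves the reference point from $X = 0$ to $X = \bar X$".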
3) Build the Lin (2013) fully-interacted design matrix
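Let
$$D = (D_1, \dots, D_n)^\top \in \{0,1\}^n, \qquad X^c \in \mathbb{R}^{n \times p}, \qquad \mathbf{1}_n = (1, \dots, 1)^\top \in \mathbb{R}^n.$$
Define the elementwise (row-wise) interaction matrix $D \odot X^c \in \mathbb{R}^{n \times p}$ by
$$(D \odot X^c)_{ij} = D_i X^c_{ij}.$$
Then the Lin (2013) fully-interacted design matrix is:
$$Z = [\mathbf{1}_n,\ D,\ X^c,\ D \odot X^c] \in \mathbb{R}^{n \times (2p+2)}.$$
Partition the parameter vector as:
$$\theta = (\alpha,\ \tau,\ \beta^\top,\ \gamma^\top)^\top, \qquad \alpha, \tau \in \mathbb{R}, \quad \beta, \gamma \in \mathbb{R}^p.$$
Row-wise model:
$$Y_i = \alpha + \tau D_i + (X^c_i)^\top \beta + D_i (X^c_i)^\top \gamma + \varepsilon_i.$$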
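The construction of the design matrix can be sketched in a few lines of NumPy. `build_lin_design` is a hypothetical function name, and the column order (intercept, treatment, centered covariates, interactions) is an assumption chosen so that the treatment coefficient sits at index 1.

```python
import numpy as np

def build_lin_design(D, X):
    """Hypothetical helper: return Z = [1, D, Xc, D * Xc] with globally centered covariates."""
    D = np.asarray(D, dtype=float).reshape(-1)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                    # treat a single covariate as one column
    Xc = X - X.mean(axis=0)               # center over the entire sample
    DXc = D[:, None] * Xc                 # row-wise interaction D_i * Xc_i
    return np.column_stack([np.ones(len(D)), D, Xc, DXc])

# Tiny illustrative input: 4 units, 2 covariates.
D = np.array([0, 1, 0, 1])
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
Z = build_lin_design(D, X)                # shape (4, 2 * 2 + 2)
```

Control rows carry zeros in the interaction block, so the last $p$ columns only contribute to treated units.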
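4) Why the coefficient on $D$ is the ATE (with global centering)
The regression implies two group-specific conditional mean functions:
- Control ($D = 0$): $m_0(x^c) = \alpha + (x^c)^\top \beta$
- Treated ($D = 1$): $m_1(x^c) = (\alpha + \tau) + (x^c)^\top (\beta + \gamma)$
So the conditional treatment effect as a function of covariates is:
$$\tau(x^c) = m_1(x^c) - m_0(x^c) = \tau + (x^c)^\top \gamma.$$
Now average this conditional effect over the covariate distribution used by the estimator:
$$\mathbb{E}[\tau(X^c)] = \tau + \mathbb{E}[X^c]^\top \gamma.$$
With global centering, $\mathbb{E}[X^c] = 0$ (in sample: $\frac{1}{n} \sum_{i=1}^{n} X^c_i = 0$), hence:
$$\mathbb{E}[\tau(X^c)] = \tau.$$
So, under the globally-centered Lin specification:
- $\tau$ is the average treatment effect over the (centered) covariate distribution (ATE/ITT),
- $\gamma$ captures effect heterogeneity with respect to $X^c$.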
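The averaging argument can be verified numerically: with globally centered covariates, the sample average of the fitted conditional effect equals the fitted treatment coefficient. This is a minimal sketch on simulated data (the data-generating process and the coefficient layout `[alpha, tau, beta, gamma]` are assumptions of the example).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 400, 2
D = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, p))
# Illustrative DGP with heterogeneous effects in X.
Y = (1.0 + 0.5 * D + X @ np.array([1.0, -1.0])
     + D * (X @ np.array([0.3, 0.2])) + rng.normal(size=n))

Xc = X - X.mean(axis=0)
Z = np.column_stack([np.ones(n), D, Xc, D[:, None] * Xc])
theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)

tau_hat = theta[1]             # coefficient on D
gamma_hat = theta[2 + p:]      # coefficients on the D * Xc block

# Fitted conditional effect tau(x) = tau + x' gamma, evaluated at every unit.
cond_effect = tau_hat + Xc @ gamma_hat
avg_cond_effect = cond_effect.mean()
```

Because `Xc` has exact column means of zero, `avg_cond_effect` matches `tau_hat` up to floating-point error, regardless of the values in `gamma_hat`.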
5) OLS fit (point estimate)
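Fit OLS on $Z$:
$$\hat\theta = (Z^\top Z)^{-1} Z^\top Y.$$
The reported ATE/ITT estimate is the coefficient on the treatment column:
$$\hat\tau = \hat\theta_2 \quad \text{(the second component of } \hat\theta\text{, matching the } D \text{ column of } Z\text{)}.$$
Residuals:
$$\hat e_i = Y_i - Z_i^\top \hat\theta, \qquad i = 1, \dots, n.$$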
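A minimal sketch of the fit, assuming NumPy. It uses `np.linalg.lstsq` rather than forming the inverse of the Gram matrix explicitly; for a full-rank design both give the same coefficients, and least squares is usually the more numerically stable choice. `ols_fit` is a hypothetical helper name and the data are simulated for illustration.

```python
import numpy as np

def ols_fit(Z, Y):
    """Hypothetical helper: OLS coefficients and residuals for Y ~ Z."""
    theta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ theta_hat
    return theta_hat, resid

rng = np.random.default_rng(3)
n = 250
D = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, 1))
Xc = X - X.mean(axis=0)
Z = np.column_stack([np.ones(n), D, Xc, D[:, None] * Xc])
Y = 2.0 + 0.7 * D + 1.5 * Xc[:, 0] + rng.normal(size=n)

theta_hat, resid = ols_fit(Z, Y)
tau_hat = theta_hat[1]   # coefficient on the treatment column (second column of Z)
```

The residuals are orthogonal to every column of `Z`, which is a quick sanity check that the fit solved the normal equations.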
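6) Robust covariance / standard error for $\hat\tau$ (HC2 only)
Let the OLS “bread” be:
$$B = (Z^\top Z)^{-1}.$$
Define the hat matrix and leverages:
$$H = Z (Z^\top Z)^{-1} Z^\top, \qquad h_{ii} = H_{ii} = Z_i^\top (Z^\top Z)^{-1} Z_i,$$
where $Z_i^\top$ is the $i$-th row of $Z$.
HC2 weights:
$$w_i = \frac{\hat e_i^2}{1 - h_{ii}}.$$
Define the “meat”:
$$M = \sum_{i=1}^{n} w_i Z_i Z_i^\top = Z^\top \mathrm{diag}(w_1, \dots, w_n)\, Z.$$
Robust covariance:
$$\hat V_{\mathrm{HC2}} = B M B.$$
Extract the treatment-component variance and standard error:
$$\widehat{\mathrm{Var}}(\hat\tau) = [\hat V_{\mathrm{HC2}}]_{22}, \qquad \widehat{\mathrm{SE}}(\hat\tau) = \sqrt{[\hat V_{\mathrm{HC2}}]_{22}}.$$
A generic test statistic is:
$$t = \frac{\hat\tau}{\widehat{\mathrm{SE}}(\hat\tau)}.$$
Absolute CI (using some critical value $c$):
$$\mathrm{CI}_{\mathrm{abs}} = \left[\hat\tau - c\, \widehat{\mathrm{SE}}(\hat\tau),\ \hat\tau + c\, \widehat{\mathrm{SE}}(\hat\tau)\right].$$
If your implementation later “recovers” the effective $c$ from the computed CI bounds $[L, U]$, one way to express it is:
$$c_{\mathrm{eff}} = \frac{U - L}{2\, \widehat{\mathrm{SE}}(\hat\tau)}.$$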
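The bread, leverage, weight, and meat steps translate almost line by line into NumPy. The sketch below is illustrative: `hc2_cov` is a hypothetical function name, the data are simulated with arm-dependent noise, and the critical value `1.96` is an assumption (a normal approximation for a 95% interval), since the text leaves the critical value open.

```python
import numpy as np

def hc2_cov(Z, resid):
    """Hypothetical helper: HC2 sandwich covariance for OLS residuals `resid` on design Z."""
    Z = np.asarray(Z, dtype=float)
    bread = np.linalg.inv(Z.T @ Z)                  # (Z'Z)^{-1}
    h = np.einsum("ij,jk,ik->i", Z, bread, Z)       # leverages h_ii = Z_i' (Z'Z)^{-1} Z_i
    w = resid**2 / (1.0 - h)                        # HC2 weights
    meat = (Z * w[:, None]).T @ Z                   # sum_i w_i Z_i Z_i'
    return bread @ meat @ bread

rng = np.random.default_rng(4)
n = 300
D = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, 2))
Xc = X - X.mean(axis=0)
Z = np.column_stack([np.ones(n), D, Xc, D[:, None] * Xc])
# Heteroskedastic noise: the treated arm has twice the noise scale.
Y = 1.0 + 0.5 * D + Xc @ np.array([1.0, -0.5]) + (1.0 + D) * rng.normal(size=n)

theta_hat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ theta_hat
V = hc2_cov(Z, resid)

tau_hat = theta_hat[1]
se_tau = np.sqrt(V[1, 1])                           # treatment entry of the covariance
crit = 1.96                                         # assumption: normal critical value
ci_abs = (tau_hat - crit * se_tau, tau_hat + crit * se_tau)
t_stat = tau_hat / se_tau
c_eff = (ci_abs[1] - ci_abs[0]) / (2.0 * se_tau)    # recovers crit from the bounds
```

A useful sanity check is the intercept-only design, where HC2 reduces exactly to the usual sample-mean variance $s^2 / n$.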
7) Relative CI delta method
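Define the relative effect (percent) using the control mean:
$$\hat\tau_{\mathrm{rel}} = 100 \cdot \frac{\hat\tau}{\hat\mu_c}, \qquad \hat\mu_c = \frac{1}{n_c} \sum_{i:\, D_i = 0} Y_i.$$
Let
$$g(\tau, \mu_c) = 100 \cdot \frac{\tau}{\mu_c}.$$
Gradient:
$$\frac{\partial g}{\partial \tau} = \frac{100}{\mu_c}, \qquad \frac{\partial g}{\partial \mu_c} = -\frac{100\, \tau}{\mu_c^2}.$$
You already have from HC2:
$$\widehat{\mathrm{Var}}(\hat\tau) = \widehat{\mathrm{SE}}(\hat\tau)^2.$$
Estimate the variance of the control mean with the usual sample-mean formula:
$$\widehat{\mathrm{Var}}(\hat\mu_c) = \frac{s_c^2}{n_c}, \qquad s_c^2 = \frac{1}{n_c - 1} \sum_{i:\, D_i = 0} (Y_i - \hat\mu_c)^2.$$
Delta method, ignoring covariance (“nocov” rule):
$$\widehat{\mathrm{Var}}(\hat\tau_{\mathrm{rel}}) \approx \left(\frac{100}{\hat\mu_c}\right)^2 \widehat{\mathrm{Var}}(\hat\tau) + \left(\frac{100\, \hat\tau}{\hat\mu_c^2}\right)^2 \widehat{\mathrm{Var}}(\hat\mu_c).$$
So:
$$\widehat{\mathrm{SE}}(\hat\tau_{\mathrm{rel}}) = \sqrt{\widehat{\mathrm{Var}}(\hat\tau_{\mathrm{rel}})}.$$
Use the same critical value $c$ as for the absolute CI:
$$\mathrm{CI}_{\mathrm{rel}} = \left[\hat\tau_{\mathrm{rel}} - c\, \widehat{\mathrm{SE}}(\hat\tau_{\mathrm{rel}}),\ \hat\tau_{\mathrm{rel}} + c\, \widehat{\mathrm{SE}}(\hat\tau_{\mathrm{rel}})\right].$$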
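The "nocov" delta-method interval can be sketched as a small function. `relative_ci_nocov` is a hypothetical name, the default `crit=1.96` is an assumed normal critical value, and the inputs `tau_hat` and `se_tau` are taken as given from the HC2 step.

```python
import numpy as np

def relative_ci_nocov(tau_hat, se_tau, y_control, crit=1.96):
    """Hypothetical helper: percent effect, its delta-method SE (no covariance term), and CI."""
    y_control = np.asarray(y_control, dtype=float)
    n_c = y_control.size
    mu_c = y_control.mean()                       # control mean
    var_mu_c = y_control.var(ddof=1) / n_c        # usual sample-mean variance
    rel = 100.0 * tau_hat / mu_c
    g_tau = 100.0 / mu_c                          # d g / d tau
    g_mu = -100.0 * tau_hat / mu_c**2             # d g / d mu_c
    var_rel = g_tau**2 * se_tau**2 + g_mu**2 * var_mu_c   # covariance term deliberately omitted
    se_rel = np.sqrt(var_rel)
    return rel, se_rel, (rel - crit * se_rel, rel + crit * se_rel)

# Illustrative numbers: control mean 10, absolute effect 1.0 with SE 0.5.
rel, se_rel, ci_rel = relative_ci_nocov(1.0, 0.5, [8.0, 10.0, 12.0, 10.0])
```

In this toy input the control mean is 10, so the relative effect is 10 percent, and the variance is $10^2 \cdot 0.25 + (-1)^2 \cdot \tfrac{2}{3}$. The interval is only trustworthy when the control mean is well away from zero.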
8) Math pseudocode (only math)
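One way to condense sections 2 to 7 into a math-only recipe is sketched below. The symbol names ($\bar X$, $w_i$, $s_c^2$, $n_c$, $c$) are notational assumptions of this sketch, and $c$ is whatever critical value the implementation chooses.

```latex
\begin{aligned}
&\textbf{Input: } Y \in \mathbb{R}^n,\ D \in \{0,1\}^n,\ X \in \mathbb{R}^{n \times p},\ \text{critical value } c \\
&1.\ \bar X = \tfrac{1}{n} X^\top \mathbf{1}_n, \qquad X^c = X - \mathbf{1}_n \bar X^\top \\
&2.\ Z = [\mathbf{1}_n,\ D,\ X^c,\ D \odot X^c] \\
&3.\ \hat\theta = (Z^\top Z)^{-1} Z^\top Y, \qquad \hat\tau = \hat\theta_2, \qquad \hat e = Y - Z \hat\theta \\
&4.\ h_{ii} = Z_i^\top (Z^\top Z)^{-1} Z_i, \qquad w_i = \hat e_i^2 / (1 - h_{ii}) \\
&5.\ \hat V_{\mathrm{HC2}} = (Z^\top Z)^{-1} \Big( \sum_{i=1}^{n} w_i Z_i Z_i^\top \Big) (Z^\top Z)^{-1}, \qquad \widehat{\mathrm{SE}}(\hat\tau) = \sqrt{[\hat V_{\mathrm{HC2}}]_{22}} \\
&6.\ \mathrm{CI}_{\mathrm{abs}} = \hat\tau \pm c\, \widehat{\mathrm{SE}}(\hat\tau) \\
&7.\ \hat\mu_c = \tfrac{1}{n_c} \sum_{i:\, D_i = 0} Y_i, \qquad \widehat{\mathrm{Var}}(\hat\mu_c) = s_c^2 / n_c \\
&8.\ \hat\tau_{\mathrm{rel}} = 100\, \hat\tau / \hat\mu_c, \qquad \widehat{\mathrm{SE}}(\hat\tau_{\mathrm{rel}}) = \sqrt{ (100 / \hat\mu_c)^2\, \widehat{\mathrm{Var}}(\hat\tau) + (100\, \hat\tau / \hat\mu_c^2)^2\, \widehat{\mathrm{Var}}(\hat\mu_c) } \\
&9.\ \mathrm{CI}_{\mathrm{rel}} = \hat\tau_{\mathrm{rel}} \pm c\, \widehat{\mathrm{SE}}(\hat\tau_{\mathrm{rel}})
\end{aligned}
```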
References
1) Data + target estimand (potential outcomes, ATE/ITT)
- Neyman (1923; English translation 1990): foundational “Neyman model” for randomized experiments; potential outcomes framing and unbiased difference-in-means for average effects.
- Rubin (1974): formal potential outcomes / causal effects language for randomized and nonrandomized studies; ATE as a target estimand. (Journal of Educational Psychology; DOI: 10.1037/h0037350.)
- Holland (1986): classic “Statistics and Causal Inference”; clarifies potential outcomes notation and estimands like ATE.
- Imbens (2004): explicit discussion of average treatment effects as estimands (broader than RCTs, but standard for ATE notation/targets).
2) Global centering of covariates (full-sample centering) + why it matters with interactions
- Lin (2013): uses the fully-interacted regression adjustment (Lin estimator) and discusses centering (often stated “without loss of generality, center covariates”) to interpret the main treatment coefficient as an average effect.
- Brambor, Clark & Golder (2006): clear explanation of interaction models: main effects are evaluated at the moderator equal to 0, and mean-centering shifts the reference point to the mean (interpretation changes, fitted values don’t). Great citation for your “$\tau$ is the effect at $X = 0$ if uncentered” statement.
- Lei & Ding (2021): explicitly notes centering covariates (wlog) in the Lin regression-adjustment setup and studies its properties.
3) Build the Lin (2013) fully-interacted design matrix ($Z = [1, D, X^c, D \odot X^c]$)
- Lin (2013): the canonical reference for the “fully interacted” OLS adjustment in experiments (treatment, covariates, and treatment×covariate interactions).
- Lei & Ding (2021): modern formalization/extension of Lin’s regression adjustment; repeats the same design structure in a randomized-experiment framework.
4) Why the coefficient on $D$ is the ATE/ITT (with global centering + interactions)
- Lin (2013): main theoretical justification: with treatment×covariate interactions, the OLS coefficient on treatment targets the average treatment effect (and can’t hurt asymptotic precision under the Neyman model).
- Freedman (2008): motivates why naive regression adjustment can misbehave and why one should be careful; this is exactly the critique Lin reexamines (useful for your “why we do Lin spec” story).
- Lei & Ding (2021): provides additional asymptotic results/guarantees for the Lin adjustment under regimes with many covariates (supporting your “$\hat\tau$ is ATE/ITT under this spec” claim).
5) OLS fit (point estimate) for $\hat\tau$ under this regression-adjustment estimator
You typically don’t need a separate “OLS paper” citation if you already cite the experimental regression-adjustment papers that define the estimator as OLS on that $Z$.
- Lin (2013): defines the estimator via OLS on $[1, D, X, D \cdot X]$.
- Freedman (2008): discusses regression adjustment in experiments (OLS adjustment) and what it is / isn’t justified by randomization.
6) Robust covariance / standard error for $\hat\tau$ (HC2 only)
- White (1980): original heteroskedasticity-consistent “sandwich” covariance for OLS (foundation for HC estimators).
- MacKinnon & White (1985): introduces the HC family including HC2 leverage adjustment $\hat e_i^2 / (1 - h_{ii})$; this is your key HC2 citation. (J. Econometrics; DOI: 10.1016/0304-4076(85)90158-7.)
- Zeileis (2004): widely cited computational/econometrics reference that summarizes HC estimators and their implementations (nice for “HC2 definition in practice”).
- Long & Ervin (2000): practical discussion of using heteroskedasticity-consistent SEs and finite-sample considerations (optional, but commonly cited).
7) Relative CI via delta method (your “nocov” delta option)
- Oehlert (1992): short classic note reviewing the delta method and when it works well (perfect for citing your Taylor/gradient variance approximation).
- Deng et al. (2018): applied “metric analytics / online experiments” reference that explicitly uses the delta method for percent change / relative lift style estimands (very aligned with your $\hat\tau_{\mathrm{rel}} = 100\, \hat\tau / \hat\mu_c$).