GATE / GATET (from IRM)
This note documents the strict subgroup estimators implemented on top of a fitted IRM. Sections 0-8 describe GATE. Section 9 adds the implemented GATET variant, which targets treatment effects among treated units inside each pre-treatment subgroup.
0) Assumptions
- SUTVA / consistency: observed outcomes correspond to the realized treatment, with no interference across units.
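- Unconfoundedness: conditional on observed covariates $X$, treatment is independent of the potential outcomes: $(Y(0), Y(1)) \perp D \mid X$.
- Overlap / positivity: for relevant covariate values and inside each estimable group, $0 < m(X) < 1$, where $m(X) = P(D = 1 \mid X)$.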
- Group membership must be pre-treatment. Groups defined by treatment itself, post-treatment outcomes, or post-treatment covariates are not valid causal subgroups.
- Cross-fitted nuisance models are assumed accurate enough that the orthogonal score behaves like a stable pseudo-outcome for subgroup averaging.
1) Data, notation, and estimand
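For each observation $i = 1, \dots, n$, observe
$$(Y_i, D_i, X_i, G_i),$$
where
- $Y_i$ is the outcome,
- $D_i \in \{0, 1\}$ is treatment,
- $X_i$ are observed confounders,
- $G_i \in \{1, \dots, K\}$ is a pre-specified subgroup label.
The target is the Group Average Treatment Effect
$$\theta_k = E[Y(1) - Y(0) \mid G = k], \quad k = 1, \dots, K.$$
Write the subgroup basis as
$$B_{ik} = 1\{G_i = k\}$$
and stack it into a matrix
$$B = (B_{ik}) \in \{0, 1\}^{n \times K}.$$
Because this implementation enforces a strict partition,
$$\sum_{k=1}^{K} B_{ik} = 1 \quad \text{for every } i.$$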
2) Start from the fitted IRM nuisances
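The GATE estimator does not refit a separate causal model from scratch. Instead, it reuses the fitted IRM nuisance functions:
$$\hat g_0(X_i) \approx E[Y \mid X_i, D = 0], \quad \hat g_1(X_i) \approx E[Y \mid X_i, D = 1], \quad \hat m(X_i) \approx P(D = 1 \mid X_i).$$
These predictions are cross-fitted, meaning each observation receives nuisance predictions from models trained on the other folds.
Define
$$\hat u_{1i} = Y_i - \hat g_1(X_i), \quad \hat u_{0i} = Y_i - \hat g_0(X_i), \quad \hat h_{1i} = \frac{D_i}{\hat m(X_i)}, \quad \hat h_{0i} = \frac{1 - D_i}{1 - \hat m(X_i)}.$$
The implementation then builds the canonical doubly robust orthogonal signal
$$\phi_i = \hat g_1(X_i) - \hat g_0(X_i) + \hat h_{1i} \hat u_{1i} - \hat h_{0i} \hat u_{0i}.$$
Under the IRM assumptions, this signal satisfies
$$E[\phi_i \mid G_i = k] = E[Y(1) - Y(0) \mid G = k],$$
up to nuisance estimation error.
So GATE can be estimated by averaging the orthogonal score within each subgroup.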
Important implementation note
Even if the IRM was fit with normalize_ipw=True, GATE intentionally ignores that option and uses the canonical unnormalized Horvitz-Thompson-style score above.
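As an illustration of the unnormalized signal described in this section, the following minimal sketch computes it from already cross-fitted predictions. The function name aipw_signal and its arguments are hypothetical and chosen for this note; it is not the package's own code, and it uses plain Python lists to stay dependency-free.

```python
def aipw_signal(y, d, g0_hat, g1_hat, m_hat):
    """Unnormalized doubly robust (AIPW) signal, one value per observation.

    y, d: outcomes and 0/1 treatments.
    g0_hat, g1_hat: cross-fitted outcome predictions under control / treatment.
    m_hat: cross-fitted propensity predictions.
    """
    signal = []
    for yi, di, g0, g1, m in zip(y, d, g0_hat, g1_hat, m_hat):
        if not 0.0 < m < 1.0:
            raise ValueError("propensity predictions must lie strictly in (0, 1)")
        treated_term = di * (yi - g1) / m               # inverse-probability weighted treated residual
        control_term = (1 - di) * (yi - g0) / (1.0 - m)  # inverse-probability weighted control residual
        signal.append(g1 - g0 + treated_term - control_term)
    return signal
```

No weight normalization is applied anywhere in the sketch, which mirrors the statement that normalize_ipw=True is ignored for GATE.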
3) Convert user-supplied groups into a strict dummy basis
Let the user pass either:
- a single subgroup label column, or
- a full dummy basis.
Case A: one-column subgroup labels
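If groups has one column, convert labels into subgroup dummies:
$$B_{ik} = 1\{G_i = \ell_k\}, \quad k = 1, \dots, K,$$
where $\ell_1, \dots, \ell_K$ are the distinct labels.
Case B: multi-column subgroup indicators
If groups already has $K > 1$ columns, interpret it as a candidate basis matrix
$$B \in \{0, 1\}^{n \times K}.$$
Then verify:
$$B_{ik} \in \{0, 1\} \quad \text{and} \quad \sum_{k=1}^{K} B_{ik} = 1 \quad \text{for every } i.$$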
This excludes overlapping subgroup definitions. Overlapping bases correspond to a more general BLP-style projection, not the strict GATE estimator implemented here.
Alignment step
Before any estimation, align rows of groups to the fit-time observation ids used by the IRM.
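The alignment and validation steps above can be sketched as follows. This is an illustrative helper under stated assumptions, not the implementation: the name build_group_basis, the id-keyed alignment, and the choice to represent Case B rows as lists of 0/1 values are all hypothetical.

```python
def build_group_basis(groups, group_ids, fit_ids):
    """Align `groups` to `fit_ids` and return (labels, basis) for a strict partition.

    groups: one label per row (Case A) or one list of 0/1 indicators per row (Case B).
    group_ids: observation id for each row of `groups`.
    fit_ids: observation ids in the order used when the model was fitted.
    """
    if len(set(group_ids)) != len(group_ids):
        raise ValueError("group ids must be unique")
    by_id = dict(zip(group_ids, groups))
    missing = [i for i in fit_ids if i not in by_id]
    if missing:
        raise ValueError(f"no group information for ids: {missing}")
    aligned = [by_id[i] for i in fit_ids]  # reorder rows to the fit-time order

    if isinstance(aligned[0], (list, tuple)):  # Case B: candidate dummy basis
        for row in aligned:
            if any(v not in (0, 1) for v in row) or sum(row) != 1:
                raise ValueError("each row must be 0/1 with exactly one active group")
        labels = list(range(len(aligned[0])))
        basis = [list(row) for row in aligned]
    else:  # Case A: one label per observation
        labels = sorted(set(aligned))
        basis = [[1 if g == lab else 0 for lab in labels] for g in aligned]
    return labels, basis
```

Rows with more than one active indicator are rejected rather than silently reweighted, which matches the strict-partition rule stated above.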
4) GATE point estimation
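Once the orthogonal signal $\phi_i$ and the subgroup basis $B$ are available, the estimator solves a saturated no-intercept linear projection:
$$\hat\theta = \arg\min_{\theta \in \mathbb{R}^K} \sum_{i=1}^{n} \Big(\phi_i - \sum_{k=1}^{K} B_{ik} \theta_k\Big)^2 = (B^\top B)^{-1} B^\top \phi.$$
Because the basis is a disjoint partition, the design is block-diagonal and the Gram matrix is diagonal:
$$B^\top B = \mathrm{diag}(n_1, \dots, n_K),$$
where
$$n_k = \sum_{i=1}^{n} B_{ik}.$$
Therefore the estimator reduces to the groupwise sample mean:
$$\hat\theta_k = \frac{1}{n_k} \sum_{i=1}^{n} B_{ik} \phi_i.$$
Equivalently, if
$$I_k = \{i : B_{ik} = 1\},$$
then
$$\hat\theta_k = \frac{1}{n_k} \sum_{i \in I_k} \phi_i.$$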
Pseudocode
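A minimal Python sketch of the groupwise-mean estimator described in this section. The function name gate_point_estimates is hypothetical, and the basis is assumed to be an already validated list of 0/1 rows with exactly one active column per row.

```python
def gate_point_estimates(phi, basis):
    """Groupwise means of the orthogonal signal under a strict 0/1 partition basis.

    Returns (estimates, counts), both ordered like the basis columns.
    """
    n_groups = len(basis[0])
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for phi_i, row in zip(phi, basis):
        k = row.index(1)  # the single active group for this observation
        sums[k] += phi_i
        counts[k] += 1
    if min(counts) == 0:
        raise ValueError("every group needs at least one observation")
    # For a partition, the no-intercept least-squares solution is the cell mean.
    return [s / c for s, c in zip(sums, counts)], counts
```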
5) Closed-form HCx inference
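For each group $k$, with $n_k$ observations and estimate $\hat\theta_k$, define residuals
$$\hat e_i = \phi_i - \hat\theta_k \quad \text{for } i \text{ in group } k.$$
Let the within-group residual sum of squares be
$$\mathrm{RSS}_k = \sum_{i \,\in\, \text{group } k} \hat e_i^2.$$
Because the design is a partition, the covariance matrix is diagonal, so each group variance can be computed in closed form.
HC0
$$\widehat{V}_k^{\mathrm{HC0}} = \frac{\mathrm{RSS}_k}{n_k^2}.$$
HC1
Let $K$ be the number of groups and $n$ the total sample size. Then
$$\widehat{V}_k^{\mathrm{HC1}} = \frac{n}{n - K} \cdot \frac{\mathrm{RSS}_k}{n_k^2},$$
provided $n > K$. If $n \le K$, the implementation falls back to HC0 scaling.
HC2
Since leverage within a no-intercept subgroup cell is
$$h_{ii} = \frac{1}{n_k},$$
HC2 becomes
$$\widehat{V}_k^{\mathrm{HC2}} = \frac{1}{n_k^2} \sum_{i \,\in\, \text{group } k} \frac{\hat e_i^2}{1 - 1/n_k} = \frac{\mathrm{RSS}_k}{n_k (n_k - 1)}.$$
HC3
Similarly,
$$\widehat{V}_k^{\mathrm{HC3}} = \frac{1}{n_k^2} \sum_{i \,\in\, \text{group } k} \frac{\hat e_i^2}{(1 - 1/n_k)^2} = \frac{\mathrm{RSS}_k}{(n_k - 1)^2}.$$
If $n_k = 1$, these variance formulas are not estimable, so standard errors and interval-based inference are returned as NaN.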
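The closed forms for a saturated no-intercept regression on group dummies can be checked with a short sketch. The function name gate_hc_variances and the label-based interface are hypothetical; returning NaN for every variant when a group has a single observation is an assumption that follows the rule stated above.

```python
def gate_hc_variances(phi, labels, kind="HC3"):
    """Closed-form HC0-HC3 variances of groupwise means; NaN when a group has one row."""
    groups = {}
    for phi_i, lab in zip(phi, labels):
        groups.setdefault(lab, []).append(phi_i)
    n, n_groups = len(phi), len(groups)
    out = {}
    for lab, vals in groups.items():
        n_k = len(vals)
        if n_k == 1:
            out[lab] = float("nan")
            continue
        mean_k = sum(vals) / n_k
        rss_k = sum((v - mean_k) ** 2 for v in vals)
        hc0 = rss_k / n_k ** 2
        if kind == "HC0":
            out[lab] = hc0
        elif kind == "HC1":
            # degrees-of-freedom scaling, with fallback to HC0 when n <= number of groups
            out[lab] = hc0 * n / (n - n_groups) if n > n_groups else hc0
        elif kind == "HC2":
            out[lab] = rss_k / (n_k * (n_k - 1))   # leverage 1/n_k applied once
        elif kind == "HC3":
            out[lab] = rss_k / (n_k - 1) ** 2      # leverage 1/n_k applied twice
        else:
            raise ValueError(f"unknown kind: {kind}")
    return out
```

Standard errors are the square roots of these values. HC2 coincides with the usual unbiased variance of a sample mean inside each cell.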
Wald inference
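For each subgroup $k$, with standard error $\widehat{\mathrm{se}}_k$ equal to the square root of the chosen HCx variance,
$$W_k = \frac{\hat\theta_k}{\widehat{\mathrm{se}}_k}.$$
The code uses normal-reference inference:
$$p_k = 2 \big(1 - \Phi(|W_k|)\big)$$
and
$$\mathrm{CI}_k = \big[\hat\theta_k - z_{1 - \alpha/2}\, \widehat{\mathrm{se}}_k,\; \hat\theta_k + z_{1 - \alpha/2}\, \widehat{\mathrm{se}}_k\big].$$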
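The normal-reference test and interval can be sketched with the standard library alone. The function name wald_inference and the returned dictionary keys are illustrative; returning NaN whenever the standard error is not strictly positive is an assumption of this sketch.

```python
from statistics import NormalDist


def wald_inference(estimate, std_error, alpha=0.05):
    """Two-sided normal-reference test of H0: effect = 0, plus a (1 - alpha) interval."""
    if not std_error > 0:  # catches NaN, zero, and negative inputs
        nan = float("nan")
        return {"wald_stat": nan, "p_value": nan, "ci_lower": nan, "ci_upper": nan}
    std_normal = NormalDist()
    wald = estimate / std_error
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(wald)))
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return {
        "wald_stat": wald,
        "p_value": p_value,
        "ci_lower": estimate - z_crit * std_error,
        "ci_upper": estimate + z_crit * std_error,
    }
```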
6) Output object
The function returns a GateEstimate object containing one row per group.
Main fields:
- value: subgroup treatment effect estimate $\hat\theta_k$.
- std_error: robust HCx standard error.
- wald_stat / test_stat: subgroup-vs-zero Wald statistic.
- p_value: two-sided p-value for $H_0: \theta_k = 0$.
- ci_lower, ci_upper: confidence interval.
- is_significant: indicator that p_value < alpha.
Support and overlap diagnostics:
- n_group: total observations in the subgroup.
- n_treated, n_control: within-group treatment counts.
- share_treated: empirical treatment share in the group.
- mean_phi, std_phi: mean and spread of the orthogonal signal within the group.
- mean_propensity, min_propensity, max_propensity: average and range of the estimated propensity $\hat m(X_i)$ within the subgroup.
The object also stores:
- covariance: diagonal covariance matrix across group estimates.
- summary_table: table combining estimates and diagnostics.
- diagnostic_data: optional payload with the full orthogonal signal, aligned basis, and group warnings.
7) Interpretation of GATE results
7.1 What does value mean?
For subgroup $k$, value is the estimate $\hat\theta_k$ of
$$\theta_k = E[Y(1) - Y(0) \mid G = k].$$
So:
- if value > 0, treatment is estimated to increase the outcome in that subgroup;
- if value < 0, treatment is estimated to decrease the outcome in that subgroup;
- if value $\approx$ 0, the estimated subgroup effect is small relative to the outcome scale.
This is a within-group causal effect, not just a descriptive difference in observed means.
7.2 What does the confidence interval tell you?
The interval
$$[\text{ci\_lower}, \text{ci\_upper}]$$
describes uncertainty around the subgroup effect estimate.
Practical reading:
- If the interval excludes $0$, the data provide evidence that the treatment effect in that subgroup is nonzero at level $\alpha$.
- If the interval is wide, the subgroup effect is estimated imprecisely, usually because the group is small, noisy, or has weak overlap.
- If the interval includes both substantively positive and negative values, the sign of the subgroup effect is not well resolved.
7.3 What p_value does and does not mean
The reported p_value tests
$$H_0: \theta_k = 0 \quad \text{against} \quad H_1: \theta_k \neq 0.$$
It does not test whether two groups differ from one another.
For example, suppose:
- group A has a significant positive effect,
- group B has a non-significant effect.
That alone does not imply
$$\theta_A \neq \theta_B.$$
To test subgroup heterogeneity directly, use formal contrasts such as:
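One way to form such a contrast by hand is sketched below. It relies on the fact, stated in section 6, that the covariance across group estimates is diagonal under a strict partition, so the variance of a difference is the sum of the two group variances. The function contrast_wald is a hypothetical stand-in written for this note and is not the package's own contrast helper.

```python
from math import sqrt
from statistics import NormalDist


def contrast_wald(est_a, se_a, est_b, se_b, alpha=0.05):
    """Wald test of H0: effect_A = effect_B for two disjoint groups."""
    diff = est_a - est_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2)  # no covariance term for disjoint groups
    std_normal = NormalDist()
    wald = diff / se_diff
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(wald)))
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return {
        "difference": diff,
        "std_error": se_diff,
        "wald_stat": wald,
        "p_value": p_value,
        "ci_lower": diff - z_crit * se_diff,
        "ci_upper": diff + z_crit * se_diff,
    }
```

A significant effect in group A next to a non-significant effect in group B can still yield a non-significant contrast when the second standard error is large.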
7.4 How to interpret support diagnostics
The subgroup effect is only as credible as the support behind it.
Read these columns together:
- n_group: very small groups are unstable.
- n_treated, n_control: both must be positive; otherwise the subgroup is not identified here.
- share_treated: very extreme treatment shares suggest weak within-group overlap.
- min_propensity, max_propensity: values close to $0$ or $1$ indicate practical positivity problems.
In this implementation:
- a warning is raised when a group is small (n_group < 10),
- a warning is raised when a group has extreme estimated propensity support,
- a group with no treated or no control observations is rejected before estimation.
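These support rules can be sketched as a single pass over the data. The function name check_gate_support is hypothetical, and the propensity cutoff clip=0.01 is an assumed placeholder, since the note does not state the threshold the implementation uses; only the n_group < 10 rule comes from the text above.

```python
def check_gate_support(d, m_hat, labels, min_group_size=10, clip=0.01):
    """Per-group support diagnostics; raises if a group lacks a treatment arm.

    clip is an assumed cutoff for "extreme" propensities, not a documented value.
    """
    stats = {}
    for di, mi, lab in zip(d, m_hat, labels):
        s = stats.setdefault(lab, {"n_group": 0, "n_treated": 0,
                                   "min_propensity": 1.0, "max_propensity": 0.0})
        s["n_group"] += 1
        s["n_treated"] += int(di)
        s["min_propensity"] = min(s["min_propensity"], mi)
        s["max_propensity"] = max(s["max_propensity"], mi)
    warnings = []
    for lab, s in stats.items():
        s["n_control"] = s["n_group"] - s["n_treated"]
        s["share_treated"] = s["n_treated"] / s["n_group"]
        if s["n_treated"] == 0 or s["n_control"] == 0:
            raise ValueError(f"group {lab!r} has no treated or no control observations")
        if s["n_group"] < min_group_size:
            warnings.append(f"group {lab!r} is small (n_group={s['n_group']})")
        if s["min_propensity"] < clip or s["max_propensity"] > 1.0 - clip:
            warnings.append(f"group {lab!r} has extreme estimated propensities")
    return stats, warnings
```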
7.5 How to interpret differences across groups
If two groups have different value estimates, that is evidence of possible treatment-effect heterogeneity, but it should be interpreted carefully.
Useful rule:
- value answers: "What is the treatment effect inside this subgroup?"
- contrast(...) answers: "Is the treatment effect different between subgroup A and subgroup B?"
So subgroup heterogeneity claims should be based on contrasts, not on visual comparison of point estimates alone.
7.6 What can go wrong in interpretation
Common mistakes:
- Treating post-treatment group definitions as causal subgroups.
- Interpreting subgroup-vs-zero significance as proof of group-vs-group difference.
- Ignoring extreme propensity scores inside a subgroup.
- Over-interpreting very small groups with unstable standard errors.
- Reading GATE as if it were a conditional effect for every value of $X$; here it is an average effect over a coarse partition.
8) Compact end-to-end pseudocode
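The whole GATE pipeline can be condensed into one short Python sketch. It assumes the inputs are already aligned, equal-length lists of cross-fitted predictions and one subgroup label per observation. The function name gate_from_nuisances, the fixed choice of HC3, and the output keys are illustrative assumptions, not the package interface.

```python
from statistics import NormalDist


def gate_from_nuisances(y, d, g0_hat, g1_hat, m_hat, labels, alpha=0.05):
    """GATE per label with HC3 standard errors and normal-reference inference."""
    # Step 1: unnormalized doubly robust signal for every observation.
    phi = [
        g1 - g0 + di * (yi - g1) / m - (1 - di) * (yi - g0) / (1.0 - m)
        for yi, di, g0, g1, m in zip(y, d, g0_hat, g1_hat, m_hat)
    ]
    # Step 2: collect (signal, treatment) pairs per subgroup label.
    cells = {}
    for phi_i, di, lab in zip(phi, d, labels):
        cells.setdefault(lab, []).append((phi_i, di))
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    rows = []
    for lab in sorted(cells):
        cell = cells[lab]
        n_k = len(cell)
        n_treated = sum(di for _, di in cell)
        # Step 3: reject groups without both treatment arms.
        if n_treated == 0 or n_treated == n_k:
            raise ValueError(f"group {lab!r} lacks treated or control units")
        # Step 4: point estimate is the within-group mean of the signal.
        value = sum(p for p, _ in cell) / n_k
        # Step 5: closed-form HC3 variance (n_k >= 2 is guaranteed by step 3).
        rss = sum((p - value) ** 2 for p, _ in cell)
        se = (rss / (n_k - 1) ** 2) ** 0.5
        # Step 6: Wald statistic, two-sided p-value, and interval.
        if se > 0:
            wald = value / se
            p_value = 2.0 * (1.0 - std_normal.cdf(abs(wald)))
        else:
            wald = p_value = float("nan")
        rows.append({"group": lab, "value": value, "std_error": se,
                     "wald_stat": wald, "p_value": p_value,
                     "ci_lower": value - z_crit * se, "ci_upper": value + z_crit * se,
                     "n_group": n_k, "n_treated": n_treated})
    return rows
```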
9) GATET (group average treatment effect on the treated)
GATET uses the same subgroup alignment and strict partition logic as GATE, but it changes the target estimand, the orthogonal score, and the support requirements.
9.1 Target estimand
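For subgroup $k$,
$$\theta_k^{T} = E[Y(1) - Y(0) \mid G = k, D = 1].$$
So GATET answers: "among the treated units inside subgroup $k$, what is the average causal effect?"
When the groups form an exhaustive partition, ATTE is the treated-share mixture of subgroup GATETs:
$$\theta^{\mathrm{ATTE}} = \sum_{k=1}^{K} P(G = k \mid D = 1)\, \theta_k^{T}.$$
Empirically, this is the weighted average of subgroup estimates using within-sample treated shares:
$$\hat\theta^{\mathrm{ATTE}} = \sum_{k=1}^{K} \frac{n_{k,1}}{n_1}\, \hat\theta_k^{T}, \quad n_{k,1} = \sum_{i \,\in\, \text{group } k} D_i, \quad n_1 = \sum_{k=1}^{K} n_{k,1}.$$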
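The aggregation is a treated-count weighted average, which a few lines of Python make concrete. The function name atte_from_gatet is hypothetical.

```python
def atte_from_gatet(gatet_values, n_treated):
    """Combine subgroup GATET estimates into ATTE using within-sample treated shares."""
    total_treated = sum(n_treated)
    if total_treated == 0:
        raise ValueError("at least one treated observation is required")
    return sum(v * n / total_treated for v, n in zip(gatet_values, n_treated))
```

For example, subgroup estimates of 1.0 and 3.0 with 3 and 1 treated units combine to (3 * 1.0 + 1 * 3.0) / 4 = 1.5.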
9.2 Orthogonal signal used by the implementation
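Unlike ordinary GATE, GATET does not reuse the ATE-style score
$$\phi_i = \hat g_1(X_i) - \hat g_0(X_i) + \frac{D_i \big(Y_i - \hat g_1(X_i)\big)}{\hat m(X_i)} - \frac{(1 - D_i) \big(Y_i - \hat g_0(X_i)\big)}{1 - \hat m(X_i)}.$$
Instead it builds the canonical ATT-style subgroup signal
$$\phi_i^{T} = D_i \big(Y_i - \hat g_0(X_i)\big) - \frac{\hat m(X_i)}{1 - \hat m(X_i)} (1 - D_i) \big(Y_i - \hat g_0(X_i)\big).$$
Only $\hat g_0$ and $\hat m$ enter this score. As with GATE, normalize_ipw=True on the fitted IRM is intentionally ignored so the estimator uses the canonical unnormalized orthogonal signal above.
This matters because, in general,
$$E[Y(1) - Y(0) \mid G = k, D = 1] \neq E[Y(1) - Y(0) \mid G = k].$$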
So GATET is not "GATE restricted to treated rows." It is a different orthogonal score matched to the subgroup ATT estimand.
9.3 Point estimation
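Let
$$n_{k,1} = \sum_{i \,\in\, \text{group } k} D_i$$
be the number of treated units in subgroup $k$, and let $\phi_i^{T}$ denote the ATT-style signal from 9.2.
Then the implementation estimates subgroup ATT as
$$\hat\theta_k^{T} = \frac{1}{n_{k,1}} \sum_{i \,\in\, \text{group } k} \phi_i^{T}.$$
Equivalently, it solves the groupwise moment condition
$$\sum_{i \,\in\, \text{group } k} \big(\phi_i^{T} - D_i\, \hat\theta_k^{T}\big) = 0.$$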
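A minimal sketch of this point estimator, assuming the standard unnormalized ATT-style signal built from the control outcome prediction and the propensity only. The function name gatet_point_estimates and its label-based interface are hypothetical.

```python
def gatet_point_estimates(y, d, g0_hat, m_hat, labels):
    """Subgroup ATT estimates: sum of the raw ATT-style signal / number of treated units."""
    sums, treated = {}, {}
    for yi, di, g0, m, lab in zip(y, d, g0_hat, m_hat, labels):
        if not 0.0 <= m < 1.0:
            raise ValueError("propensity predictions must lie in [0, 1)")
        resid = yi - g0
        # Treated rows contribute their residual; control rows are reweighted by the odds m / (1 - m).
        raw = di * resid - (1 - di) * (m / (1.0 - m)) * resid
        sums[lab] = sums.get(lab, 0.0) + raw
        treated[lab] = treated.get(lab, 0) + int(di)
    for lab, n_treated in treated.items():
        if n_treated == 0:
            raise ValueError(f"group {lab!r} has no treated observations")
    return {lab: sums[lab] / treated[lab] for lab in sums}
```

The divisor is the treated count, not the group size, which is what separates this estimator from a plain within-group mean.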
9.4 Closed-form HCx inference
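Define the subgroup residual as the moment function evaluated at the estimate,
$$\hat u_i = \phi_i^{T} - D_i\, \hat\theta_k^{T} \quad \text{for } i \text{ in group } k,$$
where $\phi_i^{T}$ is the raw ATT-style signal and $n_{k,1}$ is the number of treated units in the subgroup.
Let
$$S_k = \sum_{i \,\in\, \text{group } k} \hat u_i^2.$$
Write $n_k$ for total subgroup size and $K$ for the number of groups. Then the closed-form robust variances implemented in code are:
HC0
$$\widehat{V}_k^{\mathrm{HC0}} = \frac{S_k}{n_{k,1}^2}.$$
HC1
$$\widehat{V}_k^{\mathrm{HC1}} = \frac{n}{n - K} \cdot \frac{S_k}{n_{k,1}^2},$$
with fallback to HC0 scaling if $n \le K$.
HC2
$$\widehat{V}_k^{\mathrm{HC2}} = \frac{n_k}{n_k - 1} \cdot \frac{S_k}{n_{k,1}^2}.$$
HC3
$$\widehat{V}_k^{\mathrm{HC3}} = \Big(\frac{n_k}{n_k - 1}\Big)^2 \cdot \frac{S_k}{n_{k,1}^2}.$$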
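The same normal-reference Wald inference is then used:
$$W_k = \frac{\hat\theta_k^{T}}{\widehat{\mathrm{se}}_k}, \quad p_k = 2 \big(1 - \Phi(|W_k|)\big),$$
with confidence interval
$$\hat\theta_k^{T} \pm z_{1 - \alpha/2}\, \widehat{\mathrm{se}}_k.$$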
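The sandwich logic behind these variances can be sketched for a single subgroup. The residual used here is the moment function from 9.3 evaluated at the estimate, and the HC1 to HC3 factors are built from the total sample size, the number of groups, and the subgroup size. Those specific forms are an assumption of this sketch rather than a copy of the package code, and the function name gatet_hc_variance is hypothetical.

```python
def gatet_hc_variance(raw_signal, d, theta_hat, n_total, n_groups, kind="HC0"):
    """Sandwich variance for one subgroup ATT estimate.

    raw_signal, d: raw ATT-style signal and 0/1 treatment, restricted to the subgroup.
    theta_hat: the subgroup ATT estimate; n_total, n_groups: full-sample size and group count.
    """
    n_k = len(raw_signal)
    n_treated = sum(d)
    if n_k <= 1 or n_treated == 0:
        return float("nan")
    # "Meat": squared moment-function residuals; "bread": the treated count.
    meat = sum((s - di * theta_hat) ** 2 for s, di in zip(raw_signal, d))
    hc0 = meat / n_treated ** 2
    if kind == "HC0":
        return hc0
    if kind == "HC1":
        return hc0 * n_total / (n_total - n_groups) if n_total > n_groups else hc0
    if kind == "HC2":
        return hc0 * n_k / (n_k - 1)
    if kind == "HC3":
        return hc0 * (n_k / (n_k - 1)) ** 2
    raise ValueError(f"unknown kind: {kind}")
```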
9.5 Support rules
GATET uses the same strict partition requirement as GATE: every observation must belong to exactly one subgroup.
But the support checks are different:
- Every GATET group must contain at least one treated observation.
- A group with zero treated units is rejected.
- A group with zero control units is still accepted, but the code emits a warning because within-group overlap is degenerate and identification relies on $\hat g_0$ and $\hat m$ learned outside that subgroup.
- If a group has only one total observation (n_group = 1), standard errors and interval-based inference are returned as NaN.
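The GATET support rules differ from the GATE rules only in how the control arm is treated, which the following sketch makes explicit. The function name check_gatet_support is hypothetical, and collecting notes in a list instead of emitting warnings is a simplification made for this example.

```python
def check_gatet_support(d, labels):
    """Return (counts, notes); raise if any group has no treated observations.

    counts maps each label to (n_group, n_treated).
    """
    counts = {}
    for di, lab in zip(d, labels):
        n_group, n_treated = counts.get(lab, (0, 0))
        counts[lab] = (n_group + 1, n_treated + int(di))
    notes = []
    for lab, (n_group, n_treated) in counts.items():
        if n_treated == 0:
            raise ValueError(f"group {lab!r} has no treated observations")
        if n_treated == n_group:
            # Accepted, but overlap inside the group is degenerate.
            notes.append(f"group {lab!r} has no control observations")
        if n_group == 1:
            notes.append(f"group {lab!r} has one observation; inference will be NaN")
    return counts, notes
```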
9.6 Output and diagnostics
The return type is still GateEstimate, but with estimand="GATET". The same helper surface is available:
Interpretation of the main columns:
- value: estimated treatment effect among treated units in the subgroup.
- n_treated: treated support actually identifying that subgroup ATT.
- n_control: controls available in the subgroup; may be zero for GATET.
- share_treated: empirical treated share in the subgroup.
When diagnostics are stored, the payload includes:
- orthogonal_signal: the transformed subgroup signal used for diagnostics, whose within-group mean equals the reported GATET estimate.
- raw_treated_signal: the raw ATT-style score before subgroup scaling.
So the practical distinction is:
- GATE: average treatment effect within subgroup $k$.
- GATET: average treatment effect among the treated units within subgroup $k$.