DML GATE Example

This notebook demonstrates how to estimate Group Average Treatment Effects (GATE) using Double Machine Learning. GATE allows us to understand treatment effect heterogeneity by estimating the average effect within specific groups defined by covariates.

Mathematical Formulation

Let Y be the outcome, D the binary treatment, and X the covariates. We define the Conditional Average Treatment Effect (CATE) as:

\tau(X) = \mathbb{E}[Y(1) - Y(0) \mid X]

For a set of groups G_1, \dots, G_K (where G_k(X) is an indicator that a unit falls into group k), the GATE for group k is:

\theta_k = \mathbb{E}[\tau(X) \mid G_k(X)=1]
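
To make the definition concrete, here is a minimal sketch with a known CATE function. The covariate, the form of tau, and the two groups are illustrative assumptions, not the notebook's data. It shows that each GATE is the mean of \tau(X) within its group, and that the share-weighted GATEs recover the overall average effect.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one covariate and a known CATE function tau(x).
x = rng.uniform(0.0, 1.0, size=10_000)
tau = 1.0 + 2.0 * x  # illustrative CATE, not the notebook's DGP

# Two mutually exclusive groups: G_1 = {x < 0.5}, G_2 = {x >= 0.5}.
g1 = x < 0.5
g2 = ~g1

# The GATE of a group is the mean of tau(X) over the units in that group.
theta_1 = tau[g1].mean()
theta_2 = tau[g2].mean()

# The ATE equals the GATEs weighted by group shares (law of total expectation).
ate = tau.mean()
ate_from_gates = g1.mean() * theta_1 + g2.mean() * theta_2

print(f"theta_1 = {theta_1:.3f}, theta_2 = {theta_2:.3f}")
print(f"ATE = {ate:.3f}, share-weighted GATEs = {ate_from_gates:.3f}")
```

In real data \tau(X) is not observed, which is why the estimation below works with an orthogonal signal instead.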

We will:

  1. Generate synthetic data with heterogeneous treatment effects.
  2. Perform Exploratory Data Analysis (EDA).
  3. Estimate GATEs using gate_esimand with both automatic quantile groups and custom user-defined groups.

Generate data

We generate observational data with a nonlinear outcome model, nonlinear treatment assignment, and a heterogeneous (nonlinear) treatment effect \tau(X).
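
The original generating code is not shown here, so the following is only a sketch of what such a data-generating process could look like. The column names mirror the data preview in this section. Every distribution, coefficient, and functional form is an assumption, and the printed ATT will not match the value reported below.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Covariates: names follow the notebook's columns, distributions are assumed.
tenure_months = rng.gamma(shape=2.0, scale=12.0, size=n)
avg_sessions_week = rng.gamma(shape=5.0, scale=1.0, size=n)
spend_last_month = rng.lognormal(mean=4.5, sigma=0.5, size=n)
premium_user = rng.binomial(1, 0.3, size=n).astype(float)
urban_resident = rng.binomial(1, 0.6, size=n).astype(float)

# Nonlinear treatment assignment through a logistic propensity score.
logit = (-0.5 + 0.3 * np.sin(tenure_months / 12.0)
         + 0.1 * avg_sessions_week - 0.4 * premium_user)
propensity = 1.0 / (1.0 + np.exp(-logit))
d = rng.binomial(1, propensity).astype(float)

# Heterogeneous, nonlinear treatment effect tau(X).
tau = 1.0 + 0.5 * np.tanh(avg_sessions_week - 5.0) + 0.3 * premium_user

# Nonlinear baseline outcome, plus the treatment effect and noise.
baseline = (2.0 + np.log1p(tenure_months)
            + 0.01 * spend_last_month + 0.5 * urban_resident)
y = baseline + tau * d + rng.normal(0.0, 1.0, size=n)

# Ground-truth ATT: the average of tau(X) over treated units.
att_true = tau[d == 1].mean()
print(f"Ground-truth ATT from this sketch: {att_true:.3f}")
```

Because \tau(X) is known by construction, estimates can be checked against the truth, which is the main reason to start from synthetic data.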

Result

Ground-truth ATT from the DGP: 1.386

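| | y | d | tenure_months | avg_sessions_week | spend_last_month | premium_user | urban_resident |
|---|---|---|---|---|---|---|---|
| 0 | 2.237316 | 0.0 | 27.656605 | 5.352554 | 72.552568 | 1.0 | 0.0 |
| 1 | 5.771469 | 0.0 | 11.520191 | 6.798247 | 188.481287 | 1.0 | 0.0 |
| 2 | 6.374653 | 1.0 | 33.005414 | 2.055459 | 51.040440 | 0.0 | 1.0 |
| 3 | 2.364177 | 1.0 | 35.286777 | 4.429404 | 166.992239 | 0.0 | 1.0 |
| 4 | 8.378079 | 0.0 | 0.587578 | 6.658307 | 179.371126 | 0.0 | 0.0 |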

EDA

Result

[Figures: two EDA plots]

Inference: Estimating GATE

We use gate_esimand to estimate Group Average Treatment Effects. The function regresses the orthogonal signal produced by a DoubleML model on group indicators.

Methodology

  1. Orthogonal Signal: We fit a DoubleML IRM model to obtain a score \psi(W;\hat{\eta}) such that:

    \mathbb{E}[\psi(W; \hat{\eta}) \mid X] \approx \tau(X)

    where W = (Y, D, X) and \hat{\eta} are the estimated nuisance parameters.

  2. Best Linear Predictor (BLP): We assume a linear model for the CATE using group indicators G(X):

    \tau(X) \approx \sum_k \theta_k G_k(X)

    We estimate coefficients \theta by solving the BLP optimization:

    \hat{\theta} = \arg\min_{\theta} \sum_{i=1}^N (\psi(W_i) - \theta^\top G(X_i))^2
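
Step 1 can be illustrated with the standard doubly robust (AIPW) form of the signal. This sketch plugs in the true nuisance functions of a made-up DGP so that it stays self-contained. In practice they are estimated with cross-fitting, and whether gate_esimand uses exactly this score is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Illustrative DGP with known nuisance functions (not the notebook's DGP).
x = rng.uniform(-1.0, 1.0, size=n)
m = 1.0 / (1.0 + np.exp(-x))  # propensity m(X) = P(D=1 | X)
d = rng.binomial(1, m)
tau = 1.0 + x**2              # true CATE
g0 = np.sin(2.0 * x)          # E[Y | D=0, X]
g1 = g0 + tau                 # E[Y | D=1, X]
y = np.where(d == 1, g1, g0) + rng.normal(0.0, 1.0, size=n)

# AIPW signal: regression difference plus inverse-propensity-weighted residuals.
psi = g1 - g0 + d * (y - g1) / m - (1 - d) * (y - g0) / (1.0 - m)

# psi is noisy unit by unit, but its mean within a bin of X tracks the mean CATE there.
edges = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
bin_id = np.digitize(x, edges[1:-1])  # bin index 0..3
for b in range(4):
    in_bin = bin_id == b
    print(f"bin {b}: mean psi = {psi[in_bin].mean():.3f}, "
          f"mean tau = {tau[in_bin].mean():.3f}")
```

The individual values of psi are far noisier than \tau(X), so the signal is only useful after averaging or regressing it on functions of X.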

When G(X) consists of mutually exclusive group indicators, \hat{\theta}_k corresponds to the average of the orthogonal signal within group k, providing a consistent estimate of the GATE.
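
The equivalence between the BLP regression and within-group averages can be checked numerically. The sketch below uses a simulated signal and three hypothetical groups. The heteroskedasticity-robust (HC0) standard errors and normal 95% intervals are one reasonable choice for inference, not necessarily what gate_esimand computes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6_000

# Hypothetical orthogonal signal with a different mean in each of three groups.
group = rng.integers(0, 3, size=n)
psi = np.array([1.0, 1.5, 2.0])[group] + rng.normal(0.0, 2.0, size=n)

# One-hot design matrix G(X): every row contains exactly one 1.
G = (group[:, None] == np.arange(3)).astype(float)

# BLP coefficients: least squares of psi on G without an intercept.
theta_hat, *_ = np.linalg.lstsq(G, psi, rcond=None)

# With mutually exclusive indicators these coincide with the group means of psi.
group_means = np.array([psi[group == k].mean() for k in range(3)])

# HC0 standard errors: for this design the sandwich formula reduces to
# sqrt(sum of squared residuals in group k) / n_k.
resid = psi - G @ theta_hat
n_k = G.sum(axis=0)
std_error = np.sqrt((G * resid[:, None] ** 2).sum(axis=0)) / n_k
ci_lower = theta_hat - 1.96 * std_error
ci_upper = theta_hat + 1.96 * std_error

for k in range(3):
    print(f"group {k}: theta = {theta_hat[k]:.3f}, se = {std_error[k]:.3f}, "
          f"CI = [{ci_lower[k]:.3f}, {ci_upper[k]:.3f}]")
```

The result tables in the following sections are consistent with intervals of the form theta plus or minus roughly 1.96 times std_error.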

1. GATE by CATE Quantiles

If no groups are provided, gate_esimand automatically creates groups based on quantiles of the estimated CATE (Conditional Average Treatment Effect).
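
How such groups could be formed is sketched below: cut the estimated CATE at its interior quantiles and label the resulting bins. The simulated CATE estimates, the choice of five groups, and the ordering of the Group_k labels are assumptions for illustration. The function's internal implementation may differ.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Hypothetical per-unit CATE estimates (in practice these come from a fitted model).
cate_hat = rng.normal(1.2, 0.4, size=n)

n_groups = 5
# Interior quantile cut points at 20%, 40%, 60% and 80%.
probs = np.linspace(0.0, 1.0, n_groups + 1)[1:-1]
cuts = np.quantile(cate_hat, probs)

# Assign each unit to a bin: 0 for the lowest estimates, n_groups - 1 for the highest.
group_id = np.digitize(cate_hat, cuts)
group_names = np.array([f"Group_{k}" for k in range(n_groups)])
labels = group_names[group_id]

# Quantile binning yields groups of (nearly) equal size.
counts = np.bincount(group_id, minlength=n_groups)
print(dict(zip(group_names.tolist(), counts.tolist())))
```

Equal-sized groups are the reason every row of a quantile-based result table reports the same n.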

Result

GATE Results (Quantiles):

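| | group | n | theta | std_error | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|---|
| 0 | Group_0 | 2000 | 1.376851 | 0.212620 | 9.441200e-11 | 0.960123 | 1.793579 |
| 1 | Group_1 | 2000 | 0.899152 | 0.196052 | 4.512072e-06 | 0.514896 | 1.283408 |
| 2 | Group_2 | 2000 | 1.114911 | 0.194088 | 9.226342e-09 | 0.734507 | 1.495316 |
| 3 | Group_3 | 2000 | 1.190621 | 0.188706 | 2.801072e-10 | 0.820765 | 1.560478 |
| 4 | Group_4 | 2000 | 1.332698 | 0.207103 | 1.235086e-10 | 0.926784 | 1.738612 |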

2. GATE by User-Defined Groups

We can also define custom groups based on covariates. For example, let's group users by their tenure:

  • < 1 year
  • 1-3 years
  • > 3 years
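
A sketch of how these three tenure groups could be encoded as mutually exclusive indicators follows. The first five tenure values are taken from the data preview above and the sixth is a hypothetical value added to populate the last group. The handling of the exact 12- and 36-month boundaries and the label strings are assumptions. How the indicators are passed to gate_esimand is not shown here.

```python
import numpy as np

# First five values come from the data preview; 40.0 is a hypothetical extra unit.
tenure_months = np.array([27.656605, 11.520191, 33.005414, 35.286777, 0.587578, 40.0])

# Boolean indicator columns; boundary handling at 12 and 36 months is an assumption.
masks = np.column_stack([
    tenure_months < 12,                             # < 1 year
    (tenure_months >= 12) & (tenure_months <= 36),  # 1-3 years
    tenure_months > 36,                             # > 3 years
])

# Each unit must belong to exactly one group for the group-mean interpretation to hold.
assert (masks.sum(axis=1) == 1).all()

names = np.array(["tenure_months_<1y", "tenure_months_1-3y", "tenure_months_>3y"])
labels = names[masks.argmax(axis=1)]
print(labels.tolist())
```

Checking that every row has exactly one active indicator guards against overlapping or missing bins, which would break the interpretation of each coefficient as a group average.
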
Result

GATE Results (Tenure Groups):

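| | group | n | theta | std_error | p_value | ci_lower | ci_upper |
|---|---|---|---|---|---|---|---|
| 0 | tenure_months_1-3y | 6785 | 1.187370 | 0.107453 | 2.189553e-28 | 0.976765 | 1.397974 |
| 1 | tenure_months_<1y | 1649 | 1.231591 | 0.221586 | 2.727795e-08 | 0.797290 | 1.665891 |
| 2 | tenure_months_>3y | 1566 | 1.104541 | 0.221472 | 6.123645e-07 | 0.670464 | 1.538617 |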