
Unconfoundedness

Automated conversion of uncofoundedness_1.ipynb

Unconfoundedness

We call 'unconfoundedness' a scenario in which treatment is not randomly assigned to participants, so confounders affect both treatment assignment and the outcome.

Treatment: a purchase in one category.

We will test the following hypotheses:

$H_0$: There is no difference in LTV between the treatment and control groups.

$H_a$: There is a difference in LTV between the treatment and control groups.

Result
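| | y | d | tenure_months | avg_sessions_week | spend_last_month | age_years | income_monthly | prior_purchases_12m | support_tickets_90d | premium_user | mobile_user | urban_resident | referred_user | m | m_obs | tau_link | g0 | g1 | cate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.0 | 28.814654 | 1.0 | 77.936767 | 50.234101 | 1926.698301 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.047970 | 0.047970 | 1.330764 | 8.137981 | 35.177086 | 27.039105 |
| 1 | 559.364158 | 1.0 | 25.913345 | 3.0 | 53.777740 | 28.115859 | 5104.271509 | 3.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.049695 | 0.049695 | 2.190209 | 60.459257 | 584.580685 | 524.121427 |
| 2 | 26.143003 | 1.0 | 24.969929 | 10.0 | 134.764322 | 22.907062 | 5267.938255 | 8.0 | 3.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.077087 | 0.077087 | 1.570177 | 7.712855 | 38.297992 | 30.585137 |
| 3 | 19.283585 | 1.0 | 40.655089 | 5.0 | 59.517074 | 31.970490 | 6597.327018 | 3.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.069481 | 0.069481 | 1.933844 | 25.386510 | 189.737828 | 164.351318 |
| 4 | 0.000000 | 1.0 | 18.560899 | 3.0 | 74.370930 | 39.237248 | 4930.009628 | 5.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.047097 | 0.047097 | 1.818265 | 15.359250 | 102.433597 | 87.074347 |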
Result

Ground truth ATE is 617.0712367740982

Ground truth ATTE is 837.4043605736649

Result

CausalData(df=(100000, 13), treatment='d', outcome='y', confounders=['tenure_months', 'avg_sessions_week', 'spend_last_month', 'age_years', 'income_monthly', 'prior_purchases_12m', 'support_tickets_90d', 'premium_user', 'mobile_user', 'urban_resident', 'referred_user'])

Result
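| | treatment | count | mean | std | min | p10 | p25 | median | p75 | p90 | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 95015 | 73.966492 | 238.503707 | 0.0 | 0.0 | 0.0 | 7.448482 | 62.190137 | 184.701873 | 21396.007575 |
| 1 | 1.0 | 4985 | 907.471726 | 2545.077996 | 0.0 | 0.0 | 0.0 | 143.706288 | 730.651759 | 2269.458998 | 48466.747037 |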
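
For reference, the naive comparison that ignores confounders can be computed directly from the summary statistics above. This is only an illustrative sketch, not part of the original notebook: it uses a Welch-style standard error, and the variable names are made up for the example.

```python
import math

# Summary statistics copied from the table above (outcome y by treatment d).
n0, mean0, std0 = 95015, 73.966492, 238.503707   # control group, d = 0
n1, mean1, std1 = 4985, 907.471726, 2545.077996  # treated group, d = 1

# Naive estimate: raw difference in mean outcome between the two groups.
naive_diff = mean1 - mean0

# Welch (unequal-variance) standard error of that difference.
naive_se = math.sqrt(std1**2 / n1 + std0**2 / n0)
naive_z = naive_diff / naive_se

print(f"naive difference = {naive_diff:.2f}, se = {naive_se:.2f}, z = {naive_z:.1f}")
```

Because treatment is not randomly assigned, this difference mixes the treatment effect with differences in confounders between the groups, which is why the adjusted estimate is needed.
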
Result

png

png


Result
| | confounders | mean_d_0 | mean_d_1 | abs_diff | smd | ks_pvalue |
|---|---|---|---|---|---|---|
| 0 | spend_last_month | 86.467107 | 117.559365 | 31.092259 | 0.329735 | 0.00000 |
| 1 | avg_sessions_week | 4.945293 | 6.188967 | 1.243674 | 0.287878 | 0.00000 |
| 2 | premium_user | 0.738199 | 0.852357 | 0.114158 | 0.285782 | 0.00000 |
| 3 | prior_purchases_12m | 3.863548 | 4.291675 | 0.428127 | 0.200809 | 0.00000 |
| 4 | income_monthly | 4496.964003 | 4921.775317 | 424.811314 | 0.169596 | 0.00000 |
| 5 | age_years | 36.446101 | 34.628006 | 1.818095 | -0.162591 | 0.00000 |
| 6 | referred_user | 0.270599 | 0.323771 | 0.053172 | 0.116536 | 0.00000 |
| 7 | mobile_user | 0.872546 | 0.908726 | 0.036180 | 0.116115 | 0.00001 |
| 8 | urban_resident | 0.597169 | 0.638114 | 0.040945 | 0.084327 | 0.00000 |
| 9 | support_tickets_90d | 0.994138 | 1.074423 | 0.080286 | 0.077965 | 0.00002 |
| 10 | tenure_months | 28.523084 | 29.718502 | 1.195418 | 0.064588 | 0.00000 |
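
The smd column is a standardized mean difference. A minimal sketch of one common definition follows; whether the library pools the two group variances in exactly this way is an assumption, and the function name and toy data are hypothetical.

```python
import numpy as np

def standardized_mean_difference(x, d):
    """Signed SMD: (mean_treated - mean_control) / pooled standard deviation."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d).astype(bool)
    x1, x0 = x[d], x[~d]
    # Pool the two sample variances with equal weight (one common convention).
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2.0)
    return (x1.mean() - x0.mean()) / pooled_sd

# Tiny made-up example (hypothetical numbers, not taken from the dataset).
x = np.array([1.0, 2.0, 3.0, 3.0, 4.0, 5.0])
d = np.array([0, 0, 0, 1, 1, 1])
smd = standardized_mean_difference(x, d)
print(smd)
```

The sign convention (treated minus control) matches the negative value shown for age_years, where the treated group is younger on average.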

Inference

Math Explanation of the IRM Model and ATTE Estimand

The Interactive Regression Model (IRM) is a flexible framework used in Double Machine Learning (DML) to estimate treatment effects. Unlike linear models, it allows the treatment effect to vary with confounders $X$ (interaction) and makes no parametric assumptions about the functional forms of the outcomes.

We write $W=(Y,D,X)$ for an observation, where $D\in\{0,1\}$ is treatment and $Y$ is the observed outcome.

1. Nuisance Functions

The IRM framework relies on three "nuisance" components estimated from the data:

  • Outcome Regression (Control): $g_0(X) = \mathbb{E}[Y \mid X, D=0]$
  • Outcome Regression (Treated): $g_1(X) = \mathbb{E}[Y \mid X, D=1]$
  • Propensity Score: $m(X) = \mathbb{P}(D=1 \mid X)$

Let $p = \mathbb{P}(D=1) = \mathbb{E}[D]$ denote the overall treatment rate (estimated by the sample mean of $D$).

In the provided implementation (irm.py), these are estimated using cross-fitting (splitting data into folds) to avoid overfitting bias.

2. ATTE (Average Treatment Effect on the Treated)

The Average Treatment Effect on the Treated (ATTE) measures the impact of the treatment specifically on those individuals who received it:

$$\theta_{ATTE} = \mathbb{E}[Y(1) - Y(0) \mid D=1]$$

Under unconfoundedness, $(Y(1),Y(0)) \perp D \mid X$, and overlap, $0 < m(X) < 1$, this is identified from observed data.
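
The identification step can be spelled out as follows. This is the standard argument, added here as a worked step rather than taken from the notebook; it also uses consistency, $Y = Y(D)$. The only unobserved piece is the counterfactual mean for the treated:

$$
\begin{aligned}
\mathbb{E}[Y(0) \mid D=1]
&= \mathbb{E}\big[\,\mathbb{E}[Y(0) \mid X, D=1] \,\big|\, D=1\big] \\
&= \mathbb{E}\big[\,\mathbb{E}[Y(0) \mid X, D=0] \,\big|\, D=1\big] \\
&= \mathbb{E}[\,g_0(X) \mid D=1\,],
\end{aligned}
$$

where the second equality uses unconfoundedness and overlap guarantees that the inner expectation is well defined for every $X$ seen among the treated. Hence

$$
\theta_{ATTE} = \mathbb{E}[Y \mid D=1] - \mathbb{E}[g_0(X) \mid D=1].
$$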

3. The Orthogonal Score

DML uses a Neyman-orthogonal score $\psi$ to ensure the estimator is robust to small errors in the nuisance function estimates. The score for ATTE is defined as:

$$\psi(W; \theta, \eta) = \psi_b(W; \eta) + \psi_a(W; \eta)\,\theta$$

To match the implementation in irm.py, define:

  • Residuals: $u_0 = Y - g_0(X)$, $u_1 = Y - g_1(X)$
  • IPW terms: $h_1 = \frac{D}{m(X)}$, $h_0 = \frac{1-D}{1-m(X)}$
  • Weights (ATTE): $w = \frac{D}{p}$ and $\bar{w} = \frac{m(X)}{p}$ (the normalized form with $\mathbb{E}[w]=1$)

Then:

$$
\begin{aligned}
\psi_a(W;\eta) &= -w = -\frac{D}{p} \\
\psi_b(W;\eta) &= w\,(g_1(X)-g_0(X)) + \bar{w}\,(u_1 h_1 - u_0 h_0)
\end{aligned}
$$

(If normalize_ipw=True, the code rescales $h_1$ and $h_0$ to have mean 1.)
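
A minimal sketch of the IPW terms and the optional rescaling is shown below. The helper name `ipw_terms` and the toy inputs are hypothetical, and dividing by the sample mean is an assumption about what "rescales to have mean 1" does in irm.py.

```python
import numpy as np

def ipw_terms(d, m, normalize=False):
    """Return h1 = D / m and h0 = (1 - D) / (1 - m), optionally rescaled to mean 1."""
    d = np.asarray(d, dtype=float)
    m = np.asarray(m, dtype=float)
    h1 = d / m
    h0 = (1.0 - d) / (1.0 - m)
    if normalize:
        # Assumed normalization: divide each term by its own sample mean.
        h1 = h1 / h1.mean()
        h0 = h0 / h0.mean()
    return h1, h0

# Toy inputs (hypothetical values).
d = np.array([1, 0, 0, 1, 0])
m = np.array([0.5, 0.2, 0.4, 0.25, 0.1])
h1, h0 = ipw_terms(d, m, normalize=True)
print(h1.mean(), h0.mean())
```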

4. Final Estimation (Step-by-step simplification)

For brevity, write $m = m(X)$, $g_0 = g_0(X)$, and $g_1 = g_1(X)$. Plug in $w$, $\bar{w}$, $h_1$, $h_0$:

$$
\begin{aligned}
\psi_b &= \frac{D}{p}(g_1-g_0) + \frac{m}{p}\left[\frac{D}{m}(Y-g_1) - \frac{1-D}{1-m}(Y-g_0)\right] \\
&= \frac{D}{p}(g_1-g_0) + \frac{D}{p}(Y-g_1) - \frac{m}{p}\frac{1-D}{1-m}(Y-g_0) \\
&= \frac{D}{p}(Y-g_0) - \frac{m}{p}\frac{1-D}{1-m}(Y-g_0).
\end{aligned}
$$

So the $g_1(X)$ terms cancel, and the ATTE score depends only on $g_0(X)$ and $m(X)$.
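
The cancellation can be checked numerically. The sketch below uses arbitrary synthetic arrays (all values are made up) and compares the full form of the score term with the simplified form; since the identity is purely algebraic, the two should agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Arbitrary synthetic inputs; the identity is algebraic, so any values work.
m = rng.uniform(0.05, 0.95, size=n)          # propensity scores m(X)
d = (rng.uniform(size=n) < m).astype(float)  # treatment indicator D
y = rng.normal(size=n)                       # outcome Y
g0 = rng.normal(size=n)                      # stand-in for g_0(X)
g1 = rng.normal(size=n)                      # stand-in for g_1(X)
p = d.mean()                                 # overall treatment rate

# Full form: w (g1 - g0) + w_bar (u1 h1 - u0 h0)
w, w_bar = d / p, m / p
h1, h0 = d / m, (1 - d) / (1 - m)
psi_b_full = w * (g1 - g0) + w_bar * ((y - g1) * h1 - (y - g0) * h0)

# Simplified form: g1 does not appear anywhere.
psi_b_simple = d / p * (y - g0) - m / p * (1 - d) / (1 - m) * (y - g0)

max_gap = np.max(np.abs(psi_b_full - psi_b_simple))
print(max_gap)
```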

The estimator solves $\mathbb{E}[\psi(W;\theta,\eta)]=0$:

$$
\hat{\theta}_{ATTE} = \frac{\mathbb{E}[\psi_b]}{\mathbb{E}[-\psi_a]} = \frac{\mathbb{E}[\psi_b]}{\mathbb{E}[D/p]} = \mathbb{E}[\psi_b].
$$

Equivalently,

$$
\hat{\theta}_{ATTE} = \mathbb{E}\left[\frac{D}{p}(Y-g_0(X)) - \frac{m(X)}{p}\frac{1-D}{1-m(X)}(Y-g_0(X))\right].
$$
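
To make the estimator concrete, here is a self-contained sketch on simulated data. It is not the irm.py implementation: the learners are plain least squares and logistic regression written in NumPy as stand-ins for whatever ML models the library uses, the simulated data and every function name (`atte_dml`, `fit_ols`, `fit_logit`) are hypothetical, and the standard error is the usual plug-in formula based on the score.

```python
import numpy as np

def design(X):
    """Add an intercept column."""
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Least-squares coefficients for y ~ [1, X]."""
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return beta

def fit_logit(X, d, n_iter=25):
    """Logistic regression d ~ [1, X] fitted by Newton's method."""
    A = design(X)
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (d - prob)
        hess = (A * (prob * (1.0 - prob))[:, None]).T @ A
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def atte_dml(y, d, X, n_folds=5, trim=0.01, seed=0):
    """Cross-fitted ATTE estimate using the simplified score (g0 and m only)."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    g0_hat, m_hat = np.empty(n), np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        ctrl = train & (d == 0)  # g0 is fitted on control units only
        g0_hat[test] = design(X[test]) @ fit_ols(X[ctrl], y[ctrl])
        logit = design(X[test]) @ fit_logit(X[train], d[train])
        m_hat[test] = 1.0 / (1.0 + np.exp(-logit))
    m_hat = np.clip(m_hat, trim, 1.0 - trim)  # truncate extreme propensities
    p = d.mean()
    psi_b = d / p * (y - g0_hat) - m_hat / p * (1 - d) / (1 - m_hat) * (y - g0_hat)
    theta = psi_b.mean()
    psi = psi_b - d / p * theta               # psi_b + psi_a * theta
    se = np.sqrt(np.mean(psi**2) / n)
    return theta, se

# Simulated data with a constant treatment effect of 3, so the true ATTE is 3.
rng = np.random.default_rng(42)
n = 20_000
X = rng.normal(size=(n, 2))
m_true = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * X[:, 0])))
d = (rng.uniform(size=n) < m_true).astype(float)
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 3.0 * d + rng.normal(size=n)

theta_hat, se_hat = atte_dml(y, d, X)
naive = y[d == 1].mean() - y[d == 0].mean()
print(f"ATTE estimate = {theta_hat:.3f} (se {se_hat:.3f}), naive difference = {naive:.3f}")
```

In this simulation the first confounder raises both the treatment probability and the outcome, so the naive difference overstates the effect while the cross-fitted estimate should land close to 3.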

Result
| | estimand | coefficient | p_val | lower_ci | upper_ci | relative_diff_% | is_significant |
|---|---|---|---|---|---|---|---|
| 0 | ATTE | 816.966741 | 0.0 | 749.154822 | 884.77866 | 893.17824 | True |
Result

CausalEstimate(
    estimand='ATTE',
    model='IRM',
    model_options={'n_folds': 5, 'n_rep': 1, 'normalize_ipw': False, 'trimming_rule': 'truncate', 'trimming_threshold': 0.01, 'random_state': None, 'std_error': 34.59855341330873, 't_stat': 23.61274273910593},
    value=816.9667408936743,
    ci_upper_absolute=884.7786595009447,
    ci_lower_absolute=749.1548222864039,
    value_relative=893.1782401420442,
    ci_upper_relative=967.3160563964027,
    ci_lower_relative=819.0404238876857,
    alpha=0.05,
    p_value=0.0,
    is_significant=True,
    n_treated=4985,
    n_control=95015,
    outcome='y',
    treatment='d',
    confounders=['tenure_months', 'avg_sessions_week', 'spend_last_month', 'age_years', 'income_monthly', 'prior_purchases_12m', 'support_tickets_90d', 'premium_user', 'mobile_user', 'urban_resident', 'referred_user'],
    time=datetime.datetime(2026, 1, 27, 8, 26, 46, 441822),
    diagnostic_data=UnconfoundednessDiagnosticData(
        m_hat=array([0.05620082, 0.06597108, 0.12947216, ..., 0.03993944, 0.06856774, 0.0686461 ], shape=(100000,)),
        d=array([0, 1, 1, ..., 0, 0, 0], shape=(100000,)),
        y=array([ 0. , 559.3641575 , 26.14300299, ..., 86.88646582, 169.67753671, 0. ], shape=(100000,)),
        x=array([[ 28.81465403, 1. , 77.9367668 , ..., 1. , 1. , 0. ],
                 [ 25.91334462, 3. , 53.7777399 , ..., 1. , 0. , 1. ],
                 [ 24.9699287 , 10. , 134.76432201, ..., 1. , 1. , 0. ],
                 ...,
                 [ 18.95058854, 2. , 49.18443354, ..., 0. ],
                 [ 22.87615781, 6. , 46.8461344 , ..., 1. ],
                 [ 38.81380133, 4. , 149.87138917, ..., 1. ]], shape=(100000, 11)),
        g0_hat=array([ -1.88039973, 70.12097778, -2.32580324, ..., 85.09028141, 219.30257652, 284.15408662], shape=(100000,)),
        g1_hat=array([ 3.52125899, 626.31094544, 255.36485828, ..., 556.25218828, 1510.46557774, 2935.52765031], shape=(100000,)),
        psi_b=array([-2.24619835e+00, 9.81430651e+03, 5.71089393e+02, ..., -1.49895669e+00, 7.32831695e+01, 4.20136010e+02], shape=(100000,)),
        folds=array([0, 1, 4, ..., 3, 2, 3], shape=(100000,)),
        trimming_threshold=0.01,
        sigma2=230611.39428973786,
        nu2=20.51450972522466,
        psi_sigma2=array([-230607.8583866 , -226129.52187451, -178068.73534978, ..., -230608.1680113 , -228148.74971339, -149867.84934463], shape=(100000,)),
        psi_nu2=array([ 25.98365281, -366.08026856, -303.22555781, ..., 12.27041473, 36.55198755, 36.61933817], shape=(100000,)),
        riesz_rep=array([-1.19453237, 20.06018054, 20.06018054, ..., -0.83452271, -1.47673775, -1.47854995], shape=(100000,)),
        m_alpha=array([23.96253506, 28.42254226, 59.84989764, ..., 16.74067631, 29.62362582, 29.65997892], shape=(100000,)),
        psi=array([-2.24619835e+00, -6.57419380e+03, -1.58174109e+04, ..., -1.49895669e+00, 7.32831695e+01, 4.20136010e+02], shape=(100000,)),
        score='ATTE'),
    sensitivity_analysis={})
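
The reported interval, t-statistic, and p-value appear consistent with a normal approximation around the point estimate. The sketch below recomputes them from `value` and `std_error` using only the Python standard library; that the library computes them this way is an assumption, although the numbers line up.

```python
from statistics import NormalDist

# Values copied from the CausalEstimate output above.
theta = 816.9667408936743
se = 34.59855341330873
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)        # two-sided critical value
ci_lower, ci_upper = theta - z * se, theta + z * se
t_stat = theta / se
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(f"CI = ({ci_lower:.6f}, {ci_upper:.6f}), t = {t_stat:.4f}, p = {p_value}")
```
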
Result
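| | metric | value | flag |
|---|---|---|---|
| 0 | edge_0.01_below | 0.000000 | GREEN |
| 1 | edge_0.01_above | 0.000000 | GREEN |
| 2 | edge_0.02_below | 0.183620 | RED |
| 3 | edge_0.02_above | 0.000000 | RED |
| 4 | KS | 0.193386 | GREEN |
| 5 | AUC | 0.623803 | GREEN |
| 6 | ESS_treated_ratio | 0.639887 | GREEN |
| 7 | ESS_control_ratio | 0.998503 | GREEN |
| 8 | tails_w1_q99/med | 5.610889 | GREEN |
| 9 | tails_w0_q99/med | 1.307781 | GREEN |
| 10 | ATT_identity_relerr | 0.012851 | GREEN |
| 11 | clip_m_total | 0.049540 | YELLOW |
| 12 | calib_ECE | 0.006803 | GREEN |
| 13 | calib_slope | 0.626749 | YELLOW |
| 14 | calib_intercept | -1.036712 | RED |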
Result

png

Result
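| | metric | value | flag |
|---|---|---|---|
| 0 | se_plugin | 34.598553 | NA |
| 1 | psi_p99_over_med | 828.340166 | RED |
| 2 | psi_kurtosis | 1769.053579 | RED |
| 3 | max_\|t\|_g1 | 0.000000 | GREEN |
| 4 | max_\|t\|_g0 | 0.872006 | GREEN |
| 5 | max_\|t\|_m | 0.417469 | GREEN |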
Result

1.) Are your clients independent (i)?

2.) Do you measure confounders, treatment, and outcome over the same intervals?

3.) Do you measure confounders before treatment and the outcome after it?

4.) Is the treatment label consistent, for example, does a person who does not receive the treatment always have label 0?

Result
| | metric | value | flag |
|---|---|---|---|
| 0 | balance_max_smd | 0.018554 | GREEN |
| 1 | balance_frac_violations | 0.000000 | GREEN |
Result

{'theta': 816.9667408936743, 'se': 34.59855341330873, 'alpha': 0.05, 'z': 1.959963984540054, 'H0': 0.0, 'sampling_ci': (749.1548222864039, 884.7786595009447), 'theta_bounds_cofounding': (794.9964525215009, 838.9370292658477), 'bias_aware_ci': (728.8783636080927, 908.4863956227268), 'max_bias': 21.97028837217341, 'sigma2': 230611.39428973786, 'nu2': 20.51450972522466, 'rv': 0.2730480733887724, 'rva': 0.2561902047512034, 'params': {'r2_y': 0.01, 'r2_d': 0.01, 'rho': 1.0, 'use_signed_rr': False}}
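
The `max_bias` and `theta_bounds_cofounding` entries can be reproduced from `sigma2`, `nu2`, and the sensitivity parameters. The formula below is an assumption inferred by matching the reported numbers, not taken from the library source: each confounding strength enters as $R^2/(1-R^2)$. It does not attempt to reproduce `bias_aware_ci`.

```python
import math

# Values copied from the sensitivity output above.
theta = 816.9667408936743
sigma2 = 230611.39428973786
nu2 = 20.51450972522466
r2_y, r2_d, rho = 0.01, 0.01, 1.0

# Assumed formula: confounding strengths enter as R^2 / (1 - R^2).
strength = math.sqrt(r2_y / (1 - r2_y)) * math.sqrt(r2_d / (1 - r2_d))
max_bias = abs(rho) * strength * math.sqrt(sigma2 * nu2)

# Bounds on the point estimate under that amount of unobserved confounding.
theta_lower, theta_upper = theta - max_bias, theta + max_bias

print(f"max_bias = {max_bias:.6f}, bounds = ({theta_lower:.6f}, {theta_upper:.6f})")
```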

Result
| | r2_y | r2_d | rho | theta_long | theta_short | delta |
|---|---|---|---|---|---|---|
| d | 0.000261 | 0.000496 | -1.0 | 816.966741 | 816.176488 | 0.790253 |