
Unconfoundedness

Automated conversion of uncofoundedness.ipynb

Unconfoundedness

We call 'Unconfoundedness' a scenario where the treatment is not randomly assigned to participants, so confounders affect both treatment assignment and the outcome. We have client-level data. Confounders were measured before treatment, and the outcome was measured after it.

Data

Let's look at an example.

In our ecosystem we have a product whose effect on LTV we want to estimate.

Treatment - first purchase in product.

Outcome - LTV after first purchase.

We will test the following hypotheses:

$H_0$ - There is no difference in LTV between treatment and control groups.

$H_a$ - There is a difference in LTV between treatment and control groups.

We will use DGP from Causalis. Read more at https://causalis.causalcraft.com/articles/generate_obs_hte_26_rich

Result
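| | y | d | tenure_months | avg_sessions_week | spend_last_month | age_years | income_monthly | prior_purchases_12m | support_tickets_90d | premium_user | mobile_user | urban_resident | referred_user | m | m_obs | tau_link | g0 | g1 | cate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 0.0 | 28.814654 | 1.0 | 77.936767 | 50.234101 | 1926.698301 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.045453 | 0.045453 | 0.089095 | 8.137981 | 9.142395 | 1.004414 |
| 1 | 80.099611 | 1.0 | 25.913345 | 3.0 | 53.777740 | 28.115859 | 5104.271509 | 3.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.041514 | 0.041514 | 0.246679 | 60.459257 | 78.817307 | 18.358049 |
| 2 | 6.400482 | 1.0 | 24.969929 | 10.0 | 134.764322 | 22.907062 | 5267.938255 | 8.0 | 3.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.052593 | 0.052593 | 0.162968 | 7.712855 | 9.138577 | 1.425723 |
| 3 | 2.788238 | 0.0 | 40.655089 | 5.0 | 59.517074 | 31.970490 | 6597.327018 | 3.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.036221 | 0.036221 | 0.188755 | 25.386510 | 31.159932 | 5.773422 |
| 4 | 0.000000 | 0.0 | 18.560899 | 3.0 | 74.370930 | 39.237248 | 4930.009628 | 5.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.036343 | 0.036343 | 0.174757 | 15.359250 | 18.600227 | 3.240977 |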
Result

Ground truth ATE is 19.409586529660793
Ground truth ATTE is 10.914991423363865

Result

CausalData(df=(100000, 13), treatment='d', outcome='y', confounders=['tenure_months', 'avg_sessions_week', 'spend_last_month', 'age_years', 'income_monthly', 'prior_purchases_12m', 'support_tickets_90d', 'premium_user', 'mobile_user', 'urban_resident', 'referred_user'])

Result
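| | treatment | count | mean | std | min | p10 | p25 | median | p75 | p90 | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 95051 | 76.087138 | 240.800713 | 0.0 | 0.0 | 0.0 | 8.39544 | 64.859278 | 190.227900 | 21396.007575 |
| 1 | 1.0 | 4949 | 58.506172 | 199.485625 | 0.0 | 0.0 | 0.0 | 0.00000 | 36.958280 | 148.837193 | 5143.642132 |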

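Our data has a strong treatment class imbalance: only about 5% of the sample (4,949 of 100,000 clients) is in the treatment group.

The treatment group has a lower mean LTV than the control group (58.51 vs 76.09). This raw comparison is not adjusted for confounders, so it is too early to draw conclusions.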
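To see why a raw difference in means can even have the wrong sign, here is a minimal toy simulation. It is not the Causalis DGP: the single binary confounder, the probabilities and the +10 effect are all made-up assumptions chosen for illustration. Premium clients have higher LTV but are treated less often, so the naive comparison turns negative although the true effect is positive for everyone.

```python
import random


def simulate(n=50_000, seed=0):
    """Toy confounded data (hypothetical numbers, not the Causalis DGP)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        premium = rng.random() < 0.7
        # Premium clients are LESS likely to be treated...
        d = rng.random() < (0.03 if premium else 0.10)
        # ...but have a much higher baseline LTV.
        y0 = (100.0 if premium else 20.0) + rng.gauss(0.0, 10.0)
        y = y0 + (10.0 if d else 0.0)  # true effect is +10 for every client
        rows.append((premium, d, y))
    return rows


def mean(values):
    return sum(values) / len(values)


rows = simulate()

# Naive comparison: treated mean minus control mean, ignoring the confounder.
naive = mean([y for _, d, y in rows if d]) - mean([y for _, d, y in rows if not d])

# Adjusted comparison: difference inside each premium stratum, averaged with
# the stratum shares among treated clients (an ATTE-style weighting).
treated = [r for r in rows if r[1]]
adjusted = 0.0
for level in (True, False):
    y1 = [y for p, d, y in rows if p == level and d]
    y0 = [y for p, d, y in rows if p == level and not d]
    share_among_treated = sum(1 for p, _, _ in treated if p == level) / len(treated)
    adjusted += share_among_treated * (mean(y1) - mean(y0))

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

With one binary confounder, stratification is enough. With eleven confounders, several of them continuous, a model-based adjustment is the practical route.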

Result

png

We see a long right tail.

Result

png

Result
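| | treatment | n | outlier_count | outlier_rate | lower_bound | upper_bound | has_outliers | method | tail |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 95051 | 11300 | 0.118884 | -97.288916 | 162.148194 | True | iqr | both |
| 1 | 1.0 | 4949 | 721 | 0.145686 | -55.437420 | 92.395699 | True | iqr | both |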

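We see many outliers: about 12% of control and 15% of treated observations fall outside the IQR bounds. This is common for an LTV metric. These are real high-value clients, so dropping them would bias the conclusion.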
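The bounds in the outlier table are consistent with the usual Tukey rule, `q25 - 1.5 * IQR` and `q75 + 1.5 * IQR`, applied to the quartiles from the summary table. That the library uses exactly this rule with a multiplier of 1.5 is inferred from the numbers, not confirmed by the text. A minimal check (`iqr_bounds` is a hypothetical helper):

```python
def iqr_bounds(q25, q75, k=1.5):
    """Tukey fences: observations outside (lower, upper) are flagged as outliers."""
    iqr = q75 - q25
    return q25 - k * iqr, q75 + k * iqr


# Quartiles (p25, p75) taken from the per-group summary table.
control_bounds = iqr_bounds(0.0, 64.859278)
treated_bounds = iqr_bounds(0.0, 36.958280)

print("control:", control_bounds)
print("treated:", treated_bounds)
```

Because LTV is never negative here (the minimum is 0.0 in both groups), the lower fence cannot flag anything. Every flagged observation sits in the right tail.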

Result
| | confounders | mean_d_0 | mean_d_1 | abs_diff | smd | ks_pvalue |
|---|---|---|---|---|---|---|
| 0 | premium_user | 0.751807 | 0.591837 | 0.159970 | -0.345721 | 0.00000 |
| 1 | income_monthly | 4549.385190 | 3918.058798 | 631.326392 | -0.277611 | 0.00000 |
| 2 | spend_last_month | 89.091801 | 67.375389 | 21.716412 | -0.268360 | 0.00000 |
| 3 | support_tickets_90d | 0.984545 | 1.259244 | 0.274699 | 0.253974 | 0.00000 |
| 4 | avg_sessions_week | 5.047753 | 4.230148 | 0.817606 | -0.201735 | 0.00000 |
| 5 | prior_purchases_12m | 3.904220 | 3.513639 | 0.390581 | -0.189372 | 0.00000 |
| 6 | tenure_months | 28.740100 | 25.559161 | 3.180939 | -0.184156 | 0.00000 |
| 7 | age_years | 36.435984 | 34.809083 | 1.626901 | -0.144142 | 0.00000 |
| 8 | referred_user | 0.271486 | 0.307133 | 0.035647 | 0.078671 | 0.00001 |
| 9 | urban_resident | 0.600793 | 0.568802 | 0.031991 | -0.064954 | 0.00013 |
| 10 | mobile_user | 0.874573 | 0.870075 | 0.004498 | -0.013477 | 0.99998 |

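As we can see, treated and control clients differ on these confounders: eight of the eleven have an absolute SMD above 0.1, a common rule-of-thumb threshold for imbalance. We need to control for them to make causal inferences.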
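As a worked example of the `smd` column, the sketch below recomputes the `premium_user` row from the two group means. It assumes the common definition, the difference in means divided by the square root of the average of the two group variances, and uses the Bernoulli variance `p * (1 - p)` because the feature is binary. This definition is an assumption. It reproduces the reported value to about three decimals.

```python
import math


def smd(mean_0, var_0, mean_1, var_1):
    """Standardized mean difference (treated minus control) with a pooled SD."""
    pooled_sd = math.sqrt((var_0 + var_1) / 2.0)
    return (mean_1 - mean_0) / pooled_sd


def bernoulli_var(p):
    """Variance of a 0/1 variable with mean p."""
    return p * (1.0 - p)


# mean_d_0 and mean_d_1 for premium_user, taken from the balance table.
p0, p1 = 0.751807, 0.591837
premium_smd = smd(p0, bernoulli_var(p0), p1, bernoulli_var(p1))
print(f"premium_user SMD: {premium_smd:.4f}")
```

The SMD does not depend on the units of the feature, which is why `income_monthly` (a difference of about 631) and `premium_user` (a difference of about 0.16) can be ranked in the same table.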

Inference

ATTE is the right estimand here: we estimate the effect on the clients who were actually treated, that is, those who made a first purchase in our product.

Math Explanation of the IRM Model and ATTE Estimand

The Interactive Regression Model (IRM) is a flexible framework used in Double Machine Learning (DML) to estimate treatment effects. Unlike linear models, it allows the treatment effect to vary with confounders $X$ (interaction) and makes no parametric assumptions about the functional form of the outcome regressions.

We write $W = (Y, D, X)$ for an observation, where $D \in \{0,1\}$ is the treatment and $Y$ is the observed outcome.

1. Nuisance Functions

The IRM framework relies on three "nuisance" components estimated from the data:

  • Outcome Regression (Control): $g_0(X) = \mathbb{E}[Y \mid X, D=0]$
  • Outcome Regression (Treated): $g_1(X) = \mathbb{E}[Y \mid X, D=1]$
  • Propensity Score: $m(X) = \mathbb{P}(D=1 \mid X)$

Let $p = \mathbb{P}(D=1) = \mathbb{E}[D]$ denote the overall treatment rate (estimated by the sample mean of $D$).

In the provided implementation (irm.py), these are estimated using cross-fitting (splitting data into folds) to avoid overfitting bias.
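Cross-fitting can be sketched in a few lines. The code below is a generic illustration and not the code from irm.py: `cross_fit_predict` and `fit_mean` are hypothetical names, and the "learner" simply predicts the training mean. The point is that every observation is predicted by a model that never saw it.

```python
import random


def cross_fit_predict(xs, ys, fit, n_folds=5, seed=0):
    """Return out-of-fold predictions: each fold is predicted by a model
    trained only on the other folds."""
    n = len(ys)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]

    preds = [None] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = model(xs[i])
    return preds


def fit_mean(xs, ys):
    """Toy learner (assumption for illustration): always predicts the training mean."""
    mu = sum(ys) / len(ys)
    return lambda x: mu


# One extreme observation: its own out-of-fold prediction must not be pulled by it.
xs = list(range(10))
ys = [0.0] * 9 + [100.0]
preds = cross_fit_predict(xs, ys, fit_mean, n_folds=5)
print(preds)
```

In the toy data the outlier (index 9) gets a prediction of 0.0 because it was excluded from the training data of its own fold. An in-sample fit would have let it influence its own prediction.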

2. ATTE (Average Treatment Effect on the Treated)

The Average Treatment Effect on the Treated (ATTE) measures the impact of the treatment specifically on those individuals who received it:

$$\theta_{ATTE} = \mathbb{E}[Y(1) - Y(0) \mid D=1]$$

Under unconfoundedness, $(Y(1), Y(0)) \perp D \mid X$, and overlap, $0 < m(X) < 1$, this is identified from observed data.

3. The Orthogonal Score

DML uses a Neyman-orthogonal score $\psi$ to ensure the estimator is robust to small errors in the nuisance function estimates. The score for ATTE is defined as:

$$\psi(W; \theta, \eta) = \psi_b(W; \eta) + \psi_a(W; \eta)\,\theta$$

To match the implementation in irm.py, define:

  • Residuals: $u_0 = Y - g_0(X)$, $u_1 = Y - g_1(X)$
  • IPW terms: $h_1 = \frac{D}{m(X)}$, $h_0 = \frac{1-D}{1-m(X)}$
  • Weights (ATTE): $w = \frac{D}{p}$ and $\bar{w} = \frac{m(X)}{p}$ (the normalized form with $\mathbb{E}[w]=1$)

Then:

$$
\begin{aligned}
\psi_a(W;\eta) &= -w = -\frac{D}{p} \\
\psi_b(W;\eta) &= w\,(g_1(X)-g_0(X)) + \bar{w}\,(u_1 h_1 - u_0 h_0)
\end{aligned}
$$

(If normalize_ipw=True, the code rescales $h_1$ and $h_0$ to have mean 1.)
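Rescaling weights to mean 1 (often called Hájek normalization) can look like the sketch below. The toy values of `d` and `m` are made up, `normalize_to_mean_one` is a hypothetical helper, and whether irm.py normalizes in exactly this way is not shown in the text.

```python
def normalize_to_mean_one(weights):
    """Divide by the sample mean so that the rescaled weights average to 1."""
    avg = sum(weights) / len(weights)
    if avg == 0:
        raise ValueError("cannot normalize weights that are all zero")
    return [w / avg for w in weights]


# Toy treatment indicators and propensity scores (illustrative values only).
d = [1, 0, 0, 1, 0]
m = [0.5, 0.2, 0.1, 0.25, 0.4]

h1 = [di / mi for di, mi in zip(d, m)]                  # D / m(X)
h0 = [(1 - di) / (1 - mi) for di, mi in zip(d, m)]      # (1 - D) / (1 - m(X))

h1_norm = normalize_to_mean_one(h1)
h0_norm = normalize_to_mean_one(h0)
print(h1_norm)
print(h0_norm)
```

In expectation both raw weights already have mean 1. In a finite sample they do not, and a few very small or very large propensities can move the sample mean noticeably, which is what the rescaling guards against.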

4. Final Estimation (Step-by-step simplification)

For brevity, write $m = m(X)$, $g_0 = g_0(X)$, and $g_1 = g_1(X)$. Plug in $w, \bar{w}, h_1, h_0$:

$$
\begin{aligned}
\psi_b &= \frac{D}{p}(g_1-g_0) + \frac{m}{p}\left[\frac{D}{m}(Y-g_1) - \frac{1-D}{1-m}(Y-g_0)\right] \\
&= \frac{D}{p}(g_1-g_0) + \frac{D}{p}(Y-g_1) - \frac{m}{p}\frac{1-D}{1-m}(Y-g_0) \\
&= \frac{D}{p}(Y-g_0) - \frac{m}{p}\frac{1-D}{1-m}(Y-g_0).
\end{aligned}
$$

So the $g_1(X)$ terms cancel, and the ATTE score depends only on $g_0(X)$ and $m(X)$. The estimator solves $\mathbb{E}[\psi(W;\theta,\eta)]=0$:

$$
\hat{\theta}_{ATTE} = \frac{\mathbb{E}[\psi_b]}{\mathbb{E}[-\psi_a]} = \frac{\mathbb{E}[\psi_b]}{\mathbb{E}[D/p]} = \mathbb{E}[\psi_b].
$$

Equivalently,

$$
\hat{\theta}_{ATTE} = \mathbb{E}\left[\frac{D}{p}(Y-g_0(X)) - \frac{m(X)}{p}\frac{1-D}{1-m(X)}(Y-g_0(X))\right].
$$
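The simplification can be checked numerically. The sketch below simulates a toy DGP (made up for illustration, not the Causalis one), plugs in the true nuisance functions instead of cross-fitted estimates, and evaluates $\psi_b$ in both its full and simplified form. It is not the code from irm.py, and names such as `atte_scores` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=40_000, seed=1):
    """Toy data with known nuisances; each row is (y, d, m, g0, g1, tau)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        m = sigmoid(-1.0 + 0.5 * x)            # true propensity m(X)
        d = 1.0 if rng.random() < m else 0.0
        g0 = x                                  # true E[Y | X, D=0]
        tau = 2.0 + x                           # heterogeneous treatment effect
        g1 = g0 + tau                           # true E[Y | X, D=1]
        y = g0 + d * tau + rng.gauss(0.0, 1.0)
        data.append((y, d, m, g0, g1, tau))
    return data


def atte_scores(data):
    """Return psi_b per observation in the full and the simplified form."""
    p = sum(row[1] for row in data) / len(data)   # sample treatment rate
    full, simplified = [], []
    for y, d, m, g0, g1, _ in data:
        w, w_bar = d / p, m / p
        h1, h0 = d / m, (1.0 - d) / (1.0 - m)
        u1, u0 = y - g1, y - g0
        full.append(w * (g1 - g0) + w_bar * (u1 * h1 - u0 * h0))
        simplified.append(d / p * u0 - m / p * h0 * u0)
    return full, simplified


data = simulate()
full, simplified = atte_scores(data)

theta_full = sum(full) / len(full)
theta_simplified = sum(simplified) / len(simplified)

# Sample ATTE: the average true effect among treated units.
treated_effects = [tau for _, d, _, _, _, tau in data if d == 1.0]
true_atte = sum(treated_effects) / len(treated_effects)

print(f"theta (full score):       {theta_full:.4f}")
print(f"theta (simplified score): {theta_simplified:.4f}")
print(f"true ATTE in the sample:  {true_atte:.4f}")
```

The two forms agree up to floating-point rounding, and `g1` never enters the simplified expression. This is why the quality of the treated-outcome model matters little for ATTE.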

Result
| field | value |
|---|---|
| estimand | ATTE |
| model | IRM |
| value | 12.6688 (ci_abs: 8.3105, 17.0271) |
| value_relative | 27.6386 (ci_rel: 18.1021, 37.1750) |
| alpha | 0.0500 |
| p_value | 0.0000 |
| is_significant | True |
| n_treated | 4949 |
| n_control | 95051 |
| treatment_mean | 58.5062 |
| control_mean | 76.0871 |
| time | 2026-02-21 |

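Our estimate is 12.6688 dollars (ci_abs: 8.3105, 17.0271). The mean in the treatment group is 58.5062 dollars, so without our product it would have been about 45.8374 dollars, which gives a relative effect of about 27.6%. The interval covers the ground truth ATTE of 10.9150.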
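The arithmetic behind these numbers can be checked from the reported output: the point estimate and `treatment_mean` from the result table, and the standard error (`se_plugin`, about 2.2236) from the diagnostics further down. The sketch assumes a normal-approximation (Wald) interval and test. The reported `ci_abs` matches that assumption, but the exact method the library uses for `p_value` and `ci_rel` is not stated in the text.

```python
import math

theta = 12.668792204333924     # ATTE point estimate from the output
se = 2.2236447652673634        # standard error from the output
treatment_mean = 58.5062       # observed mean LTV among treated clients
z_crit = 1.959963984540054     # two-sided 95% normal quantile

# Wald confidence interval and two-sided p-value for H0: theta = 0.
ci = (theta - z_crit * se, theta + z_crit * se)
z_stat = theta / se
p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))

# Counterfactual mean for treated clients and the implied relative effect.
counterfactual_mean = treatment_mean - theta
relative_effect_pct = 100.0 * theta / counterfactual_mean

print(f"ci_abs: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"z = {z_stat:.3f}, p = {p_value:.2e}")
print(f"counterfactual mean: {counterfactual_mean:.4f}")
print(f"relative effect: {relative_effect_pct:.4f}%")
```

Note that the relative effect is measured against the estimated counterfactual mean of the treated (about 45.84), not against the raw control mean of 76.0871.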

Refutation

Unconfoundedness

Result
| | metric | value | flag |
|---|---|---|---|
| 0 | balance_max_smd | 0.006429 | GREEN |
| 1 | balance_frac_violations | 0.000000 | GREEN |

balance_max_smd is 0.006429, far below the raw imbalances above, so the DML specification successfully balanced the observed confounders.

Sensitivity

Result
| | r2_y | r2_d | rho | theta_long | theta_short | delta |
|---|---|---|---|---|---|---|
| d | 0.000274 | 0.043294 | 1.0 | 12.668792 | 10.783836 | 1.884956 |
Result

{'theta': 12.668792204333924, 'se': 2.2236447652673634, 'alpha': 0.05, 'z': 1.959963984540054, 'H0': 0.0, 'sampling_ci': (8.31052854999887, 17.02705585866898), 'theta_bounds_cofounding': (10.969460982391881, 14.368123426275966), 'bias_aware_ci': (6.611958468034454, 18.73239677430682), 'max_bias_base': 842.3917878895062, 'max_bias': 1.6993312219420422, 'bound_width': 1.6993312219420422, 'sigma2': 33809.712692133624, 'nu2': 20.988759377088236, 'rv': 0.014816251034451345, 'rva': 0.00976902071312256, 'params': {'r2_y': 7.9e-05, 'r2_d': 0.048984, 'rho': 1.0, 'use_signed_rr': False}}

Even if there is a latent confounder as strong as 'tenure_months', our estimate stays above 0: the bias-aware CI in the output above is (6.6120, 18.7324).
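The `max_bias` and `theta_bounds_cofounding` values in the dictionary above are numerically consistent with the usual omitted-variable-bias bound, $|\text{bias}| \le \rho\,\sqrt{\sigma^2 \nu^2}\,\sqrt{r^2_y \cdot r^2_d / (1 - r^2_d)}$. The sketch below only re-derives the reported numbers under that assumed formula. It is not the library's code, and `r2_y` is taken as printed (rounded to 7.9e-05), so agreement is approximate.

```python
import math

# Values copied from the sensitivity output.
theta = 12.668792204333924
sigma2 = 33809.712692133624
nu2 = 20.988759377088236
r2_y, r2_d, rho = 7.9e-05, 0.048984, 1.0

# Scale of the bias, then the part driven by the assumed confounder strength.
max_bias_base = math.sqrt(sigma2 * nu2)
strength = math.sqrt(r2_y * r2_d / (1.0 - r2_d))
max_bias = rho * max_bias_base * strength

# Worst-case range for the point estimate, before adding sampling uncertainty.
theta_bounds = (theta - max_bias, theta + max_bias)

print(f"max_bias_base: {max_bias_base:.4f}")
print(f"max_bias:      {max_bias:.4f}")
print(f"theta bounds:  ({theta_bounds[0]:.4f}, {theta_bounds[1]:.4f})")
```

The bias-aware CI is wider than these bounds because it also adds sampling uncertainty on each side.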

SUTVA

Result

1.) Are your clients independent, i.e. does the outcome of one client not depend on other clients?
2.) Do all clients have a full window to measure metrics?
3.) Do you measure confounders before treatment and the outcome after it?
4.) Do you have a consistent treatment label, such that a person who does not receive the treatment has label 0?

SUTVA cannot be tested from data alone, so we assume it holds by design.

Score

Result
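| | metric | value | flag |
|---|---|---|---|
| 0 | se_plugin | 2.223645 | NA |
| 1 | psi_p99_over_med | 67.284574 | RED |
| 2 | psi_kurtosis | 2559.617403 | RED |
| 3 | max_\|t\|_g1 | 0.000000 | GREEN |
| 4 | max_\|t\|_g0 | 0.611087 | GREEN |
| 5 | max_\|t\|_m | 1.480379 | GREEN |
| 6 | oos_tstat_fold | 0.000013 | GREEN |
| 7 | oos_tstat_strict | 0.000013 | GREEN |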
Result

png

Result

png

Result

png

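The orthogonality and out-of-sample checks are all GREEN, so the DML specification looks correct. The RED flags on psi_p99_over_med and psi_kurtosis come from the many outliers in the data, which affect the score.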

Overlap

Result

png

Customers are not inclined to activate our product: the estimated propensity scores are low for most clients.

Result
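| | metric | value | flag |
|---|---|---|---|
| 0 | edge_0.01_below | 0.017870 | GREEN |
| 1 | edge_0.01_above | 0.000000 | GREEN |
| 2 | edge_0.02_below | 0.112270 | RED |
| 3 | edge_0.02_above | 0.000000 | RED |
| 4 | KS | 0.194823 | GREEN |
| 5 | AUC | 0.626738 | GREEN |
| 6 | ESS_treated_ratio | 0.466078 | GREEN |
| 7 | ESS_control_ratio | 0.997893 | GREEN |
| 8 | tails_w1_q99/med | 5.087948 | YELLOW |
| 9 | tails_w0_q99/med | 1.185047 | GREEN |
| 10 | ATT_identity_relerr | 0.010190 | GREEN |
| 11 | clip_m_total | 0.017870 | GREEN |
| 12 | calib_ECE | 0.005566 | GREEN |
| 13 | calib_slope | 0.677594 | YELLOW |
| 14 | calib_intercept | -0.890933 | RED |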
Result

png

Conclusion

A first purchase in our product increases LTV by 12.6688 dollars (ci_abs: 8.3105, 17.0271) for the clients who made it. The model is specified correctly, and we found no evidence that the identifying assumptions are violated.