Comparing the DML IRM Implementations in Causalis and DoubleML

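This notebook benchmarks the Causalis IRM estimator against DoubleML and walks through the key analysis steps.

We compare the IRM model from Causalis with dml.DoubleMLIRM from DoubleML, using the default CatBoostRegressor and CatBoostClassifier learners for g0 and g1 (the outcome regressions under control and treatment) and m (the propensity score).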
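For orientation, the ATTE can be written as a doubly robust plug-in built from the control outcome regression g0 and the propensity m. The sketch below is a minimal pure-Python illustration of that standard formula, not the code of either library. The function name `atte_from_nuisances`, the toy data, and the choice of truncation as the trimming rule are assumptions made for illustration. In an actual DML run the nuisance predictions would come from cross-fitted learners rather than being supplied by hand.

```python
def atte_from_nuisances(y, d, g0_hat, m_hat, trimming_threshold=0.01):
    """Plug-in ATTE from a doubly robust score (illustrative sketch only)."""
    n = len(y)
    p_hat = sum(d) / n  # unconditional treated share
    num = 0.0
    den = 0.0
    for yi, di, g0i, mi in zip(y, d, g0_hat, m_hat):
        # Assumed trimming rule: truncate propensities to [t, 1 - t].
        mi = min(max(mi, trimming_threshold), 1.0 - trimming_threshold)
        resid = yi - g0i  # residual against the control outcome model
        # Treated units contribute their residual; control units contribute
        # an odds-weighted correction for errors in g0.
        num += di * resid / p_hat - mi * (1 - di) * resid / (p_hat * (1.0 - mi))
        den += di / p_hat
    return num / den


# Toy data (hypothetical): y = g0 + 2 * d, so the true ATTE is 2.
d = [1, 0, 0, 1, 0]
g0_true = [1.0, 2.0, 3.0, 4.0, 5.0]
m_toy = [0.3, 0.2, 0.5, 0.6, 0.1]
y = [g + 2.0 * t for g, t in zip(g0_true, d)]

theta = atte_from_nuisances(y, d, g0_true, m_toy)
print(f"toy ATTE estimate: {theta:.6f}")
```

Because the toy g0 is exact, the control residuals vanish and the estimate reduces to the average treated residual. With estimated nuisances, the second term is what corrects for bias in g0.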

DGP

We use the data-generating process (DGP) generate_obs_hte_26_rich(), which is described in more detail in its own notebook.

Result

	user_id	y	d	tenure_months	avg_sessions_week	spend_last_month	age_years	income_monthly	prior_purchases_12m	support_tickets_90d	premium_user	mobile_user	urban_resident	referred_user	m	m_obs	tau_link	g0	g1	cate
0	1	0.000000	0.0	28.814654	1.0	77.936767	50.234101	1926.698301	1.0	2.0	1.0	1.0	1.0	0.0	0.045453	0.045453	0.089095	8.137981	9.142395	1.004414
1	2	80.099611	1.0	25.913345	3.0	53.777740	28.115859	5104.271509	3.0	0.0	1.0	1.0	0.0	1.0	0.041514	0.041514	0.246679	60.459257	78.817307	18.358049
2	3	6.400482	1.0	24.969929	10.0	134.764322	22.907062	5267.938255	8.0	3.0	0.0	1.0	1.0	0.0	0.052593	0.052593	0.162968	7.712855	9.138577	1.425723
3	4	2.788238	0.0	40.655089	5.0	59.517074	31.970490	6597.327018	3.0	2.0	1.0	1.0	1.0	0.0	0.036221	0.036221	0.188755	25.386510	31.159932	5.773422
4	5	0.000000	0.0	18.560899	3.0	74.370930	39.237248	4930.009628	5.0	1.0	1.0	1.0	0.0	0.0	0.036343	0.036343	0.174757	15.359250	18.600227	3.240977

Result

Ground truth ATTE is 10.914991
Treated share is 4.9490% (4949 / 100000)
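As a rough illustration of how an oracle ATTE can be obtained from a simulated frame, the sketch below averages the `cate` column over treated rows only. It uses just the five preview rows printed above, so its result is not the full-sample value of 10.914991. That the notebook computes the ground truth in exactly this way is an assumption.

```python
# (d, cate) pairs copied from the five preview rows of the generated data.
preview = [
    (0.0, 1.004414),
    (1.0, 18.358049),
    (1.0, 1.425723),
    (0.0, 5.773422),
    (0.0, 3.240977),
]

# ATTE = average of the true unit-level effect over treated units only.
treated_cate = [cate for d, cate in preview if d == 1.0]
preview_atte = sum(treated_cate) / len(treated_cate)

# Treated share reported for the full sample (4949 of 100000 rows).
treated_share = 4949 / 100000

print(f"preview ATTE (2 treated rows): {preview_atte:.6f}")
print(f"treated share: {treated_share:.4%}")
```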

Result

CausalData(df=(100000, 13), treatment='d', outcome='y', confounders=['tenure_months', 'avg_sessions_week', 'spend_last_month', 'age_years', 'income_monthly', 'prior_purchases_12m', 'support_tickets_90d', 'premium_user', 'mobile_user', 'urban_resident', 'referred_user'])

Comparison of Inference

Causalis

Result

Causalis matched run (seed=123): ATTE=12.331200 | time=93.0s | peak memory=53.8 MB

	value
field
estimand	ATTE
model	IRM
value	12.3312 (ci_abs: 7.8782, 16.7842)
value_relative	26.7054 (ci_rel: 16.9239, 36.4870)
alpha	0.0500
p_value	0.0000
is_significant	True
n_treated	4949
n_control	95051
treatment_mean	58.5062
control_mean	76.0871
time	2026-04-08

DoubleML

Result

DoubleML matched run (seed=123): ATTE=12.331221 | time=49.9s | peak memory=40.3 MB

	coef	std err	t	P>|t|	2.5 %	97.5 %
d	12.331221	2.271978	5.427526	5.714047e-08	7.878225	16.784216
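The columns of the summary row are linked by simple arithmetic. The sketch below recomputes the t statistic, two-sided p-value, and 95% confidence interval from the reported coefficient and standard error, assuming a normal approximation. It should agree with the table up to rounding of the two inputs.

```python
from statistics import NormalDist

# Values copied from the DoubleML summary row above.
coef = 12.331221
std_err = 2.271978
alpha = 0.05

z = NormalDist()  # standard normal

t_stat = coef / std_err
# Two-sided p-value; the lower tail is used to avoid computing 1 - cdf.
p_value = 2.0 * z.cdf(-abs(t_stat))
# Normal critical value for a (1 - alpha) interval.
crit = z.inv_cdf(1.0 - alpha / 2.0)
ci_low = coef - crit * std_err
ci_high = coef + crit * std_err

print(f"t = {t_stat:.6f}, p = {p_value:.6e}")
print(f"95% CI = ({ci_low:.6f}, {ci_high:.6f})")
```

The same interval appears in the Causalis output as ci_abs (7.8782, 16.7842), which is consistent with both libraries reporting the same standard error for the matched seed.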

Result

benchmark_seed          123
stability_seeds         [1, 2, 3, 4, 5]
n_folds                 3
trimming_threshold      0.01
catboost_params         {'iterations': 500, 'depth': 6, 'learning_rate...
Name: matched_benchmark_config, dtype: object

Result

	seed	oracle ATTE	value (ATTE)	abs error vs oracle	time (s)	peak memory (MB)
library
Causalis	123	10.915	12.3312	1.4162	93.0	53.8
DoubleML	123	10.915	12.3312	1.4162	49.9	40.3

Result

	seed	causalis_atte	doubleml_atte	gap	causalis_abs_error	doubleml_abs_error	causalis_time_s	doubleml_time_s	causalis_peak_memory_mb	doubleml_peak_memory_mb
0	1	12.0184	12.018400	0.000000	1.103409	1.103408	41.884077	38.518782	52.933817	40.233231
1	2	10.6533	10.653295	0.000005	0.261691	0.261696	41.097836	34.594209	52.924817	40.230717
2	3	12.5838	12.583832	-0.000032	1.668809	1.668841	36.624745	35.670716	52.924624	40.230673
3	4	12.5729	12.572861	0.000039	1.657909	1.657870	35.710764	34.821244	52.924680	40.230070
4	5	11.5216	11.521633	-0.000033	0.606609	0.606641	37.362521	34.608914	52.924766	40.229713

Conclusion

Result

	mean ATTE	std ATTE	mean abs error	std abs error	mean time (s)	std time (s)	mean peak memory (MB)	std peak memory (MB)
library
Causalis	11.87	0.8105	1.0597	0.6271	38.5360	2.7742	52.9265	0.0041
DoubleML	11.87	0.8105	1.0597	0.6271	35.6428	1.6670	40.2309	0.0014
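The summary row for DoubleML can be reproduced from the per-seed table with the standard library. The sketch below assumes the reported standard deviations are sample standard deviations (n - 1 in the denominator), which matches the printed values to four decimals.

```python
from statistics import mean, stdev

oracle_atte = 10.914991

# DoubleML ATTE estimates for stability seeds 1..5, copied from the table.
doubleml_atte = [12.018400, 10.653295, 12.583832, 12.572861, 11.521633]

abs_error = [abs(v - oracle_atte) for v in doubleml_atte]

summary = {
    "mean ATTE": mean(doubleml_atte),
    "std ATTE": stdev(doubleml_atte),  # sample std (ddof = 1)
    "mean abs error": mean(abs_error),
    "std abs error": stdev(abs_error),
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```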

Result

oracle_atte                        10.914991
treated_share                      0.04949
matched_seed_gap                   0.000021
max_gap_over_stability_seeds       0.000039
near_identical_for_matched_seed    True
identical_over_stability_seeds     False
Name: agreement_check, dtype: object
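An agreement check of this kind can be sketched as follows, using the per-seed estimates from the stability table. The tolerance `TOL` is a hypothetical value chosen for illustration; the threshold the notebook actually uses is not shown.

```python
# Per-seed ATTE estimates copied from the stability table (seeds 1..5).
causalis_atte = [12.0184, 10.6533, 12.5838, 12.5729, 11.5216]
doubleml_atte = [12.018400, 10.653295, 12.583832, 12.572861, 11.521633]

TOL = 1e-4  # hypothetical tolerance for "near identical"

gaps = [c - m for c, m in zip(causalis_atte, doubleml_atte)]
max_gap = max(abs(g) for g in gaps)

near_identical = max_gap < TOL           # agree within tolerance
identical = all(g == 0.0 for g in gaps)  # exact equality on every seed

print(f"max |gap| over stability seeds: {max_gap:.6f}")
print(f"near identical: {near_identical}, identical: {identical}")
```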

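With matched learners, fold count, trimming, and seeded sample splitting, Causalis and DoubleML produce nearly identical ATTE estimates: the gap is 0.000021 for the matched seed and at most 0.000039 across the five stability seeds, and the summary statistics of the estimate and its absolute error coincide to the reported precision. The remaining movement (ATTE between 10.65 and 12.58 across seeds) is seed-to-seed benchmark variability, not a difference in the ATTE estimand. On cost, DoubleML was faster and used less peak memory in these runs: on average over the stability seeds, 35.6 s versus 38.5 s and 40.2 MB versus 52.9 MB.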