Compare Implementation of DML IRM in Causalis and DML IRM in DoubleML
This notebook benchmarks the IRM model from Causalis against dml.DoubleMLIRM from DoubleML on the same simulated data, comparing the ATTE estimate, runtime, and peak memory of each.
Both libraries use default CatBoostRegressor learners for the outcome models g0 and g1 and a default CatBoostClassifier for the propensity model m.
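For orientation, the ATTE in an IRM is commonly assembled from the control outcome model g0 and the propensity model m through a doubly robust moment; in that textbook form g1 does not appear. The sketch below is a plain-Python illustration under stated assumptions: the function name `atte_from_nuisances`, its arguments, and the clipping-style trimming rule are hypothetical and are not taken from the source of either library.

```python
def atte_from_nuisances(y, d, g0_hat, m_hat, trim=0.01):
    """Doubly robust ATTE from nuisance predictions (illustrative sketch).

    y      : observed outcomes
    d      : binary treatment indicators (0 or 1)
    g0_hat : predicted outcome under control, E[Y | D=0, X]
    m_hat  : predicted propensity, P(D=1 | X)
    trim   : propensities are clipped to [trim, 1 - trim] (assumed rule)
    """
    numerator = 0.0
    n_treated = 0
    for yi, di, g0i, mi in zip(y, d, g0_hat, m_hat):
        mi = min(max(mi, trim), 1.0 - trim)
        residual = yi - g0i
        # Treated rows contribute their residual; control rows are
        # reweighted by the odds m / (1 - m) and subtracted.
        numerator += di * residual - (1 - di) * mi / (1.0 - mi) * residual
        n_treated += di
    return numerator / n_treated


# Toy check: g0_hat is exact for controls and treated outcomes sit 2.0 above it.
y = [3.0, 5.0, 1.0, 4.0]
d = [1, 1, 0, 0]
g0_hat = [1.0, 3.0, 1.0, 4.0]
m_hat = [0.6, 0.4, 0.3, 0.005]  # the last value is clipped to 0.01
theta = atte_from_nuisances(y, d, g0_hat, m_hat)
print(theta)
```

In a cross-fitted benchmark the predictions passed in would be out-of-fold, which is why fold count and sample-splitting seed have to match for the two libraries to agree.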
DGP
We use the data-generating process generate_obs_hte_26_rich().
Read more in this notebook.
| | user_id | y | d | tenure_months | avg_sessions_week | spend_last_month | age_years | income_monthly | prior_purchases_12m | support_tickets_90d | premium_user | mobile_user | urban_resident | referred_user | m | m_obs | tau_link | g0 | g1 | cate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.000000 | 0.0 | 28.814654 | 1.0 | 77.936767 | 50.234101 | 1926.698301 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.045453 | 0.045453 | 0.089095 | 8.137981 | 9.142395 | 1.004414 |
| 1 | 2 | 80.099611 | 1.0 | 25.913345 | 3.0 | 53.777740 | 28.115859 | 5104.271509 | 3.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.041514 | 0.041514 | 0.246679 | 60.459257 | 78.817307 | 18.358049 |
| 2 | 3 | 6.400482 | 1.0 | 24.969929 | 10.0 | 134.764322 | 22.907062 | 5267.938255 | 8.0 | 3.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.052593 | 0.052593 | 0.162968 | 7.712855 | 9.138577 | 1.425723 |
| 3 | 4 | 2.788238 | 0.0 | 40.655089 | 5.0 | 59.517074 | 31.970490 | 6597.327018 | 3.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.036221 | 0.036221 | 0.188755 | 25.386510 | 31.159932 | 5.773422 |
| 4 | 5 | 0.000000 | 0.0 | 18.560899 | 3.0 | 74.370930 | 39.237248 | 4930.009628 | 5.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.036343 | 0.036343 | 0.174757 | 15.359250 | 18.600227 | 3.240977 |
Ground truth ATTE is 10.914991
Treated share is 4.9490% (4949 / 100000)
CausalData(df=(100000, 13), treatment='d', outcome='y', confounders=['tenure_months', 'avg_sessions_week', 'spend_last_month', 'age_years', 'income_monthly', 'prior_purchases_12m', 'support_tickets_90d', 'premium_user', 'mobile_user', 'urban_resident', 'referred_user'])
Comparison of Inference
Causalis
Causalis matched run (seed=123): ATTE=12.331200 | time=93.0s | peak memory=53.8 MB
| field | value |
|---|---|
| estimand | ATTE |
| model | IRM |
| value | 12.3312 (ci_abs: 7.8782, 16.7842) |
| value_relative | 26.7054 (ci_rel: 16.9239, 36.4870) |
| alpha | 0.0500 |
| p_value | 0.0000 |
| is_significant | True |
| n_treated | 4949 |
| n_control | 95051 |
| treatment_mean | 58.5062 |
| control_mean | 76.0871 |
| time | 2026-04-08 |
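As a quick consistency check on the table above, the reported value_relative matches the absolute ATTE divided by the implied untreated mean of the treated group (treatment_mean minus ATTE), expressed in percent. This is an inference from the printed numbers, not a documented Causalis formula, and the variable names below are illustrative.

```python
# Values copied from the Causalis summary table.
atte = 12.3312
treatment_mean = 58.5062

# Assumed baseline: the mean outcome the treated group would have had
# without treatment, i.e. observed treated mean minus the ATTE.
counterfactual_mean = treatment_mean - atte
relative_effect_pct = 100.0 * atte / counterfactual_mean

print(f"{relative_effect_pct:.4f}")
```

Rescaling the absolute interval bounds the same way does not reproduce ci_rel, so the relative interval presumably comes from a separate variance calculation.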
DoubleML
DoubleML matched run (seed=123): ATTE=12.331221 | time=49.9s | peak memory=40.3 MB
| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| d | 12.331221 | 2.271978 | 5.427526 | 5.714047e-08 | 7.878225 | 16.784216 |
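The DoubleML row is internally consistent with a normal-approximation Wald test: the t statistic, p-value, and 95% bounds can all be recovered from coef and std err. The snippet below is a minimal check using only the Python standard library; it assumes a two-sided test at alpha = 0.05, which matches the column labels.

```python
from statistics import NormalDist

# Values copied from the DoubleML summary row.
coef = 12.331221
std_err = 2.271978
alpha = 0.05

std_normal = NormalDist()
z = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
t_stat = coef / std_err
p_value = 2 * (1 - std_normal.cdf(abs(t_stat)))
ci_low = coef - z * std_err
ci_high = coef + z * std_err

print(f"t={t_stat:.4f}  p={p_value:.3e}  ci=({ci_low:.4f}, {ci_high:.4f})")
```

The recovered bounds also agree with the ci_abs interval (7.8782, 16.7842) that Causalis reports for the same seed.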
matched_benchmark_config:

| field | value |
|---|---|
| benchmark_seed | 123 |
| stability_seeds | [1, 2, 3, 4, 5] |
| n_folds | 3 |
| trimming_threshold | 0.01 |
| catboost_params | {'iterations': 500, 'depth': 6, 'learning_rate... |

| library | seed | oracle ATTE | value (ATTE) | abs error vs oracle | time (s) | peak memory (MB) |
|---|---|---|---|---|---|---|
| Causalis | 123 | 10.915 | 12.3312 | 1.4162 | 93.0 | 53.8 |
| DoubleML | 123 | 10.915 | 12.3312 | 1.4162 | 49.9 | 40.3 |
| | seed | causalis_atte | doubleml_atte | gap | causalis_abs_error | doubleml_abs_error | causalis_time_s | doubleml_time_s | causalis_peak_memory_mb | doubleml_peak_memory_mb |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 12.0184 | 12.018400 | 0.000000 | 1.103409 | 1.103408 | 41.884077 | 38.518782 | 52.933817 | 40.233231 |
| 1 | 2 | 10.6533 | 10.653295 | 0.000005 | 0.261691 | 0.261696 | 41.097836 | 34.594209 | 52.924817 | 40.230717 |
| 2 | 3 | 12.5838 | 12.583832 | -0.000032 | 1.668809 | 1.668841 | 36.624745 | 35.670716 | 52.924624 | 40.230673 |
| 3 | 4 | 12.5729 | 12.572861 | 0.000039 | 1.657909 | 1.657870 | 35.710764 | 34.821244 | 52.924680 | 40.230070 |
| 4 | 5 | 11.5216 | 11.521633 | -0.000033 | 0.606609 | 0.606641 | 37.362521 | 34.608914 | 52.924766 | 40.229713 |
Conclusion
| library | mean ATTE | std ATTE | mean abs error | std abs error | mean time (s) | std time (s) | mean peak memory (MB) | std peak memory (MB) |
|---|---|---|---|---|---|---|---|---|
| Causalis | 11.87 | 0.8105 | 1.0597 | 0.6271 | 38.5360 | 2.7742 | 52.9265 | 0.0041 |
| DoubleML | 11.87 | 0.8105 | 1.0597 | 0.6271 | 35.6428 | 1.6670 | 40.2309 | 0.0014 |
agreement_check:

| field | value |
|---|---|
| oracle_atte | 10.914991 |
| treated_share | 0.04949 |
| matched_seed_gap | 0.000021 |
| max_gap_over_stability_seeds | 0.000039 |
| near_identical_for_matched_seed | True |
| identical_over_stability_seeds | False |

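The summary statistics and the maximum gap can be reproduced from the per-seed stability table with the standard library alone. The sketch below assumes the reported std is the sample standard deviation (n - 1 denominator), which is what the printed 0.8105 is consistent with; the list names are illustrative.

```python
from statistics import mean, stdev

# Per-seed ATTE estimates copied from the stability table (seeds 1 to 5).
causalis_atte = [12.0184, 10.6533, 12.5838, 12.5729, 11.5216]
doubleml_atte = [12.018400, 10.653295, 12.583832, 12.572861, 11.521633]
oracle_atte = 10.914991

mean_atte = mean(causalis_atte)
std_atte = stdev(causalis_atte)  # sample standard deviation (n - 1)
mean_abs_error = mean([abs(v - oracle_atte) for v in causalis_atte])
max_gap = max(abs(c - m) for c, m in zip(causalis_atte, doubleml_atte))

print(f"{mean_atte:.4f} {std_atte:.4f} {mean_abs_error:.4f} {max_gap:.6f}")
```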
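With matched learners, fold count, trimming, and seeded sample splitting, Causalis and DoubleML agree up to numerical precision: the gap is 0.000021 on the matched seed and at most 0.000039 across the stability seeds, a size consistent with the Causalis values being displayed to four decimal places. The estimates themselves range from 10.65 to 12.58 across the stability seeds (std 0.81) against an oracle ATTE of 10.91; that movement is seed-to-seed benchmark variability, not a difference in the ATTE estimand. On the stability seeds DoubleML averaged 35.6 s and 40.2 MB of peak memory, versus 38.5 s and 52.9 MB for Causalis.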