
Compare Implementation of DML IRM in Causalis and DML IRM in DoubleML

This notebook presents the DoubleML benchmark research workflow and its key analysis steps.

We compare the IRM model from Causalis with dml.DoubleMLIRM from DoubleML, using default CatBoostRegressor and CatBoostClassifier learners for the nuisance functions g0, g1, and m.

DGP

We use the DGP generate_obs_hte_26_rich(). Read more in the notebook for that DGP.

Result
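| | user_id | y | d | tenure_months | avg_sessions_week | spend_last_month | age_years | income_monthly | prior_purchases_12m | support_tickets_90d | premium_user | mobile_user | urban_resident | referred_user | m | m_obs | tau_link | g0 | g1 | cate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.000000 | 0.0 | 28.814654 | 1.0 | 77.936767 | 50.234101 | 1926.698301 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.045453 | 0.045453 | 0.089095 | 8.137981 | 9.142395 | 1.004414 |
| 1 | 2 | 80.099611 | 1.0 | 25.913345 | 3.0 | 53.777740 | 28.115859 | 5104.271509 | 3.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.041514 | 0.041514 | 0.246679 | 60.459257 | 78.817307 | 18.358049 |
| 2 | 3 | 6.400482 | 1.0 | 24.969929 | 10.0 | 134.764322 | 22.907062 | 5267.938255 | 8.0 | 3.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.052593 | 0.052593 | 0.162968 | 7.712855 | 9.138577 | 1.425723 |
| 3 | 4 | 2.788238 | 0.0 | 40.655089 | 5.0 | 59.517074 | 31.970490 | 6597.327018 | 3.0 | 2.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.036221 | 0.036221 | 0.188755 | 25.386510 | 31.159932 | 5.773422 |
| 4 | 5 | 0.000000 | 0.0 | 18.560899 | 3.0 | 74.370930 | 39.237248 | 4930.009628 | 5.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.036343 | 0.036343 | 0.174757 | 15.359250 | 18.600227 | 3.240977 |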
Result

Ground truth ATTE is 10.914991
Treated share is 4.9490% (4949 / 100000)
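
As a rough illustration of where these two numbers come from, the sketch below assumes the oracle ATTE is the mean of the true `cate` column over treated rows (`d == 1`). That is the usual definition, but the notebook output does not spell it out. The helper name `oracle_atte` is hypothetical, and the input is only the five preview rows shown earlier, so the result is not expected to match the full-sample value of 10.914991.

```python
# Preview rows from the DGP output: (user_id, d, cate).
preview = [
    (1, 0.0, 1.004414),
    (2, 1.0, 18.358049),
    (3, 1.0, 1.425723),
    (4, 0.0, 5.773422),
    (5, 0.0, 3.240977),
]


def oracle_atte(rows):
    """Mean true CATE among treated rows (assumed definition of the oracle ATTE)."""
    treated = [cate for _, d, cate in rows if d == 1.0]
    return sum(treated) / len(treated)


preview_atte = oracle_atte(preview)  # averages the 2 treated preview rows only
treated_share = 4949 / 100000        # counts reported for the full sample

print(f"preview ATTE  = {preview_atte:.6f}")
print(f"treated share = {treated_share:.4%}")
```

With only about 5% of units treated, the ATTE averages over a small subgroup, which is one reason the estimates vary noticeably from seed to seed.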

Result

CausalData(df=(100000, 13), treatment='d', outcome='y', confounders=['tenure_months', 'avg_sessions_week', 'spend_last_month', 'age_years', 'income_monthly', 'prior_purchases_12m', 'support_tickets_90d', 'premium_user', 'mobile_user', 'urban_resident', 'referred_user'])

Comparison of Inference

Causalis

Result

Causalis matched run (seed=123): ATTE=12.331200 | time=93.0s | peak memory=53.8 MB

| field | value |
|---|---|
| estimand | ATTE |
| model | IRM |
| value | 12.3312 (ci_abs: 7.8782, 16.7842) |
| value_relative | 26.7054 (ci_rel: 16.9239, 36.4870) |
| alpha | 0.0500 |
| p_value | 0.0000 |
| is_significant | True |
| n_treated | 4949 |
| n_control | 95051 |
| treatment_mean | 58.5062 |
| control_mean | 76.0871 |
| time | 2026-04-08 |

DoubleML

Result

DoubleML matched run (seed=123): ATTE=12.331221 | time=49.9s | peak memory=40.3 MB

| | coef | std err | t | P>\|t\| | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|
| d | 12.331221 | 2.271978 | 5.427526 | 5.714047e-08 | 7.878225 | 16.784216 |
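
The summary row can be checked by hand. The sketch below assumes a standard normal-approximation (Wald) interval and a two-sided test. Under that assumption it reproduces the reported t statistic, p-value, and 95% bounds to the displayed precision, and the bounds match the Causalis `ci_abs` of (7.8782, 16.7842).

```python
from statistics import NormalDist

coef, std_err = 12.331221, 2.271978  # values from the DoubleML summary row
alpha = 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)            # two-sided critical value
t_stat = coef / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))  # two-sided normal p-value
ci_low, ci_high = coef - z * std_err, coef + z * std_err

print(f"t = {t_stat:.6f}, p = {p_value:.3e}")
print(f"95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```
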
Result

| field | value |
|---|---|
| benchmark_seed | 123 |
| stability_seeds | [1, 2, 3, 4, 5] |
| n_folds | 3 |
| trimming_threshold | 0.01 |
| catboost_params | {'iterations': 500, 'depth': 6, 'learning_rate... |

Name: matched_benchmark_config, dtype: object
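
To make the matched configuration concrete, here is a minimal sketch of how the DoubleML side might be set up with these settings. It is not the notebook's actual code. The helper `fit_doubleml_irm` is hypothetical, the column names `y` and `d` are taken from the CausalData summary, and the `learning_rate` value is truncated in the output above, so it is left at the CatBoost default here. The `trimming_threshold` argument follows the DoubleML releases where `DoubleMLIRM` accepts it directly, and other versions may configure trimming differently.

```python
# Settings copied from matched_benchmark_config. learning_rate is truncated in
# the source output, so it is deliberately NOT set here (CatBoost default applies).
MATCHED_CONFIG = {
    "benchmark_seed": 123,
    "stability_seeds": [1, 2, 3, 4, 5],
    "n_folds": 3,
    "trimming_threshold": 0.01,
    "catboost_params": {"iterations": 500, "depth": 6},
}


def fit_doubleml_irm(df, confounders, seed, config=MATCHED_CONFIG):
    """Hypothetical helper: fit DoubleMLIRM (score='ATTE') on a pandas DataFrame."""
    # Imports are local so the config can be inspected without the libraries installed.
    import numpy as np
    import doubleml as dml
    from catboost import CatBoostClassifier, CatBoostRegressor

    params = dict(config["catboost_params"], random_seed=seed, verbose=0)
    data = dml.DoubleMLData(df, y_col="y", d_cols="d", x_cols=list(confounders))

    # DoubleML draws its sample splits from NumPy's global RNG at construction.
    np.random.seed(seed)
    model = dml.DoubleMLIRM(
        data,
        ml_g=CatBoostRegressor(**params),   # outcome regressions g0, g1
        ml_m=CatBoostClassifier(**params),  # propensity score m
        n_folds=config["n_folds"],
        score="ATTE",
        trimming_threshold=config["trimming_threshold"],
    )
    return model.fit()
```

A call such as `fit_doubleml_irm(df, confounders, seed=123).summary` would then return a coefficient table in the same layout as the DoubleML result above.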

Result
| library | seed | oracle ATTE | value (ATTE) | abs error vs oracle | time (s) | peak memory (MB) |
|---|---|---|---|---|---|---|
| Causalis | 123 | 10.915 | 12.3312 | 1.4162 | 93.0 | 53.8 |
| DoubleML | 123 | 10.915 | 12.3312 | 1.4162 | 49.9 | 40.3 |
Result
| | seed | causalis_atte | doubleml_atte | gap | causalis_abs_error | doubleml_abs_error | causalis_time_s | doubleml_time_s | causalis_peak_memory_mb | doubleml_peak_memory_mb |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 12.0184 | 12.018400 | 0.000000 | 1.103409 | 1.103408 | 41.884077 | 38.518782 | 52.933817 | 40.233231 |
| 1 | 2 | 10.6533 | 10.653295 | 0.000005 | 0.261691 | 0.261696 | 41.097836 | 34.594209 | 52.924817 | 40.230717 |
| 2 | 3 | 12.5838 | 12.583832 | -0.000032 | 1.668809 | 1.668841 | 36.624745 | 35.670716 | 52.924624 | 40.230673 |
| 3 | 4 | 12.5729 | 12.572861 | 0.000039 | 1.657909 | 1.657870 | 35.710764 | 34.821244 | 52.924680 | 40.230070 |
| 4 | 5 | 11.5216 | 11.521633 | -0.000033 | 0.606609 | 0.606641 | 37.362521 | 34.608914 | 52.924766 | 40.229713 |
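
The derived columns follow directly from the two ATTE columns and the oracle value. The sketch below recomputes `gap` (Causalis minus DoubleML) and the absolute errors from the displayed numbers. Because the inputs are the rounded values printed here, the last digit of the DoubleML errors can differ from the table. One observation about the printed values only: the Causalis estimates are shown to four decimals, and every gap is below 5e-5, so gaps of this size are within that display rounding.

```python
ORACLE_ATTE = 10.914991  # ground truth ATTE reported for this DGP draw

# (seed, causalis_atte, doubleml_atte) as displayed in the stability results.
runs = [
    (1, 12.0184, 12.018400),
    (2, 10.6533, 10.653295),
    (3, 12.5838, 12.583832),
    (4, 12.5729, 12.572861),
    (5, 11.5216, 11.521633),
]

report = []
for seed, causalis, doubleml in runs:
    report.append({
        "seed": seed,
        "gap": round(causalis - doubleml, 6),
        "causalis_abs_error": round(abs(causalis - ORACLE_ATTE), 6),
        "doubleml_abs_error": round(abs(doubleml - ORACLE_ATTE), 6),
    })

max_abs_gap = max(abs(row["gap"]) for row in report)

for row in report:
    print(row)
print("max |gap| =", max_abs_gap)
```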

Conclusion

Result
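| library | mean ATTE | std ATTE | mean abs error | std abs error | mean time (s) | std time (s) | mean peak memory (MB) | std peak memory (MB) |
|---|---|---|---|---|---|---|---|---|
| Causalis | 11.87 | 0.8105 | 1.0597 | 0.6271 | 38.5360 | 2.7742 | 52.9265 | 0.0041 |
| DoubleML | 11.87 | 0.8105 | 1.0597 | 0.6271 | 35.6428 | 1.6670 | 40.2309 | 0.0014 |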
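
These aggregates can be reproduced from the five stability runs. The sketch below assumes the reported std is the sample standard deviation (n - 1 denominator), which is the variant that matches the displayed values. The helper `summarize` is a hypothetical name.

```python
from statistics import mean, stdev

# Per-seed values for seeds 1 to 5, as displayed in the stability results.
causalis_atte = [12.0184, 10.6533, 12.5838, 12.5729, 11.5216]
causalis_time_s = [41.884077, 41.097836, 36.624745, 35.710764, 37.362521]
doubleml_time_s = [38.518782, 34.594209, 35.670716, 34.821244, 34.608914]


def summarize(values):
    """Return (mean, sample standard deviation) across the stability seeds."""
    return mean(values), stdev(values)  # stdev uses the n - 1 denominator


atte_mean, atte_std = summarize(causalis_atte)
c_time_mean, c_time_std = summarize(causalis_time_s)
d_time_mean, d_time_std = summarize(doubleml_time_s)

print(f"ATTE: mean={atte_mean:.2f}, std={atte_std:.4f}")
print(f"Causalis time (s): mean={c_time_mean:.4f}, std={c_time_std:.4f}")
print(f"DoubleML time (s): mean={d_time_mean:.4f}, std={d_time_std:.4f}")
```
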
Result

| field | value |
|---|---|
| oracle_atte | 10.914991 |
| treated_share | 0.04949 |
| matched_seed_gap | 0.000021 |
| max_gap_over_stability_seeds | 0.000039 |
| near_identical_for_matched_seed | True |
| identical_over_stability_seeds | False |

Name: agreement_check, dtype: object
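
One way such agreement flags could be derived is with absolute tolerances on the gaps. The thresholds used by the notebook are not shown, so `NEAR_TOL` and `EXACT_TOL` below are hypothetical values chosen only to be consistent with the reported flags. The gaps themselves are taken from the results above.

```python
import math

matched_gap = 12.331200 - 12.331221  # Causalis minus DoubleML, seed 123
stability_gaps = [0.000000, 0.000005, -0.000032, 0.000039, -0.000033]

NEAR_TOL = 1e-4   # hypothetical tolerance for "near identical"
EXACT_TOL = 1e-6  # hypothetical tolerance for "identical"

agreement_check = {
    "matched_seed_gap": round(abs(matched_gap), 6),
    "max_gap_over_stability_seeds": max(abs(g) for g in stability_gaps),
    "near_identical_for_matched_seed": math.isclose(
        matched_gap, 0.0, abs_tol=NEAR_TOL
    ),
    "identical_over_stability_seeds": all(
        abs(g) <= EXACT_TOL for g in stability_gaps
    ),
}

for key, value in agreement_check.items():
    print(f"{key}: {value}")
```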

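With matched learners, fold count, trimming, and seeded sample splitting, Causalis and DoubleML produce nearly identical ATTE estimates: the gap is 0.000021 for the matched seed and at most 0.000039 across the stability seeds. The remaining movement is seed-to-seed benchmark variability (std ATTE of 0.8105 for both libraries), not a difference in the ATTE estimand.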