DML IRM vs CausalML NearestNeighborMatch
This notebook presents the matching research workflow and key analysis steps.
Causal inference has two main parts: identification assumptions and model specification. SUTVA, unconfoundedness, and overlap are strong assumptions that must hold before we can call our inference causal. Observational studies and quasi-experiments often struggle with the identification assumptions, so in practice most of the effort goes into defending them, not into model specification.
In this notebook, however, I focus on model specification. Propensity score matching is a classical non-parametric ML approach for estimating the ATTE. It should perform worse than the DML approach, because DML:

- uses both the propensity model and the outcome model, not just propensity scores
- is more robust to small model misspecification through orthogonalization
- uses cross-fitting, which reduces overfitting bias from ML nuisance models
- does not throw away as much data as matching often does
- provides more principled statistical inference (standard errors, confidence intervals)
- can estimate the ATE or ATTE cleanly, not just the treated-group effect by default
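To make the orthogonalization and cross-fitting points concrete, here is a minimal sketch of a cross-fitted AIPW (doubly robust) estimator in the spirit of IRM. This is an illustration on synthetic data, not the Causalis implementation; the DGP, learners, and fold count below are my own assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 3))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # confounded treatment
Y = 2.0 * D + X[:, 0] + rng.normal(size=n)       # true effect = 2.0

# Cross-fitting: each nuisance prediction comes from a model
# that never saw that observation during training
m_hat = np.zeros(n)   # E[D | X]
g0_hat = np.zeros(n)  # E[Y | X, D=0]
g1_hat = np.zeros(n)  # E[Y | X, D=1]
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    ps = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[train], D[train])
    m_hat[test] = ps.predict_proba(X[test])[:, 1]
    for d, g in ((0, g0_hat), (1, g1_hat)):
        idx = train[D[train] == d]
        g[test] = RandomForestRegressor(n_estimators=100, random_state=0) \
            .fit(X[idx], Y[idx]).predict(X[test])

m_hat = np.clip(m_hat, 0.01, 0.99)  # trim extreme propensities for stable weights
# Orthogonal (AIPW) score: outcome-model part plus propensity-weighted residuals
psi = g1_hat - g0_hat + D * (Y - g1_hat) / m_hat - (1 - D) * (Y - g0_hat) / (1 - m_hat)
ate = psi.mean()
se = psi.std(ddof=1) / np.sqrt(n)
print(f"ATE estimate: {ate:.2f} +/- {1.96 * se:.2f}")
```

Because the score combines both nuisance models, a small error in either the propensity or the outcome model only enters the estimate at second order, which is the robustness the list above refers to.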
We will compare absolute estimation errors on DGPs from Causalis between the IRM DML model implemented in Causalis and the NearestNeighborMatch estimator implemented in CausalML.
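For contrast with the DML sketch, here is a toy version of what propensity score matching does: fit a propensity model, match each treated unit to its nearest control on the score, and average the outcome differences. This is a simplified 1-nearest-neighbor sketch on synthetic data I made up, not CausalML's NearestNeighborMatch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 3))
D = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))  # confounded treatment
Y = 2.0 * D + X[:, 0] + rng.normal(size=n)       # true ATT = 2.0

# 1) propensity scores, 2) nearest control per treated unit (with replacement),
# 3) ATT as the mean treated-minus-matched-control difference
ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]
treated, control = np.flatnonzero(D == 1), np.flatnonzero(D == 0)
nearest = control[np.abs(ps[treated][:, None] - ps[control][None, :]).argmin(axis=1)]
att = (Y[treated] - Y[nearest]).mean()
print(f"matched ATT estimate: {att:.2f}")
```

Note that only the propensity model is used and untreated units outside the matched set contribute nothing, which is exactly where the DML advantages listed above come from.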
generate_obs_hte_26_rich()
Read more about the DGP at https://causalis.causalcraft.com/articles/generate_obs_hte_26_rich
Running n=10,000 ... Running n=100,000 ... Running n=1,000,000 ...
| n | ground_truth_atte | irm_atte | matching_atte | irm_abs_error | matching_abs_error | irm_runtime_sec | matching_runtime_sec | matched_n |
|---|---|---|---|---|---|---|---|---|
| 10000 | 11.454404 | 6.256087 | 35.978189 | 5.198317 | 24.523785 | 3.325442 | 1.350460 | 618 |
| 100000 | 10.914991 | 12.106856 | 28.721069 | 1.191864 | 17.806077 | 16.588017 | 3.740718 | 9238 |
| 1000000 | 11.028129 | 10.340542 | 16.589285 | 0.687587 | 5.561156 | 220.501116 | 35.604447 | 99218 |
n=10,000: ground truth ATTE=11.454404, IRM ATTE=6.256087, matching ATTE=35.978189
n=100,000: ground truth ATTE=10.914991, IRM ATTE=12.106856, matching ATTE=28.721069
n=1,000,000: ground truth ATTE=11.028129, IRM ATTE=10.340542, matching ATTE=16.589285
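As a quick sanity check on the table, the error columns are simply the absolute difference between each estimate and the ground truth (values copied from the table above):

```python
# (n, ground_truth_atte, irm_atte, matching_atte) copied from the table above
rows = [(10_000, 11.454404, 6.256087, 35.978189),
        (100_000, 10.914991, 12.106856, 28.721069),
        (1_000_000, 11.028129, 10.340542, 16.589285)]
for n, gt, irm, matching in rows:
    irm_err, matching_err = abs(irm - gt), abs(matching - gt)
    print(f"n={n:>9,}: irm_abs_error={irm_err:.6f}, matching_abs_error={matching_err:.6f}")
```

The recomputed values agree with the irm_abs_error and matching_abs_error columns up to rounding of the reported estimates.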
DML IRM outperforms NearestNeighborMatch
generate_obs_hte_binary_26()
Read more about the DGP at https://causalis.causalcraft.com/articles/generate_obs_hte_binary_26
Running n=10,000 ... Running n=100,000 ... Running n=1,000,000 ...
| n | ground_truth_atte | irm_atte | matching_atte | irm_abs_error | matching_abs_error | irm_runtime_sec | matching_runtime_sec | matched_n |
|---|---|---|---|---|---|---|---|---|
| 10000 | 0.103885 | 0.102344 | 0.115652 | 0.001541 | 0.011767 | 4.476367 | 1.970728 | 2594 |
| 100000 | 0.101238 | 0.103547 | 0.074051 | 0.002309 | 0.027187 | 27.549454 | 4.830420 | 29304 |
| 1000000 | 0.101411 | 0.103282 | 0.094218 | 0.001871 | 0.007192 | 221.720278 | 35.615880 | 298710 |
n=10,000: ground truth ATTE=0.103885, IRM ATTE=0.102344, matching ATTE=0.115652
n=100,000: ground truth ATTE=0.101238, IRM ATTE=0.103547, matching ATTE=0.074051
n=1,000,000: ground truth ATTE=0.101411, IRM ATTE=0.103282, matching ATTE=0.094218
DML IRM outperforms NearestNeighborMatch
Conclusion
I recommend using DML IRM as the default model specification in the unconfoundedness scenario.