DML ATE Example

This notebook covers the following scenario:

| Is RCT | Treatment | Outcome | EDA | Estimands | Refutation |
|---------------|-----------|------------|-----|-----------|------------|
| Observational | Binary | Continuous | Yes | ATE | Yes |

We will estimate the Average Treatment Effect (ATE) of a binary treatment on a continuous outcome. The notebook also covers exploratory data analysis (EDA) and refutation tests.

Generate data

Let's generate data describing how a feature (the treatment) affects ARPU (the outcome), with a linear effect (theta) of 1.8.

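Result

| | y | d | tenure_months | avg_sessions_week | spend_last_month | premium_user | urban_resident |
|---|---|---|---|---|---|---|---|
| 0 | 4.127714 | 0.0 | 27.656605 | 5.352554 | 72.552568 | 1.0 | 0.0 |
| 1 | 11.122008 | 1.0 | 11.520191 | 6.798247 | 188.481287 | 1.0 | 0.0 |
| 2 | 10.580393 | 1.0 | 33.005414 | 2.055459 | 51.040440 | 0.0 | 1.0 |
| 3 | 6.982844 | 1.0 | 35.286777 | 4.429404 | 166.992239 | 0.0 | 1.0 |
| 4 | 10.899381 | 0.0 | 0.587578 | 6.658307 | 179.371126 | 0.0 | 0.0 |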
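
The code cell that produced this table is not part of the export. As a rough illustration only, the sketch below shows one way such an observational dataset could be simulated with the standard library. The column names and theta = 1.8 come from the text above; every distribution, coefficient, and the helper names `generate_row` and `generate_data` are assumptions made for the example, not the notebook's actual generator.

```python
import math
import random

THETA = 1.8  # true treatment effect stated in the text


def generate_row(rng: random.Random) -> dict:
    """Draw one client. All distributions and coefficients are illustrative assumptions."""
    tenure_months = rng.uniform(0.0, 48.0)
    avg_sessions_week = max(rng.gauss(5.0, 2.0), 0.0)
    spend_last_month = max(rng.gauss(100.0, 50.0), 0.0)
    premium_user = float(rng.random() < 0.25)
    urban_resident = float(rng.random() < 0.60)

    # Treatment probability depends on the confounders, so assignment is not randomized.
    logit = (-2.0 + 0.02 * tenure_months + 0.05 * avg_sessions_week
             + 0.8 * premium_user + 0.3 * urban_resident)
    p_treat = 1.0 / (1.0 + math.exp(-logit))
    d = float(rng.random() < p_treat)

    # Outcome is linear in the treatment (effect THETA) plus confounder effects and noise.
    y = (THETA * d + 0.05 * tenure_months + 0.6 * avg_sessions_week
         + 0.01 * spend_last_month + 1.0 * premium_user + rng.gauss(0.0, 3.0))

    return {
        "y": y,
        "d": d,
        "tenure_months": tenure_months,
        "avg_sessions_week": avg_sessions_week,
        "spend_last_month": spend_last_month,
        "premium_user": premium_user,
        "urban_resident": urban_resident,
    }


def generate_data(n: int, seed: int = 42) -> list:
    """Generate n independent clients with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [generate_row(rng) for _ in range(n)]


data = generate_data(10_000)
print(len(data), sorted(data[0]))
```

Because the same confounders drive both the treatment probability and the outcome, a plain comparison of group means on data like this would not recover theta.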

EDA

Result

{'n_rows': 10000, 'n_columns': 7}

General dataset information

Let's see how the outcome differs between clients who received the feature and those who did not.

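Result

| treatment | count | mean | std | min | p10 | p25 | median | p75 | p90 | max |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 8030 | 6.137433 | 3.933863 | -9.866447 | 1.118291 | 3.517427 | 6.157583 | 8.847907 | 11.114776 | 20.770359 |
| 1.0 | 1970 | 8.608973 | 3.942856 | -3.821492 | 3.666892 | 5.986613 | 8.560912 | 11.247429 | 13.552612 | 21.377687 |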
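
The group means in the summary table give a naive effect estimate that is worth comparing with the known theta. The short calculation below uses only the two means reported above and the true effect of 1.8 stated earlier; the variable names are chosen for this illustration.

```python
# Group means of the outcome, copied from the summary table above.
mean_control = 6.137433  # treatment == 0.0, 8030 clients
mean_treated = 8.608973  # treatment == 1.0, 1970 clients
true_theta = 1.8         # effect used in the data-generating process

naive_diff = mean_treated - mean_control
gap_to_truth = naive_diff - true_theta

print(f"naive difference in means: {naive_diff:.3f}")
print(f"gap to the true effect:    {gap_to_truth:.3f}")
```

The naive difference is about 2.47, noticeably above 1.8, which is consistent with confounding: treated and control clients differ in ways that also move the outcome.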

Propensity

Now let's examine how the propensity score differs between the treatment groups.

Result

| confounders | mean_t_0 | mean_t_1 | abs_diff | smd | ks | ks_pvalue |
|---|---|---|---|---|---|---|
| premium_user | 0.218057 | 0.382234 | 0.164176 | 0.364037 | 0.164176 | 1.061599e-37 |
| tenure_months | 23.405355 | 25.799499 | 2.394143 | 0.199024 | 0.087485 | 5.511096e-11 |
| avg_sessions_week | 4.976354 | 5.302999 | 0.326645 | 0.163509 | 0.075782 | 2.382131e-08 |
| urban_resident | 0.587547 | 0.655330 | 0.067783 | 0.140072 | 0.067783 | 9.127269e-07 |
| spend_last_month | 99.288113 | 104.941250 | 5.653137 | 0.097172 | 0.057348 | 5.761025e-05 |

Result

ROC AUC from PropensityModel: 0.5926
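
The balance table and the ROC AUC both describe how different the treated and control groups are. As a worked check of the `smd` column, the sketch below recomputes the standardized mean difference for `premium_user` from the two group means in the table. It assumes the common pooled-standard-deviation definition and approximates each group's variance by p(1 - p) because the column is binary; the library's exact formula is not shown in this export, and `smd_from_moments` is a name invented for the example.

```python
import math


def smd_from_moments(mean_t: float, var_t: float, mean_c: float, var_c: float) -> float:
    """Standardized mean difference using the pooled SD sqrt((var_t + var_c) / 2)."""
    pooled_sd = math.sqrt((var_t + var_c) / 2.0)
    return (mean_t - mean_c) / pooled_sd


# premium_user is binary, so each group's variance is approximately p * (1 - p).
p_control = 0.218057  # mean_t_0 from the balance table
p_treated = 0.382234  # mean_t_1 from the balance table

smd = smd_from_moments(
    p_treated, p_treated * (1.0 - p_treated),
    p_control, p_control * (1.0 - p_control),
)
print(f"smd for premium_user: {smd:.3f}")
```

Under these assumptions the result is roughly 0.364, in line with the 0.364037 reported in the table.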

Result

Positivity check from PropensityModel: {'bounds': (0.05, 0.95), 'share_below': 0.0121, 'share_above': 0.0, 'flag': False}
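
A positivity check of this kind can be sketched as counting how many estimated propensity scores fall outside the chosen bounds. The function below is a minimal illustration with a hypothetical name and toy scores; whether the library uses strict inequalities, and how it derives `flag`, is not shown in this export, so the flag is left out.

```python
def positivity_shares(propensity, bounds=(0.05, 0.95)):
    """Share of estimated propensity scores strictly outside the given bounds."""
    lower, upper = bounds
    n = len(propensity)
    share_below = sum(p < lower for p in propensity) / n
    share_above = sum(p > upper for p in propensity) / n
    return {"bounds": bounds, "share_below": share_below, "share_above": share_above}


# Toy scores, purely illustrative: one of the five falls below the lower bound.
scores = [0.03, 0.10, 0.22, 0.35, 0.60]
result = positivity_shares(scores)
print(result)
```

In the output above, about 1.2% of the scores fall below 0.05 and none exceed 0.95, so very few clients have a near-zero estimated chance of treatment.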

Result

| | feature | shap_mean | shap_mean_abs | exact_pp_change_abs | exact_pp_change_signed |
|---|---|---|---|---|---|
| 0 | num__spend_last_month | 0.000299 | 0.166002 | 0.027438 | 0.000047 |
| 1 | num__premium_user | -0.000269 | 0.306301 | 0.052687 | -0.000042 |
| 2 | num__urban_resident | 0.000245 | 0.158900 | 0.026210 | 0.000039 |
| 3 | num__avg_sessions_week | -0.000141 | 0.174082 | 0.028841 | -0.000022 |
| 4 | num__tenure_months | -0.000135 | 0.194878 | 0.032482 | -0.000021 |

Outcome regression

Let's analyze how well the confounders predict the outcome.

Result

{'rmse': 3.656989617205263, 'mae': 2.90424413216463}

Result

| | feature | shap_mean |
|---|---|---|
| 0 | avg_sessions_week | -0.000502 |
| 1 | spend_last_month | 0.000350 |
| 2 | urban_resident | 0.000245 |
| 3 | premium_user | -0.000055 |
| 4 | tenure_months | -0.000038 |
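
For reference, the `rmse` and `mae` figures reported for the outcome model are standard error metrics. The sketch below computes both from observed and predicted outcomes; the function name and the toy numbers are made up for the illustration, and the notebook's own evaluation code (including how predictions were obtained) is not shown in this export.

```python
import math


def regression_errors(y_true, y_pred):
    """Root mean squared error and mean absolute error of predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]
    n = len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    return {"rmse": rmse, "mae": mae}


# Toy example: residuals are 1, 0 and -2.
errors = regression_errors([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(errors)
```

RMSE penalizes large residuals more heavily than MAE, which is why it is the larger of the two numbers in the output above.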

Inference

Now it is time to estimate the ATE with Double Machine Learning (DML).

Result

1.766253520771568 0.0 (1.5329683790491713, 1.9995386624939646)

The true theta in our data-generating process was 1.8, which lies inside the 95% confidence interval reported above.
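
The estimator call itself is not part of this export. To make the idea concrete, here is a dependency-free sketch of one standard DML construction for the ATE of a binary treatment: a cross-fitted doubly robust (AIPW) score. Everything in it is an assumption for illustration: the toy data with a single binary confounder `x`, the cell-mean nuisance models, and the function names. The notebook's estimator may use different learners and a different score.

```python
import math
import random
from collections import defaultdict


def simulate(n, theta=1.8, seed=0):
    """Toy observational data with one binary confounder x (illustrative assumption)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.4)
        d = int(rng.random() < (0.6 if x else 0.2))    # x raises the treatment probability
        y = theta * d + 3.0 * x + rng.gauss(0.0, 1.0)  # x also raises the outcome
        rows.append((x, d, y))
    return rows


def fit_nuisances(train):
    """Cell-mean nuisances: m(x) = P(D=1 | x) and g(d, x) = E[Y | D=d, x]."""
    d_by_x = defaultdict(list)
    y_by_dx = defaultdict(list)
    for x, d, y in train:
        d_by_x[x].append(d)
        y_by_dx[(d, x)].append(y)
    m = {x: sum(v) / len(v) for x, v in d_by_x.items()}
    g = {key: sum(v) / len(v) for key, v in y_by_dx.items()}
    return m, g


def dml_ate(rows, n_folds=2, seed=0):
    """Cross-fitted AIPW estimate of the ATE with a plug-in standard error."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    psi = []
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        m, g = fit_nuisances(train)  # nuisances never see the held-out fold
        for i in fold:
            x, d, y = rows[i]
            g1, g0, mx = g[(1, x)], g[(0, x)], m[x]
            # Doubly robust score: regression contrast plus weighted residual corrections.
            psi.append(g1 - g0 + d * (y - g1) / mx - (1 - d) * (y - g0) / (1 - mx))
    n = len(psi)
    theta_hat = sum(psi) / n
    variance = sum((p - theta_hat) ** 2 for p in psi) / (n - 1)
    return theta_hat, math.sqrt(variance / n)


rows = simulate(5000)
theta_hat, se = dml_ate(rows)

n_treated = sum(d for _, d, _ in rows)
naive = (sum(y for _, d, y in rows if d) / n_treated
         - sum(y for _, d, y in rows if not d) / (len(rows) - n_treated))
print(f"naive: {naive:.2f}  dml: {theta_hat:.2f}  se: {se:.3f}")
```

On this toy data the naive difference in means is pulled well above 1.8 by the confounder, while the cross-fitted estimate should land close to it. Cross-fitting matters more with flexible machine-learning nuisances than with the simple cell means used here, but the structure of the estimator is the same.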

Refutation

Overlap

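Result

| | metric | value | flag |
|---|---|---|---|
| 0 | edge_0.01_below | 0.000000 | GREEN |
| 1 | edge_0.01_above | 0.000000 | GREEN |
| 2 | edge_0.02_below | 0.001000 | GREEN |
| 3 | edge_0.02_above | 0.000000 | GREEN |
| 4 | KS | 0.139672 | GREEN |
| 5 | AUC | 0.592382 | GREEN |
| 6 | ESS_treated_ratio | 0.742224 | GREEN |
| 7 | ESS_control_ratio | 0.970455 | GREEN |
| 8 | tails_w1_q99/med | 6.278831 | GREEN |
| 9 | tails_w0_q99/med | 2.844508 | GREEN |
| 10 | ATT_identity_relerr | 0.037683 | GREEN |
| 11 | clip_m_total | 0.000100 | GREEN |
| 12 | calib_ECE | 0.036215 | GREEN |
| 13 | calib_slope | 0.520588 | RED |
| 14 | calib_intercept | -0.637457 | RED |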
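
The `ESS_treated_ratio` and `ESS_control_ratio` rows can be read as effective sample size relative to group size. The sketch below assumes the usual Kish definition applied to inverse-propensity weights; the library's exact formula is not shown in this export, and the function name and toy scores are invented for the example.

```python
def ess_ratio(weights):
    """Kish effective sample size divided by the number of units: (sum w)^2 / (n * sum w^2)."""
    n = len(weights)
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / (n * total_sq)


# Inverse-propensity weights for treated units, w = 1 / m(x); the scores are toy values.
treated_scores = [0.5, 0.5, 0.5, 0.1]
weights = [1.0 / m for m in treated_scores]
ratio = ess_ratio(weights)
print(f"ESS ratio: {ratio:.3f}")
```

Equal weights give a ratio of exactly 1. A single unit with a very small propensity score, as in the toy data, pulls the ratio down sharply, which is the kind of weight concentration this diagnostic is meant to reveal.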

Score

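Result

| | metric | value | flag |
|---|---|---|---|
| 0 | se_plugin | 1.190252e-01 | NA |
| 1 | psi_p99_over_med | 1.355647e+01 | YELLOW |
| 2 | psi_kurtosis | 6.047594e+01 | RED |
| 3 | max_\|t\|_g1 | 5.463687e+00 | RED |
| 4 | max_\|t\|_g0 | 1.384511e+00 | GREEN |
| 5 | max_\|t\|_m | 1.388284e+00 | GREEN |
| 6 | oos_tstat_fold | 1.241609e-15 | GREEN |
| 7 | oos_tstat_strict | 1.241615e-15 | GREEN |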

SUTVA

Result

1. Are your clients independent (i)?
2. Do you measure confounders, treatment, and outcome in the same intervals?
3. Do you measure confounders before treatment and outcome after?
4. Do you have a consistent label of treatment, such as if a person does not receive a treatment, he has a label 0?

Unconfoundedness

Result

| | metric | value | flag |
|---|---|---|---|
| 0 | balance_max_smd | 0.021956 | GREEN |
| 1 | balance_frac_violations | 0.000000 | GREEN |

Result

{'theta': 1.766253520771568, 'se': 0.11902521860734187, 'level': 0.95, 'z': 1.959963984540054, 'sampling_ci': (1.5329683790491713, 1.9995386624939646), 'theta_bounds_confounding': (1.6875916811818483, 1.8449153603612876), 'bias_aware_ci': (1.4543065394594517, 2.0782005020836842), 'max_bias': 0.07866183958971962, 'sigma2': 13.136641553423134, 'nu2': 0.43676740621591587, 'params': {'cf_y': 0.01, 'cf_d': 0.01, 'rho': 1.0, 'use_signed_rr': False}}
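
The intervals in this dictionary are linked by simple arithmetic, which the snippet below reproduces from `theta`, `se`, `level`, and `max_bias`. The value of `max_bias` is taken as given: how it is derived from `cf_y`, `cf_d`, `rho`, `sigma2`, and `nu2` is not shown in this export and is not reconstructed here.

```python
from statistics import NormalDist

# Values copied from the sensitivity output above.
theta = 1.766253520771568
se = 0.11902521860734187
level = 0.95
max_bias = 0.07866183958971962

# Two-sided normal critical value for the given confidence level.
z = NormalDist().inv_cdf(0.5 + level / 2)

# Sampling uncertainty only.
sampling_ci = (theta - z * se, theta + z * se)

# Range of theta under the assumed strength of unobserved confounding.
theta_bounds = (theta - max_bias, theta + max_bias)

# Confounding bounds widened by sampling uncertainty on both sides.
bias_aware_ci = (theta_bounds[0] - z * se, theta_bounds[1] + z * se)

print(sampling_ci)
print(theta_bounds)
print(bias_aware_ci)
```

All three intervals contain the true value of 1.8, so with these sensitivity parameters the conclusion does not change under the assumed amount of hidden confounding.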

Result

| | cf_y | cf_d | rho | theta_long | theta_short | delta |
|---|---|---|---|---|---|---|
| d | 0.000034 | 3.037675e-07 | -1.0 | 1.766254 | 1.845679 | -0.079425 |