Benchmarking: DiffInMeans vs SciPy implementation

Automated conversion of benchmarking_diff_in_means.ipynb

Goal: verify that the DiffInMeans implementations of ttest, conversion_ztest, and bootstrap match reference calculations done directly with SciPy and Statsmodels.

Result

n_control=4955, n_treated=5045
control mean=0.198991, treated mean=0.232904

1) Welch t-test: model vs SciPy/Statsmodels

Result
   metric            DiffInMeans  External    abs_diff
0  p_value           0.00003740   0.00003740  0.0
1  absolute_ci_low   0.01779692   0.01779692  0.0
2  absolute_ci_high  0.05002897   0.05002897  0.0

ttest validation passed
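
For orientation, here is a minimal sketch of how a Welch reference of this kind can be computed with SciPy. It is not the notebook's code. The arrays are an assumption: they treat the outcome as binary, so that the reported sizes and means correspond to 986 of 4955 control conversions and 1175 of 5045 treated conversions. The helper name welch_reference is hypothetical.

```python
import numpy as np
from scipy import stats

# Assumption: binary outcomes rebuilt from the reported sizes and means
# (986/4955 = 0.198991, 1175/5045 = 0.232904). Not the notebook's data.
control = np.concatenate([np.ones(986), np.zeros(4955 - 986)])
treated = np.concatenate([np.ones(1175), np.zeros(5045 - 1175)])


def welch_reference(treated, control, alpha=0.05):
    """Welch t-test p-value and CI for mean(treated) - mean(control)."""
    # equal_var=False selects Welch's unequal-variance t-test.
    result = stats.ttest_ind(treated, control, equal_var=False)

    # Per-group variance of the sample mean.
    var_t = treated.var(ddof=1) / treated.size
    var_c = control.var(ddof=1) / control.size
    se = np.sqrt(var_t + var_c)

    # Welch-Satterthwaite degrees of freedom.
    df = (var_t + var_c) ** 2 / (
        var_t ** 2 / (treated.size - 1) + var_c ** 2 / (control.size - 1)
    )

    diff = treated.mean() - control.mean()
    margin = stats.t.ppf(1 - alpha / 2, df) * se
    return {
        "p_value": float(result.pvalue),
        "absolute_ci_low": float(diff - margin),
        "absolute_ci_high": float(diff + margin),
    }


reference = welch_reference(treated, control)
for metric, value in reference.items():
    print(f"{metric:<18}{value:.8f}")
```

The interval is built by hand from the Welch-Satterthwaite degrees of freedom so that the sketch does not depend on a particular SciPy version. Under the binary-outcome assumption it should land close to the interval in the table above.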

2) Conversion z-test: model vs Statsmodels two-proportion tools

Result
   metric            DiffInMeans  External    abs_diff
0  p_value           0.00003794   0.00003794  6.78575035e-17
1  absolute_ci_low   0.01110763   0.01110763  0.00000000e+00
2  absolute_ci_high  0.05665834   0.05665834  0.00000000e+00

conversion_ztest validation passed
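
The two-proportion z-test itself is short enough to write out by hand, which can help when checking what a library call is doing. The sketch below uses only the standard library and assumes the same conversion counts as the reported means imply (986 of 4955 and 1175 of 5045). The p-value uses the pooled standard error, which is also what statsmodels' proportions_ztest does by default. The interval is a plain unpooled Wald interval, chosen here for illustration only, since the source does not say which interval method either side uses. The function name two_proportion_ztest is hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_t, n_t, conv_c, n_c, alpha=0.05):
    """Two-sided z-test and Wald CI for the difference p_t - p_c."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    diff = p_t - p_c

    # Pooled standard error under H0: p_t == p_c (used for the p-value).
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = diff / se_pooled
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))

    # Unpooled standard error (used for the Wald interval).
    se_unpooled = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "z": z,
        "p_value": p_value,
        "absolute_ci_low": diff - z_crit * se_unpooled,
        "absolute_ci_high": diff + z_crit * se_unpooled,
    }


# Assumption: counts inferred from the reported group sizes and means.
ztest = two_proportion_ztest(conv_t=1175, n_t=5045, conv_c=986, n_c=4955)
for metric, value in ztest.items():
    print(f"{metric:<18}{value:.8f}")
```

Other interval constructions (for example score-based ones) give slightly different bounds, so an interval check is only meaningful when both sides use the same method.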

3) Bootstrap diff-in-means: model vs SciPy bootstrap

Result
   metric            DiffInMeans  External    abs_diff
0  p_value           0.00003671   0.00004721  0.00001050
1  absolute_ci_low   0.01772064   0.01779606  0.00007543
2  absolute_ci_high  0.04989633   0.05045165  0.00055531

bootstrap validation passed
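
A bootstrap reference interval can be sketched with scipy.stats.bootstrap as follows. The data arrays are again an assumption (binary outcomes matching the reported sizes and means), and the resample count, percentile method, and seed are illustrative choices, not the notebook's settings. scipy.stats.bootstrap returns a confidence interval and the bootstrap distribution but no p-value, and the source does not state how the bootstrap p-value was derived, so the sketch covers the interval only.

```python
import numpy as np
from scipy import stats

# Assumption: binary outcomes rebuilt from the reported sizes and means.
control = np.concatenate([np.ones(986), np.zeros(4955 - 986)])
treated = np.concatenate([np.ones(1175), np.zeros(5045 - 1175)])


def diff_in_means(treated_sample, control_sample, axis=-1):
    """Statistic passed to the bootstrap: difference of sample means."""
    return np.mean(treated_sample, axis=axis) - np.mean(control_sample, axis=axis)


result = stats.bootstrap(
    (treated, control),      # two independent samples, resampled separately
    diff_in_means,
    n_resamples=2000,        # illustrative; more resamples reduce noise
    batch=500,               # limits memory used per vectorized call
    vectorized=True,
    paired=False,
    confidence_level=0.95,
    method="percentile",
    random_state=0,          # fixed seed so repeated runs agree
)

ci_low = float(result.confidence_interval.low)
ci_high = float(result.confidence_interval.high)
print(f"absolute_ci_low   {ci_low:.8f}")
print(f"absolute_ci_high  {ci_high:.8f}")
```

Because the interval comes from random resampling, two bootstrap implementations with different seeds or resample counts will not agree to the last digit, so a tolerance-based comparison is the appropriate check here.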

4) Confidence intervals side-by-side

Result
   method            source       p_value     abs_low     abs_high
0  bootstrap         DiffInMeans  0.00003671  0.01772064  0.04989633
1  bootstrap         External     0.00004721  0.01779606  0.05045165
2  conversion_ztest  DiffInMeans  0.00003794  0.01110763  0.05665834
3  conversion_ztest  External     0.00003794  0.01778697  0.05001138
4  ttest             DiffInMeans  0.00003740  0.01779692  0.05002897
5  ttest             External     0.00003740  0.01779692  0.05002897
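
Each of the per-method checks above reduces to an absolute-difference comparison against a tolerance. The sketch below shows one way to express that, using the bootstrap figures from section 3. The helper compare_metrics and both tolerance values are hypothetical; the source does not state which tolerances the notebook uses.

```python
def compare_metrics(model, external, tol):
    """Return (metric, model, external, abs_diff, passed) rows for shared metrics."""
    rows = []
    for metric, model_value in model.items():
        external_value = external[metric]
        abs_diff = abs(model_value - external_value)
        rows.append((metric, model_value, external_value, abs_diff, abs_diff <= tol))
    return rows


# Bootstrap values copied from the section 3 table.
model = {
    "p_value": 0.00003671,
    "absolute_ci_low": 0.01772064,
    "absolute_ci_high": 0.04989633,
}
external = {
    "p_value": 0.00004721,
    "absolute_ci_low": 0.01779606,
    "absolute_ci_high": 0.05045165,
}

# Hypothetical tolerances: a loose one suited to resampling noise, and a
# tighter one that the bootstrap upper bound does not meet.
loose = compare_metrics(model, external, tol=1e-3)
tight = compare_metrics(model, external, tol=1e-4)

for metric, m, e, diff, passed in loose:
    print(f"{metric:<18}{m:.8f}  {e:.8f}  {diff:.8f}  {'ok' if passed else 'FAIL'}")
```

Deterministic methods such as ttest can be held to a much tighter tolerance than bootstrap, so it helps to set the tolerance per method rather than globally.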

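In conclusion: the DiffInMeans model passes all three validations. The ttest results match the external reference to the displayed precision, the conversion_ztest results match to within 1e-16, and the bootstrap results differ by less than 6e-4, which is consistent with resampling noise. The one discrepancy is in the side-by-side table, where the External conversion_ztest interval (0.01778697 to 0.05001138) does not match the External interval reported in section 2 (0.01110763 to 0.05665834).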