CUPED

We call 'Controlled-experiment Using Pre-Experiment Data' (CUPED) a scenario where a treatment is randomly assigned to participants and pre-experiment data about them, such as a pre-treatment outcome, is available.

Treatment: a new product category for users.

We will test the following hypotheses:

$H_0$: There is no difference in LTV between the treatment and control groups.

$H_a$: There is a difference in LTV between the treatment and control groups.

Causal Assumptions

Unconfoundedness: treatment is randomly assigned. Checked below with the SRM test and the confounder balance check.

Overlap: each unit has a non-zero probability of assignment to every arm. Holds by design.

SUTVA: no interference between units and a consistent treatment definition. Holds by design.

Pre-treatment covariate: at least one exists and predicts the outcome.
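
The last assumption is what makes CUPED worthwhile: the more strongly the pre-treatment covariate predicts the outcome, the more variance the adjustment removes. Below is a minimal sketch of the classic single-covariate adjustment on synthetic data. The helper names `cuped_adjust` and `diff_in_means` and all simulated numbers are illustrative assumptions; this is not the Causalis implementation, which may use a regression with several covariates.

```python
import random
import statistics


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x)).

    theta = cov(y, x) / var(x) is estimated on the pooled sample.
    """
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov_yx = sum((yi - mean_y) * (xi - mean_x) for yi, xi in zip(y, x)) / (len(y) - 1)
    theta = cov_yx / statistics.variance(x)
    return [yi - theta * (xi - mean_x) for yi, xi in zip(y, x)]


def diff_in_means(values, d):
    """Difference between the treated (d == 1) and control (d == 0) means."""
    treated = [v for v, di in zip(values, d) if di == 1]
    control = [v for v, di in zip(values, d) if di == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


# Synthetic experiment (hypothetical numbers): x is a pre-period metric,
# y depends on x plus noise, and treatment adds a constant effect of 1.0.
rng = random.Random(0)
n = 4000
d = [rng.randint(0, 1) for _ in range(n)]
x = [rng.gauss(10, 3) for _ in range(n)]
y = [0.8 * xi + rng.gauss(0, 2) + 1.0 * di for xi, di in zip(x, d)]

y_adj = cuped_adjust(y, x)
naive_ate = diff_in_means(y, d)
cuped_ate = diff_in_means(y_adj, d)
var_reduction = 1 - statistics.variance(y_adj) / statistics.variance(y)
print(f"naive ATE={naive_ate:.3f}, CUPED ATE={cuped_ate:.3f}, "
      f"variance reduction={var_reduction:.1%}")
```

Because the treatment is randomized, the covariate is independent of assignment, so subtracting `theta * (x - mean(x))` leaves the expected difference in means unchanged while shrinking its variance.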

Data

For the analysis you need the data in a pandas DataFrame with:

  • treatment column in binary format (1/0)
  • outcome column in numeric format, measured after treatment
  • user_id column (optional, but useful)
  • confounder columns (optional; numeric, measured before treatment; used for the causal assumption checks and as CUPED covariates)

We will take data from Causalis DGP. Read more at https://causalis.causalcraft.com/articles/make_cuped_tweedie_26

Result
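|   | y | d | tenure_months | avg_sessions_week | spend_last_month | discount_rate | platform_ios | platform_web | m | m_obs | tau_link | g0 | g1 | cate | y_pre | _latent_A | y_pre_2 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 3.734763 | 0.0 | 14.187461 | 2.0 | 57.355300 | 0.158164 | 1.0 | 0.0 | 0.5 | 0.5 | 0.103825 | 8.487966 | 9.416605 | 0.928639 | 17.972138 | 0.304717 | 10.783283 |
| 1 | 0.746406 | 1.0 | 6.352893 | 3.0 | 46.700946 | 0.085722 | 0.0 | 0.0 | 0.5 | 0.5 | 0.030781 | 8.487966 | 8.753301 | 0.265335 | 0.000000 | -1.039984 | 0.000000 |
| 2 | 13.040584 | 1.0 | 18.910153 | 9.0 | 80.136187 | 0.175115 | 1.0 | 0.0 | 0.5 | 0.5 | 0.357355 | 8.487966 | 12.133915 | 3.645949 | 34.771837 | 0.750451 | 24.866330 |
| 3 | 34.582113 | 1.0 | 7.927627 | 4.0 | 33.718224 | 0.152718 | 1.0 | 0.0 | 0.5 | 0.5 | 0.065554 | 8.487966 | 9.063025 | 0.575059 | 349.163943 | 0.940565 | 209.498366 |
| 4 | 0.000000 | 1.0 | 11.106925 | 2.0 | 92.064518 | 0.077390 | 0.0 | 0.0 | 0.5 | 0.5 | 0.056036 | 8.487966 | 8.977172 | 0.489206 | 0.000000 | -1.951035 | 0.243980 |
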
Result

CausalData(df=(20000, 9), treatment='d', outcome='y', confounders=['avg_sessions_week', 'spend_last_month', 'discount_rate', 'platform_ios', 'platform_web', 'y_pre', 'y_pre_2'])

EDA

Result
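|   | treatment | count | mean | std | min | p10 | p25 | median | p75 | p90 | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 10049 | 8.867136 | 21.097599 | 0.0 | 0.0 | 0.0 | 0.276901 | 9.266134 | 24.454852 | 347.095992 |
| 1 | 1 | 9951 | 9.870188 | 25.879015 | 0.0 | 0.0 | 0.0 | 0.000000 | 10.409916 | 27.439125 | 956.413897 |
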

Result
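|   | treatment | n | outlier_count | outlier_rate | lower_bound | upper_bound | has_outliers | method | tail |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 10049 | 1080 | 0.107473 | -13.899201 | 23.165335 | True | iqr | both |
| 1 | 1 | 9951 | 1063 | 0.106823 | -15.614875 | 26.024791 | True | iqr | both |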

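The outcome distribution is heavy-tailed: the IQR rule flags about 10.7% of observations in each arm as outliers, and the maxima (347.1 and 956.4) lie far above the 90th percentiles (24.5 and 27.4).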
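
The `lower_bound` and `upper_bound` columns can be cross-checked from the quartiles in the summary table. The sketch below assumes the conventional Tukey fences with a multiplier of 1.5 (the output only states `method=iqr`, so the multiplier is an assumption); `iqr_bounds` is a hypothetical helper, not a Causalis function.

```python
def iqr_bounds(p25, p75, k=1.5):
    """Tukey fences: values outside [p25 - k*IQR, p75 + k*IQR] count as outliers."""
    iqr = p75 - p25
    return p25 - k * iqr, p75 + k * iqr


# Quartiles of the outcome taken from the per-arm summary statistics.
control_bounds = iqr_bounds(p25=0.0, p75=9.266134)
treated_bounds = iqr_bounds(p25=0.0, p75=10.409916)
print("control:", control_bounds)
print("treated:", treated_bounds)
```

Since the outcome is non-negative, the negative lower fences never trigger, so every flagged point sits in the upper tail even though the check is two-sided.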

SRM

A system splits users at random: half should receive the treatment and the other half should not. We check the split with a sample ratio mismatch (SRM) test. Read more at https://causalis.causalcraft.com/articles/srm

Result

SRMResult(status=no SRM, p_value=0.48833, chi2=0.4802)
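
The reported statistic can be reproduced by hand from the group sizes (10049 control, 9951 treated) with a Pearson chi-square goodness-of-fit test against a 50/50 split. This is a minimal standard-library sketch; `srm_chi2` is a hypothetical helper and only handles the two-arm case, where the test has one degree of freedom.

```python
import math


def srm_chi2(observed, expected_shares):
    """Pearson chi-square goodness-of-fit test for a two-arm split (1 degree of freedom)."""
    total = sum(observed)
    chi2 = sum(
        (o - total * s) ** 2 / (total * s)
        for o, s in zip(observed, expected_shares)
    )
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = srm_chi2(observed=[10049, 9951], expected_shares=[0.5, 0.5])
print(f"chi2={chi2:.4f}, p_value={p_value:.5f}")
```

A large p-value means the observed split is compatible with the intended allocation; SRM alerts typically use a strict threshold because a mismatch usually points to a broken assignment or logging pipeline.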

Confounders balance

Result
|   | confounders | mean_d_0 | mean_d_1 | abs_diff | smd | ks_pvalue |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | spend_last_month | 73.787667 | 75.159017 | 1.371350 | 0.019726 | 0.20812 |
| 1 | platform_ios | 0.304309 | 0.297357 | 0.006952 | -0.015158 | 0.96771 |
| 2 | y_pre | 7928.516448 | 20967.672427 | 13039.155979 | 0.014621 | 0.36084 |
| 3 | y_pre_2 | 4760.192129 | 12583.709202 | 7823.517073 | 0.014621 | 0.38473 |
| 4 | avg_sessions_week | 4.963379 | 5.015677 | 0.052297 | 0.012405 | 0.49401 |
| 5 | discount_rate | 0.100420 | 0.099996 | 0.000424 | -0.006421 | 0.36146 |
| 6 | platform_web | 0.050950 | 0.050749 | 0.000202 | -0.000918 | 1.00000 |

  • No SRM detected (p = 0.488)
  • All |SMD| < 0.1 and all ks_pvalue > 0.05

The split is consistent with random assignment, so unconfoundedness is plausible.
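
For reference, here is a minimal sketch of a standardized mean difference (SMD) for one covariate. It assumes the common definition with the average of the two group variances in the denominator and the treated-minus-control sign convention; the exact formula Causalis uses is not shown here, and the toy values are hypothetical.

```python
import math
import statistics


def standardized_mean_difference(treated, control):
    """SMD = (mean_1 - mean_0) / sqrt((var_1 + var_0) / 2)."""
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return (statistics.fmean(treated) - statistics.fmean(control)) / pooled_sd


# Toy covariate values (hypothetical, not taken from the experiment).
treated = [4.0, 5.0, 6.0, 5.0, 7.0]
control = [5.0, 5.0, 6.0, 4.0, 6.0]
smd = standardized_mean_difference(treated, control)
print(f"SMD={smd:.3f}, balanced={abs(smd) < 0.1}")
```

Scaling by the pooled standard deviation is why heavy-tailed covariates such as y_pre can show a very large abs_diff and still have a small SMD.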

Inference with CUPED

Read more about model specification at https://causalis.causalcraft.com/articles/cuped-model

Result
| field | value |
| --- | --- |
| estimand | ATE |
| model | CUPEDModel |
| value | 0.8155 (ci_abs: 0.2724, 1.3587) |
| value_relative | 9.0633 (ci_rel: 2.7548, 15.3718) |
| alpha | 0.0500 |
| p_value | 0.0033 |
| is_significant | True |
| n_treated | 9951 |
| n_control | 10049 |
| treatment_mean | 9.8702 |
| control_mean | 8.8671 |
| time | 2026-05-09 |

Refutation

Result

var reduction with CUPED %: 31.179489563275986
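
This figure can be roughly cross-checked from numbers already reported: the per-arm standard deviations in the EDA summary give the variance of the unadjusted difference in means, and the CUPED confidence interval implies the adjusted standard error. The sketch assumes the interval is a normal-approximation 95% CI, which is an assumption about how it was built, and it works from rounded values, so only approximate agreement should be expected.

```python
import math

# Per-arm sample sizes and outcome standard deviations from the EDA summary.
n_c, sd_c = 10049, 21.097599
n_t, sd_t = 9951, 25.879015

# Variance of the unadjusted (naive) difference in means.
naive_var = sd_t**2 / n_t + sd_c**2 / n_c

# Standard error implied by the reported 95% CI (normal approximation assumed).
ci_low, ci_high = 0.2724, 1.3587
z_975 = 1.959964
cuped_se = (ci_high - ci_low) / (2 * z_975)

reduction = 1 - cuped_se**2 / naive_var
print(f"naive SE={math.sqrt(naive_var):.4f}, CUPED SE={cuped_se:.4f}, "
      f"variance reduction={reduction:.1%}")
```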

Result
|   | test_id | test | flag | value | threshold | message |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | design_rank | Design rank | GREEN | rank=6, k=6 | rank == k | Design matrix is full rank. |
| 1 | condition_number | Condition number | GREEN | 3335417.561613 | <= 1.000e+08 | Condition number is within expected range. |
| 2 | near_duplicates | Near-duplicate covariates | YELLOW | 1 | red if >= 3 | Near-duplicate covariate pairs detected. |
| 3 | vif | Variance inflation factor | RED | 6485299680.233781 | yellow: > 20, red: > 40 | Very large VIF indicates severe multicollinearity. |
| 4 | ate_gap | Adjusted vs naive ATE | GREEN | 0.561334 | yellow: > 2.00, red: > 2.50 | Adjusted and naive ATE are reasonably aligned. |
| 5 | residual_tails | Residual extremes | RED | max\|std resid\|=27.8 | yellow > 7, red > 10 | Extremely large standardized residuals; outliers likely dominate. |
| 6 | leverage | Leverage | RED | max_h=0.91, n_high=653 | yellow if max_h > 5 × 0.0006, red if max_h > max(0.5, 10 × 0.0006) | Extreme leverage points detected. |
| 7 | cooks | Cook's distance | RED | max=1301, n_high=778 | yellow if max Cook's > 0.1, red if > 1 | Strongly influential observations detected. |
| 8 | hc23_stability | HC2/HC3 stability | GREEN | min(1-h)=9.004e-02, n_tiny=0 | min(1-h) >= 1.0e-06 | HC2/HC3 stability check passed. |
| 9 | winsor_sensitivity | Winsor sensitivity | GREEN | 0.304010 | yellow: > 1.00 SE, red: > 2.00 SE | Winsorized refit is close to baseline ATE. |
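
The RED VIF flag and the near-duplicate warning likely share a cause: y_pre and y_pre_2 have the same SMD in the balance table, which suggests one is close to a rescaled copy of the other. The sketch below shows how such a pair inflates the variance inflation factor. It uses the two-covariate special case VIF = 1 / (1 - r²) and hypothetical values; the library's own VIF computation over the full design matrix may differ.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_covariates(a, b):
    """With only two covariates, VIF = 1 / (1 - r^2) for both of them."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)


# Hypothetical pre-period metrics: the second is almost a rescaled copy of the first.
y_pre = [0.0, 17.9, 34.8, 349.2, 3.1, 58.0]
y_pre_2 = [0.2, 10.8, 24.9, 209.5, 1.7, 35.1]
unrelated = [2.0, 9.0, 4.0, 3.0, 7.0, 5.0]

vif_dup = vif_two_covariates(y_pre, y_pre_2)
vif_ok = vif_two_covariates(y_pre, unrelated)
print(f"VIF(y_pre, y_pre_2)   = {vif_dup:.1f}")
print(f"VIF(y_pre, unrelated) = {vif_ok:.2f}")
```

Collinearity among covariates inflates the uncertainty of their own coefficients rather than the treatment coefficient, since the randomized treatment is roughly independent of them; that is consistent with the ate_gap and winsor_sensitivity checks staying GREEN. Keeping only one of the two pre-period outcomes would be a natural way to clear the flag.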

Let's compare the CUPED estimate (0.8155, 95% CI 0.2724 to 1.3587) with the oracle effect known from the DGP.

Result

Ground truth ATE is 1.2383515933360814
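
A quick way to read this comparison is to check whether the reported interval covers the oracle value and how far the point estimate sits from it. The variable names below are illustrative; the numbers are the ones reported in this article.

```python
estimate, ci_low, ci_high = 0.8155, 0.2724, 1.3587  # reported CUPED result
ground_truth = 1.2383515933360814                   # oracle ATE from the DGP

bias = estimate - ground_truth
covered = ci_low <= ground_truth <= ci_high
print(f"estimate - truth = {bias:.4f}, CI covers truth: {covered}")
```

This is a single draw, so a gap of this size inside the interval is not evidence of bias on its own; with a heavy-tailed outcome and the RED influence diagnostics above, a few extreme users can move the estimate noticeably.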