Case Study · 5 min read

Probability to become a premium

Automated conversion of probability_of_premium.ipynb


We have launched a new section that explains the advantages of a premium subscription.

  • Treatment: the user saw the new section.
  • Outcome: conversion to a premium subscription.
  • Covariate: the probability of a premium subscription predicted by our ML model.

We will test the following hypotheses:

$H_0$: There is no difference in conversion between the treatment and control groups.

$H_a$: There is a difference in conversion between the treatment and control groups.

Data

We will use a data-generating process (DGP) from Causalis. Read more at https://causalis.causalcraft.com/articles/generate_cuped_binary

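Result

| | y | d | tenure_months | spend_last_month | discount_rate | support_tickets | email_open_rate | referral_count | plan_tier_plus | plan_tier_pro | region_eu | m | m_obs | tau_link | g0 | g1 | cate | y_pre |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 14.187461 | 62.867015 | 0.087395 | 2.0 | 0.412214 | 2.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.5 | 0.081508 | 0.267699 | 0.280682 | 0.012983 | 0.009942 |
| 1 | 0.0 | 0.0 | 7.242801 | 112.089626 | 0.173939 | 2.0 | 0.131957 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.5 | 0.037301 | 0.223180 | 0.228483 | 0.005303 | -0.220081 |
| 2 | 1.0 | 1.0 | 17.729423 | 16.813523 | 0.141900 | 2.0 | 0.376393 | 0.0 | 1.0 | 0.0 | 0.0 | 0.5 | 0.5 | 0.060442 | 0.245717 | 0.254861 | 0.009144 | 0.096021 |
| 3 | 0.0 | 0.0 | 19.497424 | 43.456545 | 0.146694 | 2.0 | 0.680602 | 1.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.5 | 0.096195 | 0.269024 | 0.284422 | 0.015398 | 0.188115 |
| 4 | 0.0 | 0.0 | 4.592766 | 105.987656 | 0.112326 | 6.0 | 0.642113 | 0.0 | 1.0 | 0.0 | 1.0 | 0.5 | 0.5 | 0.011141 | 0.264030 | 0.265771 | 0.001741 | -0.250511 |

Result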

CausalData(df=(10000, 12), treatment='d', outcome='y', confounders=['tenure_months', 'spend_last_month', 'discount_rate', 'support_tickets', 'email_open_rate', 'referral_count', 'plan_tier_plus', 'plan_tier_pro', 'region_eu', 'y_pre'])

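Result

| | treatment | count | mean | std | min | p10 | p25 | median | p75 | p90 | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 5034 | 0.276917 | 0.447520 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 1 | 1.0 | 4966 | 0.298027 | 0.457437 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |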

So we see that the new section has a higher conversion rate. Let's check whether the difference is statistically significant and whether our test satisfies the identification assumptions.

Monitoring of the split

Result

SRMResult(status=no SRM, p_value=0.49650, chi2=0.4624)
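
The SRM check above can be sketched by hand as a chi-square goodness-of-fit test on the group sizes. This is a minimal illustration, assuming a planned 50/50 split and one degree of freedom; the helper `srm_chi2` is hypothetical and is not the Causalis API.

```python
import math

def srm_chi2(n_control: int, n_treated: int, expected_share_treated: float = 0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for a two-arm split."""
    n = n_control + n_treated
    expected = [n * (1 - expected_share_treated), n * expected_share_treated]
    observed = [n_control, n_treated]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the chi-square survival function is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Group sizes taken from the summary table (assumed planned split: 50/50).
chi2, p_value = srm_chi2(n_control=5034, n_treated=4966)
print(f"chi2={chi2:.4f}, p_value={p_value:.4f}")
```

With the observed 5034 / 4966 split, the statistic and p-value should land close to the reported chi2=0.4624 and p_value=0.49650, so there is no sign of a sample ratio mismatch.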

Result

| | confounders | mean_d_0 | mean_d_1 | abs_diff | smd | ks_pvalue |
|---|---|---|---|---|---|---|
| 0 | region_eu | 0.210369 | 0.189086 | 0.021284 | -0.053250 | 0.20331 |
| 1 | spend_last_month | 77.921167 | 75.111595 | 2.809573 | -0.039381 | 0.68313 |
| 2 | plan_tier_pro | 0.151569 | 0.161095 | 0.009526 | 0.026230 | 0.97481 |
| 3 | support_tickets | 1.817839 | 1.789569 | 0.028270 | -0.020930 | 0.72761 |
| 4 | referral_count | 0.797378 | 0.785139 | 0.012239 | -0.013696 | 1.00000 |
| 5 | tenure_months | 13.780500 | 13.732216 | 0.048284 | -0.006523 | 0.69284 |
| 6 | plan_tier_plus | 0.298967 | 0.296214 | 0.002753 | -0.006020 | 1.00000 |
| 7 | discount_rate | 0.100375 | 0.100514 | 0.000139 | 0.002123 | 0.34966 |
| 8 | y_pre | -0.000064 | 0.000065 | 0.000129 | 0.000612 | 0.64056 |
| 9 | email_open_rate | 0.447786 | 0.447805 | 0.000020 | 0.000155 | 0.95826 |

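There is no evidence that the unconfoundedness assumption is violated: every standardized mean difference (smd) is small (the largest in absolute value is 0.053, for region_eu), and no KS p-value falls below 0.05.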
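
To make the smd column concrete, here is a small sketch for a binary covariate. It assumes the common definition (difference in means divided by the square root of the average of the two group variances); the exact formula Causalis uses is not stated here, and `smd_binary` is a hypothetical helper.

```python
import math

def smd_binary(p_control: float, p_treated: float) -> float:
    """SMD for a 0/1 covariate: mean difference over the pooled Bernoulli SD."""
    var_control = p_control * (1 - p_control)
    var_treated = p_treated * (1 - p_treated)
    pooled_sd = math.sqrt((var_control + var_treated) / 2)
    return (p_treated - p_control) / pooled_sd

# Group means of region_eu from the balance table.
smd_region_eu = smd_binary(p_control=0.210369, p_treated=0.189086)
print(f"SMD(region_eu) = {smd_region_eu:.4f}")
```

Under that assumed definition the result is close to the -0.053250 reported for region_eu.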

Inference

We will use the CUPEDModel that implements the Lin (2013) "interacted adjustment" for ATE (Average Treatment Effect) estimation in randomized controlled trials (RCTs). This method is a robust version of ANCOVA that remains valid even when the treatment effect is heterogeneous with respect to the covariates.

1. Specification

The model fits an Ordinary Least Squares (OLS) regression of the outcome $Y$ on the treatment indicator $D$ and centered pre-treatment covariates $X^c$. The specification includes full interactions between the treatment and the centered covariates:

$$Y_i = \alpha + \tau D_i + \beta^T X_i^c + \gamma^T (D_i \cdot X_i^c) + \epsilon_i$$

Where:

  • $Y_i$: Outcome for individual $i$.
  • $D_i$: Binary treatment indicator ($D_i \in \{0, 1\}$).
  • $X_i$: Vector of pre-treatment covariates.
  • $X_i^c = X_i - \bar{X}$: Centered covariates (where $\bar{X}$ is the sample mean).
  • $\alpha$: Intercept (represents the mean outcome of the control group when $X = \bar{X}$).
  • $\tau$: Average Treatment Effect (ATE) or Intent-to-Treat (ITT) effect.
  • $\beta$: Vector of coefficients for the main effects of the covariates.
  • $\gamma$: Vector of coefficients for the interaction terms between treatment and covariates.
  • $\epsilon_i$: Residual error term.
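
The specification above can be sketched in a few lines of NumPy. This is an illustrative point estimate only, not the CUPEDModel implementation: the function `lin_ate` and the synthetic data are assumptions made for the example, and standard errors are omitted.

```python
import numpy as np

def lin_ate(y: np.ndarray, d: np.ndarray, X: np.ndarray) -> float:
    """Point estimate of tau from OLS of y on [1, D, Xc, D * Xc]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    Xc = X - X.mean(axis=0)  # center on the full-sample mean
    D = np.asarray(d, dtype=float)[:, None]
    design = np.column_stack([np.ones(len(y)), D, Xc, D * Xc])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return float(coef[1])  # the coefficient on D is tau

# Synthetic RCT with a heterogeneous effect; the true ATE is 0.5 because E[x] = 0.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * d + 2.0 * x + 0.3 * d * x + rng.normal(size=n)

tau_hat = lin_ate(y, d, x)
naive = y[d == 1].mean() - y[d == 0].mean()
print(f"Lin estimate: {tau_hat:.3f}, naive difference in means: {naive:.3f}")
```

Because the covariates are centered, the coefficient on $D$ can be read directly as the ATE even when $\gamma \neq 0$. Both estimates target the same quantity, but the adjusted one is typically less noisy when the covariate predicts the outcome.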
Result

| field | value |
|---|---|
| estimand | ATE |
| model | CUPEDModel |
| value | 0.0210 (ci_abs: 0.0054, 0.0365) |
| value_relative | 7.5747 (ci_rel: 1.9572, 13.1923) |
| alpha | 0.0500 |
| p_value | 0.0081 |
| is_significant | True |
| n_treated | 4966 |
| n_control | 5034 |
| treatment_mean | 0.2980 |
| control_mean | 0.2769 |
| time | 2026-02-18 |

Our result is statistically significant (p_value = 0.0081 at alpha = 0.05). The relative effect is 7.5747% (ci_rel: 1.9572%, 13.1923%).
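
As a sanity check, the relative numbers can be approximately recovered from the absolute ones. The sketch below assumes the relative effect is simply the absolute effect divided by the control mean; Causalis may compute the relative interval differently, and the inputs here are the rounded values from the result table, so agreement is only up to rounding.

```python
control_mean = 0.2769                          # control_mean from the result table
ate, ci_low, ci_high = 0.0210, 0.0054, 0.0365  # value and ci_abs

def to_relative_pct(effect: float, baseline: float) -> float:
    """Express an absolute effect as a percentage of the baseline mean."""
    return 100.0 * effect / baseline

rel = to_relative_pct(ate, control_mean)
rel_low = to_relative_pct(ci_low, control_mean)
rel_high = to_relative_pct(ci_high, control_mean)
print(f"{rel:.2f}% ({rel_low:.2f}%, {rel_high:.2f}%)")
```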

Result

var reduction by CUPED %: 23.382355917736007

CUPED worked: it reduced the variance of the estimate by about 23%.
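
A rough back-of-the-envelope check of this figure is possible from the numbers already shown. The sketch assumes the unadjusted estimator is the plain difference in means with unpooled variance, and that the reported interval is a symmetric normal-approximation interval; because the interval is rounded to four decimals, the result only approximates the reported reduction.

```python
from statistics import NormalDist

# Unadjusted estimator: unpooled variance of the difference in means (summary table).
n0, std0 = 5034, 0.447520
n1, std1 = 4966, 0.457437
var_naive = std0**2 / n0 + std1**2 / n1

# Adjusted estimator: back out the standard error from the reported interval,
# assuming it is symmetric and based on the normal approximation.
ci_low, ci_high, alpha = 0.0054, 0.0365, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
se_adjusted = (ci_high - ci_low) / (2 * z)
var_adjusted = se_adjusted**2

reduction_pct = 100 * (1 - var_adjusted / var_naive)
print(f"approximate variance reduction: {reduction_pct:.1f}%")
```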

Let's check the other assumptions.

SUTVA

Result

1.) Are your clients independent (i). Outcome of ones do not depend on others?
2.) Are all clients have full window to measure metrics?
3.) Do you measure confounders before treatment and outcome after?
4.) Do you have a consistent label of treatment, such as if a person does not receive a treatment, he has a label 0?

  1. We assume that there are no network effects.
  2. The metrics are valid.
  3. Confounders and covariates were measured before the treatment, and the outcome after it.
  4. Treatment labeling is consistent, based on our logging system.

Overlap

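Overlap holds by design: treatment is randomly assigned, so every user has a positive probability of landing in either group.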

Regression specification

Result

| | test_id | test | flag | value | threshold | message |
|---|---|---|---|---|---|---|
| 0 | design_rank | Design rank | GREEN | rank=4, k=4 | rank == k | Design matrix is full rank. |
| 1 | condition_number | Condition number | GREEN | 12.310161 | <= 1.000e+08 | Condition number is within expected range. |
| 2 | near_duplicates | Near-duplicate covariates | GREEN | 0 | 0 pairs | No near-duplicate centered covariates found. |
| 3 | vif | Variance inflation factor | GREEN | nan | <= 20 | VIF not applicable (fewer than two usable covariates). |
| 4 | ate_gap | Adjusted vs naive ATE | GREEN | 0.014799 | yellow: > 2.00, red: > 2.50 | Adjusted and naive ATE are reasonably aligned. |
| 5 | residual_tails | Residual extremes | GREEN | max\|std resid\|=2.53 | yellow > 7, red > 10 | Residual extremes look reasonable. |
| 6 | leverage | Leverage | GREEN | max_h=0.002482, n_high=787 | yellow if max_h > 5 × 0.0008, red if max_h > max(0.5, 10 × 0.0008) | No high-leverage concentration detected. |
| 7 | cooks | Cook's distance | GREEN | max=0.003232, n_high=434 | yellow if max Cook's > 0.1, red if > 1 | No strong influence signal from Cook's distance. |
| 8 | hc23_stability | HC2/HC3 stability | GREEN | min(1-h)=9.975e-01, n_tiny=0 | min(1-h) >= 1.0e-06 | HC2/HC3 stability check passed. |
| 9 | winsor_sensitivity | Winsor sensitivity | GREEN | 0.000000 | yellow: > 1.00 SE, red: > 2.00 SE | Winsorized refit is close to baseline ATE. |

All diagnostics are GREEN, so our regression specification is valid.

In conclusion

The new section performs better than the old one. The effect is 0.0210 (ci_abs: 0.0054, 0.0365) on the conversion-rate scale, i.e. about 2.10 p.p. (95% CI: 0.54 to 3.65 p.p.). Roll it out to all users.