Scenario: Classic RCT
We call a scenario a 'Classic Randomized Controlled Trial' (RCT) when a treatment is randomly assigned to participants and we have no pre-experiment data about them, such as a pre-treatment outcome.
Treatment: a new onboarding flow for new users.
We will test the following hypotheses:
- Null hypothesis (H0): there is no difference in conversion rate between the treatment and control groups.
- Alternative hypothesis (H1): there is a difference in conversion rate between the treatment and control groups.
Data
We will use the data-generating process (DGP) from causalis. You can read more at https://causalis.causalcraft.com/articles/generate_classic_rct_26
| | user_id | conversion | d | platform_ios | country_usa | source_paid | m | m_obs | tau_link | g0 | g1 | cate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01fc4 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.5 | 0.5 | 0.106483 | 0.310620 | 0.333868 | 0.023249 |
| 1 | 0204c | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.5 | 0.5 | 0.106483 | 0.198257 | 0.215727 | 0.017471 |
| 2 | 002cf | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.5 | 0.5 | 0.106483 | 0.231969 | 0.251479 | 0.019509 |
| 3 | 0202d | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.5 | 0.5 | 0.106483 | 0.231969 | 0.251479 | 0.019509 |
| 4 | 011cb | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.5 | 0.5 | 0.106483 | 0.142189 | 0.155678 | 0.013489 |
Ground truth ATE is 0.01719144406311028
CausalData(df=(10000, 5), treatment='d', outcome='conversion', confounders=['platform_ios', 'country_usa', 'source_paid'])
| | treatment | count | mean | std | min | p10 | p25 | median | p75 | p90 | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 4955 | 0.198991 | 0.399281 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| 1 | 1.0 | 5045 | 0.232904 | 0.422723 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |

Monitoring
Our system splits users randomly: half of them should get the new onboarding and the other half should not. We should monitor the split with a sample ratio mismatch (SRM) test. Read more at https://causalis.causalcraft.com/articles/srm
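As a rough illustration of an SRM check, the sketch below runs a chi-square goodness-of-fit test on the group sizes from the summary table above (4955 control, 5045 treatment) against an expected 50/50 split. The function name `srm_chi2_test` is hypothetical and this is not the causalis implementation; it uses only the standard library.

```python
import math

def srm_chi2_test(n_control, n_treatment, expected_share_treatment=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for a two-arm split."""
    total = n_control + n_treatment
    expected_t = total * expected_share_treatment
    expected_c = total - expected_t
    chi2 = ((n_control - expected_c) ** 2 / expected_c
            + (n_treatment - expected_t) ** 2 / expected_t)
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Group sizes taken from the summary table in this article.
chi2, p_value = srm_chi2_test(4955, 5045)
print(f"chi2={chi2:.3f}, p_value={p_value:.3f}")
```

A large p-value here means the observed split is consistent with the intended 50/50 allocation. SRM checks are often run with a strict threshold (for example 0.001) because a mismatch usually points to a pipeline bug rather than chance.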
Check the confounders balance
Are the groups comparable in terms of confounders? We need to choose confounders using domain and business knowledge and check their balance. The standard benchmark flags a confounder as imbalanced when:
ks_pvalue < 0.05
As we can see, the system split users randomly.
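To make the benchmark concrete, here is a minimal sketch of a two-sample Kolmogorov-Smirnov check for one confounder. It assumes `ks_pvalue` comes from a two-sample KS test; the function `ks_2samp_asymptotic` is hypothetical, uses the asymptotic Kolmogorov series rather than an exact p-value, and runs on synthetic data, not the article's dataset.

```python
import math
import random

def ks_2samp_asymptotic(x, y):
    """Two-sample KS statistic with an asymptotic (Kolmogorov series) p-value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Compare the two empirical CDFs at every distinct pooled value.
    for v in sorted(set(xs) | set(ys)):
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        # The series converges slowly here and the p-value is effectively 1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Synthetic binary confounder with the same distribution in both groups.
random.seed(0)
control = [1.0 if random.random() < 0.5 else 0.0 for _ in range(5000)]
treatment = [1.0 if random.random() < 0.5 else 0.0 for _ in range(5000)]
ks_stat, ks_pvalue = ks_2samp_asymptotic(control, treatment)
print(f"ks_stat={ks_stat:.4f}, ks_pvalue={ks_pvalue:.4f}, flagged={ks_pvalue < 0.05}")
```

For binary confounders the KS statistic reduces to the absolute difference in proportions, and the asymptotic p-value tends to be conservative, so a standardized mean difference is a common complementary check.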
Estimation with Diff-in-Means
Inference methods
The Causalis.DiffInMeans model implements three inference methods: ttest, conversion_ztest and bootstrap.
- use conversion_ztest when users < 100k and the outcome is binary
- use bootstrap when users < 10k, or the outcome is a ratio metric, or your metric is highly skewed
- in other cases use ttest
We will use conversion_ztest for our scenario.
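The selection rules above can be written down as a small helper. This is only an illustration: `choose_inference_method` and its parameters are hypothetical names, not part of causalis, and the sketch assumes the rules are checked in the order they are listed.

```python
def choose_inference_method(n_users, is_binary, is_ratio=False, is_highly_skewed=False):
    """Return the name of the inference method suggested by the rules above."""
    # Assumption: rules are evaluated top to bottom, first match wins.
    if n_users < 100_000 and is_binary:
        return "conversion_ztest"
    if n_users < 10_000 or is_ratio or is_highly_skewed:
        return "bootstrap"
    return "ttest"

# Our scenario: 10,000 users and a binary conversion outcome.
method = choose_inference_method(10_000, is_binary=True)
print(method)
```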
conversion_ztest
The conversion_ztest performs a statistical comparison of conversion rates between two groups (Treatment and Control). It provides a p-value for the hypothesis test, and robust confidence intervals for both absolute and relative differences.
1. Observed Metrics
For each group (Control $c$, Treatment $t$):
- $n_c, n_t$: Total number of observations.
- $x_c, x_t$: Number of successes (conversions).
- $\hat{p}_c = x_c / n_c$, $\hat{p}_t = x_t / n_t$: Observed conversion rates.
2. Hypothesis Test (P-value)
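The test evaluates $H_0: p_t = p_c$ (no difference).
- Pooled Proportion: $\hat{p} = \frac{x_c + x_t}{n_c + n_t}$
- Pooled Standard Error: $SE_{pooled} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_c} + \frac{1}{n_t}\right)}$
- Z-Statistic: $Z = \frac{\hat{p}_t - \hat{p}_c}{SE_{pooled}}$
- P-value: $2\,(1 - \Phi(|Z|))$, where $\Phi$ is the standard normal CDF.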
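The pooled two-proportion z-test described above can be sketched in a few lines of standard-library Python. The function `pooled_ztest` is a hypothetical name, not the causalis API. The conversion counts (986 and 1175) are an assumption: they are reconstructed as mean × count from the summary table earlier in this article.

```python
import math

def pooled_ztest(x_c, n_c, x_t, n_t):
    """Two-sided pooled z-test for a difference in conversion rates."""
    p_c, p_t = x_c / n_c, x_t / n_t
    p_pool = (x_c + x_t) / (n_c + n_t)
    se_pool = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_c + 1.0 / n_t))
    z = (p_t - p_c) / se_pool
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and is numerically
    # stable in the tails.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Counts reconstructed from the summary table (assumption):
# control 986 / 4955, treatment 1175 / 5045.
z, p_value = pooled_ztest(986, 4955, 1175, 5045)
print(f"z={z:.3f}, p_value={p_value:.2e}")
```

With these counts the z-statistic is a little above 4, so the null hypothesis of no difference would be rejected at the usual 0.05 level.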
3. Absolute Difference (Newcombe CI)
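To calculate the confidence interval for the difference $\Delta = p_t - p_c$, we use the Newcombe method, which is more robust than standard Wald intervals for conversion rates.
- Wilson Score Interval for each group $i \in \{c, t\}$: $(l_i, u_i) = \frac{\hat{p}_i + \frac{z_{\alpha/2}^2}{2 n_i} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_i (1 - \hat{p}_i)}{n_i} + \frac{z_{\alpha/2}^2}{4 n_i^2}}}{1 + \frac{z_{\alpha/2}^2}{n_i}}$
- Combined Interval: $\left[\hat{\Delta} - \sqrt{(\hat{p}_t - l_t)^2 + (u_c - \hat{p}_c)^2},\ \hat{\Delta} + \sqrt{(u_t - \hat{p}_t)^2 + (\hat{p}_c - l_c)^2}\right]$ with $\hat{\Delta} = \hat{p}_t - \hat{p}_c$ (where $z_{\alpha/2}$ is the critical value for the chosen $\alpha$)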
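A minimal sketch of the Newcombe interval for the absolute difference, built from per-group Wilson score intervals, might look like this. The helper names `wilson_interval` and `newcombe_diff_ci` are hypothetical, and the counts are again reconstructed from the summary table (an assumption); the causalis implementation may differ in details.

```python
import math
from statistics import NormalDist

def wilson_interval(x, n, z):
    """Wilson score interval for a single proportion x / n."""
    p = x / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half

def newcombe_diff_ci(x_c, n_c, x_t, n_t, alpha=0.05):
    """Newcombe hybrid-score interval for p_t - p_c."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    p_c, p_t = x_c / n_c, x_t / n_t
    l_c, u_c = wilson_interval(x_c, n_c, z)
    l_t, u_t = wilson_interval(x_t, n_t, z)
    diff = p_t - p_c
    lower = diff - math.sqrt((p_t - l_t) ** 2 + (u_c - p_c) ** 2)
    upper = diff + math.sqrt((u_t - p_t) ** 2 + (p_c - l_c) ** 2)
    return diff, lower, upper

# Counts reconstructed from the summary table (assumption).
diff, lower, upper = newcombe_diff_ci(986, 4955, 1175, 5045)
print(f"diff={diff:.4f}, 95% CI=({lower:.4f}, {upper:.4f})")
```

Unlike a Wald interval, the bounds here are not forced to be symmetric around the point estimate, which is what makes the method better behaved for rates near 0 or 1.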
4. Relative Difference (Lift)
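Lift measures the percentage change: $\text{Lift} = \frac{\hat{p}_t - \hat{p}_c}{\hat{p}_c}$. The confidence interval uses a delta-method approximation on the lift scale:
- Relative CI: $\text{Lift} \pm z_{\alpha/2} \cdot SE_{\text{Lift}}$, where $SE_{\text{Lift}} = \sqrt{\frac{\hat{p}_t (1 - \hat{p}_t)}{n_t\, \hat{p}_c^2} + \frac{\hat{p}_t^2 (1 - \hat{p}_c)}{n_c\, \hat{p}_c^3}}$
(If $\hat{p}_c$ is extremely close to 0, the lift is undefined; the implementation returns inf/0 and NaN for the CI.)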
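The lift and its delta-method interval can be sketched as follows. This is a first-order delta-method approximation under the assumption that the two groups are independent; `lift_ci` and the `eps` threshold are hypothetical, and the zero-baseline branch is only one reading of the "inf/0 and NaN" behaviour described above, not the confirmed causalis logic.

```python
import math
from statistics import NormalDist

def lift_ci(x_c, n_c, x_t, n_t, alpha=0.05, eps=1e-12):
    """Relative lift (p_t - p_c) / p_c with a delta-method confidence interval."""
    p_c, p_t = x_c / n_c, x_t / n_t
    if p_c < eps:
        # Assumption: undefined lift is reported as inf (or 0 when p_t is
        # also 0), with NaN bounds.
        lift = math.inf if p_t > 0 else 0.0
        return lift, math.nan, math.nan
    lift = (p_t - p_c) / p_c
    var_t = p_t * (1.0 - p_t) / n_t
    var_c = p_c * (1.0 - p_c) / n_c
    # Gradient of p_t / p_c - 1 is (1 / p_c, -p_t / p_c^2).
    se = math.sqrt(var_t / p_c ** 2 + (p_t ** 2) * var_c / p_c ** 4)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return lift, lift - z * se, lift + z * se

# Counts reconstructed from the summary table (assumption).
lift, lift_lo, lift_hi = lift_ci(986, 4955, 1175, 5045)
print(f"lift={lift:.3f}, 95% CI=({lift_lo:.3f}, {lift_hi:.3f})")
```

With these counts the estimated lift is roughly 17%, and the interval stays above zero, in line with the absolute-difference result.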