
Estimand: Average Treatment Effect (ATE)

The model assumes random assignment of treatment $D \in \{0, 1\}$. The ATE ($\tau$) is defined as:

$$\tau = E[Y \mid D=1] - E[Y \mid D=0]$$
The sample estimator is the difference in group means:
$$\hat{\tau} = \bar{Y}_1 - \bar{Y}_0$$
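
As a minimal sketch of the difference-in-means estimator, assuming outcomes and 0/1 treatment labels arrive as parallel lists (the function name `diff_in_means` is hypothetical, not part of the documented API):

```python
from statistics import fmean


def diff_in_means(y, d):
    """Difference-in-means estimate of the ATE.

    y: outcomes; d: 0/1 treatment labels of the same length.
    Hypothetical helper for illustration only.
    """
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    if not treated or not control:
        raise ValueError("both groups need at least one observation")
    return fmean(treated) - fmean(control)


y = [3.0, 5.0, 4.0, 6.0]
d = [0, 1, 0, 1]
tau_hat = diff_in_means(y, d)  # (5 + 6)/2 - (3 + 4)/2 = 2.0
```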

Inference Methods

Welch's T-Test (ttest)
Used for continuous outcomes without assuming equal variances.

  • Standard Error: $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_0^2}{n_0}}$, where $s^2$ is the sample variance.
  • Degrees of Freedom (Satterthwaite):
    $\nu \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_0^2}{n_0}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_0^2/n_0)^2}{n_0-1}}$
  • Confidence Interval: $\hat{\tau} \pm t_{1-\alpha/2, \nu} \cdot SE$
  • P-value: Two-sided test based on the $t$-distribution with $\nu$ degrees of freedom.
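
The quantities above can be sketched with the Python standard library alone. The names `welch_summary` and `welch_ci` are hypothetical. Because the standard library has no $t$ quantile function, this sketch assumes the critical value $t_{1-\alpha/2, \nu}$ is computed elsewhere and passed in as `t_crit`.

```python
import math
from statistics import fmean, variance


def welch_summary(treated, control):
    """Return (tau_hat, se, df) for Welch's unequal-variance comparison."""
    n1, n0 = len(treated), len(control)
    if n1 < 2 or n0 < 2:
        raise ValueError("each group needs at least two observations")
    v1 = variance(treated) / n1  # s_1^2 / n_1 (sample variance, n-1 denominator)
    v0 = variance(control) / n0  # s_0^2 / n_0
    if v1 + v0 == 0.0:
        raise ValueError("both groups are constant; SE and df are undefined")
    se = math.sqrt(v1 + v0)
    # Satterthwaite approximation to the degrees of freedom
    df = (v1 + v0) ** 2 / (v1 ** 2 / (n1 - 1) + v0 ** 2 / (n0 - 1))
    return fmean(treated) - fmean(control), se, df


def welch_ci(tau_hat, se, t_crit):
    """CI tau_hat +/- t_crit * se; t_crit is supplied by the caller (assumption)."""
    return tau_hat - t_crit * se, tau_hat + t_crit * se


tau_hat, se, df = welch_summary([2, 4, 6, 8], [1, 2, 3, 4])
```

The Satterthwaite value always lies between $\min(n_1, n_0) - 1$ and $n_1 + n_0 - 2$, which makes a quick sanity check on any implementation.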

Conversion Z-Test (conversion_ztest)
Optimized for binary conversion outcomes.

  • Proportions: $p_1 = \frac{X_1}{n_1}$, $p_0 = \frac{X_0}{n_0}$
  • Standard Error (Pooled): $SE_{pooled} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_0}\right)}$, where $\hat{p} = \frac{X_1 + X_0}{n_1 + n_0}$
  • Absolute CI (Newcombe-style): Uses the difference of Wilson score intervals:
    $CI = [L_1 - U_0,\; U_1 - L_0]$
    where $[L_i, U_i]$ are the Wilson score bounds for proportion $p_i$.
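
A hedged sketch of these steps follows. The function names are hypothetical. The statistic $z = (p_1 - p_0) / SE_{pooled}$ and its two-sided normal p-value are assumptions, since the text above gives only the standard error. The interval follows the bound subtraction exactly as written, which is wider than Newcombe's square-and-add combination of the same Wilson bounds.

```python
import math
from statistics import NormalDist


def wilson_bounds(x, n, alpha=0.05):
    """Wilson score interval for a single proportion x / n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def conversion_ztest(x1, n1, x0, n0, alpha=0.05):
    """Pooled two-proportion z-test plus a CI from subtracted Wilson bounds."""
    p1, p0 = x1 / n1, x0 / n0
    p_pool = (x1 + x0) / (n1 + n0)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n0))
    # Assumed statistic: difference in proportions over the pooled SE
    z = (p1 - p0) / se_pooled if se_pooled > 0 else 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    l1, u1 = wilson_bounds(x1, n1, alpha)
    l0, u0 = wilson_bounds(x0, n0, alpha)
    return {"diff": p1 - p0, "z": z, "p_value": p_value,
            "ci": (l1 - u0, u1 - l0)}


result = conversion_ztest(x1=60, n1=500, x0=45, n0=500)
```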

Welch Permutation T-Test (welch_permutation_t_test)
Uses the Welch statistic, but estimates the p-value by repeatedly permuting treatment labels while preserving group sizes.

  • Statistic: $t = \frac{\bar{Y}_1 - \bar{Y}_0}{\sqrt{s_1^2/n_1 + s_0^2/n_0}}$.
  • Absolute CI: Welch-Satterthwaite interval for $\hat{\tau}$.
  • P-value: Monte Carlo permutation p-value with a +1 correction:
    $p = \frac{1 + \sum_{b=1}^{B} I(|t_b^*| \ge |t_{obs}|)}{B + 1}$
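
The permutation p-value can be sketched as below. This is an illustration under assumptions, not the documented implementation: the names `welch_t` and `welch_permutation_p`, the default `n_perm=2000`, and the fixed `seed` are all hypothetical choices. Labels are permuted by shuffling the pooled outcomes and re-splitting at the original treated group size.

```python
import math
import random
from statistics import fmean, variance


def welch_t(a, b):
    """Welch t statistic for samples a (treated) and b (control)."""
    diff = fmean(a) - fmean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Both groups constant: statistic is 0 if equal, otherwise infinite
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def welch_permutation_p(treated, control, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value with the +1 correction."""
    rng = random.Random(seed)
    t_obs = abs(welch_t(treated, control))
    pooled = list(treated) + list(control)
    n1 = len(treated)  # group sizes are preserved in every permutation
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:n1], pooled[n1:])) >= t_obs:
            extreme += 1
    return (1 + extreme) / (n_perm + 1)


p_separated = welch_permutation_p([10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6])
p_identical = welch_permutation_p([1, 2, 3], [1, 2, 3])
```

With the +1 correction the p-value can never be exactly zero; its smallest possible value is $1/(B+1)$.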

Relative Lift and Delta Method

The relative lift is calculated as:
$$Lift (\%) = 100 \cdot \left(\frac{\bar{Y}_1}{\bar{Y}_0} - 1\right)$$
Its variance is estimated via the Delta Method:
$$Var(Lift/100) \approx \frac{1}{\bar{Y}_0^2} Var(\bar{Y}_1) + \frac{\bar{Y}_1^2}{\bar{Y}_0^4} Var(\bar{Y}_0)$$
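
A small sketch of the lift and its delta-method standard error, assuming independent groups and estimating $Var(\bar{Y}_i)$ by $s_i^2 / n_i$. The function name `relative_lift` and the threshold `min_control_mean` are hypothetical; the guard only illustrates refusing to compute a ratio when the control mean is near zero.

```python
import math
from statistics import fmean, variance


def relative_lift(treated, control, min_control_mean=1e-12):
    """Return (lift_pct, se_pct) using the delta method for the ratio of means."""
    m1, m0 = fmean(treated), fmean(control)
    if abs(m0) < min_control_mean:
        # Hypothetical guard: the ratio is unstable when the control mean is ~0
        raise ValueError("control mean too close to zero for a relative lift")
    var_m1 = variance(treated) / len(treated)  # Var(Ybar_1) ~ s_1^2 / n_1
    var_m0 = variance(control) / len(control)  # Var(Ybar_0) ~ s_0^2 / n_0
    ratio_var = var_m1 / m0 ** 2 + (m1 ** 2 / m0 ** 4) * var_m0
    lift_pct = 100 * (m1 / m0 - 1)
    se_pct = 100 * math.sqrt(ratio_var)  # SE on the percent scale
    return lift_pct, se_pct


lift_pct, se_pct = relative_lift([2, 4, 6, 8], [1, 2, 3, 4])
```

Subtracting 1 from the ratio does not change its variance, so the variance of $Lift/100$ equals the variance of $\bar{Y}_1 / \bar{Y}_0$.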


Pseudo-code

Model Wrapper

Welch's T-Test Inference

Conversion Z-Test

Welch Permutation T-Test Inference

References

Estimand / difference-in-means ATE under random assignment

  • Neyman (1923; English translation 1990): potential outcomes framework for randomized experiments; difference-in-means and its sampling properties under randomization.

  • Rubin (1974): formalizes causal effects via potential outcomes for randomized and nonrandomized studies; motivates the ATE as an estimand.

  • Freedman (2008): uses the Neyman/Rubin potential-outcomes model to discuss inference and common adjustments in experiments; useful for the design-based framing of difference-in-means.

  • Imbens & Rubin (2015): book; a standard citation for ATE estimands and sampling variances of average causal effects.

Welch’s t-test + Satterthwaite df

  • Student (1908): original t distribution and small-sample inference motivation.

  • Welch (1947): the unequal-variance two-sample t-test (the modern “Welch’s t-test”). DOI: 10.1093/biomet/34.1-2.28.

  • Satterthwaite (1946): effective degrees of freedom approximation (the “Welch–Satterthwaite” df given above).

Conversion z-test + “Newcombe/Wilson-style” absolute CI for two proportions

Wilson score interval (single proportion):

  • Wilson (1927): the score interval for a binomial proportion (better coverage than Wald).

Newcombe’s CI for the difference of independent proportions (the basis for the “Newcombe-style” interval above, which subtracts Wilson score bounds):

  • Newcombe (1998): “Interval estimation for the difference between independent proportions: comparison of eleven methods.” The usual citation for the Wilson-score-based risk-difference CI family. DOI: 10.1002/(SICI)1097-0258(19980430)17:8<873::AID-SIM779>3.0.CO;2-I.

Related work on score-type intervals outperforming “exact” and Wald-style intervals in practice:

  • Agresti & Coull (1998): “Approximate is Better than ‘Exact’…” (single-proportion intervals, but commonly cited in the same discussion).

  • Agresti & Caffo (2000): simple adjusted intervals for proportions and differences (an alternative to Newcombe).

Permutation tests

  • Fisher (1935): randomization/permutation testing as exact design-based inference under random assignment.

  • Good (2005): practical reference for permutation, parametric, and bootstrap tests, including Monte Carlo permutation p-values.

Relative lift (ratio) + Delta method variance

  • Oehlert (1992): a clean, standard citation for delta method approximations. DOI: 10.1080/00031305.1992.10475842.

Because the pseudo-code guards against a control mean near zero (ratio instability), ratio-CI alternatives are also commonly cited:

  • Fieller (1954): Fieller-type confidence intervals for ratios (the classic reference when denominators can be near 0).