classic_rct
Modules
- conversion_ztest – Two-proportion z-test
- dgp – Data-generating processes for the classic RCT scenario.
- inference – Inference helpers for the classic RCT scenario.
- model – Difference-in-means model for CausalData.
- rct_design – Design utilities for randomized experiments: variant assignment, MDE calculation, and SRM checks.
- ttest – T-test inference for the DiffInMeans model
Classes
- DiffInMeans – Difference-in-means model for CausalData.
- SRMResult – Result of a Sample Ratio Mismatch (SRM) check.
Functions
- bootstrap_diff_means – Bootstrap inference for difference in means between treated and control groups.
- check_srm – Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
DiffInMeans
Difference-in-means model for CausalData. Wraps common RCT inference methods: t-test, bootstrap, and conversion z-test.
Functions
- estimate – Compute the treatment effect using the specified method.
- fit – Fit the model by storing the CausalData object.
data
estimate
Compute the treatment effect using the specified method.
Parameters
- method (str) – The inference method to use, one of:
  - "ttest": Standard independent two-sample t-test.
  - "bootstrap": Bootstrap-based inference for difference in means.
  - "conversion_ztest": Two-proportion z-test for binary outcomes.
- alpha (float) – The significance level for calculating confidence intervals.
- diagnostic_data (bool) – Whether to include diagnostic data in the result.
- **kwargs (Any) – Additional arguments passed to the underlying inference function. For "bootstrap": can pass n_simul, batch_size, seed, index_dtype.
Returns
CausalEstimate – A results object containing effect estimates and inference.
fit
Fit the model by storing the CausalData object.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
Returns
DiffInMeans – The fitted model.
SRMResult
Result of a Sample Ratio Mismatch (SRM) check.
Attributes
- chi2 (float) – The calculated chi-square statistic.
- p_value (float) – The p-value of the test, rounded to 5 decimals.
- expected (dict[Hashable, float]) – Expected counts for each variant.
- observed (dict[Hashable, int]) – Observed counts for each variant.
- alpha (float) – Significance level used for the check.
- is_srm (bool) – True if an SRM was detected (chi-square p-value < alpha), False otherwise.
- warning (str or None) – Warning message if the test assumptions might be violated (e.g., small expected counts).
alpha
chi2
expected
is_srm
observed
p_value
warning
bootstrap_diff_means
Bootstrap inference for difference in means between treated and control groups.
This function computes the ATE-style difference in means (treated - control) and provides a two-sided p-value using a normal approximation with bootstrap standard error, a percentile confidence interval for the absolute difference, and relative difference with its corresponding confidence interval.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- n_simul (int) – Number of bootstrap resamples.
- batch_size (int) – Number of bootstrap samples to process per batch.
- seed (int) – Random seed for reproducibility.
- index_dtype (numpy dtype) – Integer dtype for bootstrap indices to reduce memory usage.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value using normal approximation.
- absolute_difference: The absolute difference (treated - control).
- absolute_ci: Tuple of (lower, upper) bounds for the absolute difference CI.
- relative_difference: The relative difference (%) relative to control mean.
- relative_ci: Tuple of (lower, upper) bounds for the relative difference CI (delta method).
Raises
ValueError – If inputs are invalid, treatment is not binary, or groups are empty.
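The procedure described above can be sketched in plain NumPy. This is an illustrative stand-in, not the package function: it takes raw outcome arrays rather than a CausalData object, and the helper name is an assumption.

```python
import numpy as np
from scipy import stats

def bootstrap_diff_means_sketch(y_treated, y_control, alpha=0.05,
                                n_simul=2000, seed=0):
    """Illustrative bootstrap difference-in-means (treated - control)."""
    rng = np.random.default_rng(seed)
    diff = y_treated.mean() - y_control.mean()

    # Resample each group independently and record the difference in means.
    t_idx = rng.integers(0, len(y_treated), size=(n_simul, len(y_treated)))
    c_idx = rng.integers(0, len(y_control), size=(n_simul, len(y_control)))
    boot = y_treated[t_idx].mean(axis=1) - y_control[c_idx].mean(axis=1)

    # Two-sided p-value: normal approximation with bootstrap standard error.
    se = boot.std(ddof=1)
    p_value = 2 * stats.norm.sf(abs(diff) / se)

    # Percentile CI for the absolute difference.
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return {"p_value": p_value, "absolute_difference": diff,
            "absolute_ci": (lo, hi)}
```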
check_srm
Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
Parameters
- assignments (Iterable[Hashable] or Series or CausalData or Mapping[Hashable, Number]) – Observed variant assignments. If iterable or Series, elements are labels per unit (user_id, session_id, etc.). If CausalData is provided, the treatment column is used. If a mapping is provided, it is treated as {variant: observed_count} with non-negative integer counts.
- target_allocation (dict[Hashable, Number]) – Mapping {variant: p} describing intended allocation as probabilities.
- alpha (float) – Significance level. Use strict values like 1e-3 or 1e-4 in production.
- min_expected (float) – If any expected count < min_expected, a warning is attached.
- strict_variants (bool) – True: fail if observed variants differ from target keys; False: drop unknown variants and test only on declared ones.
Returns
SRMResult – The result of the SRM check.
Raises
ValueError – If inputs are invalid or empty.
ImportError – If SciPy is required but not installed.
Notes
- Target allocation probabilities must sum to 1 within numerical tolerance.
- is_srm is computed using the unrounded p-value; the returned p_value is rounded to 5 decimals.
- Missing assignments are dropped and reported via warning.
- Requires SciPy for p-value computation.
Examples:
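A minimal sketch of the underlying test (not the package API): a chi-square goodness-of-fit test of observed counts against the intended allocation. The counts below are made-up numbers.

```python
from scipy.stats import chisquare

observed = {"control": 5190, "treatment": 4810}          # assumed counts
target_allocation = {"control": 0.5, "treatment": 0.5}

# Expected counts from the intended allocation probabilities.
total = sum(observed.values())
expected = [target_allocation[v] * total for v in observed]

chi2, p_value = chisquare(list(observed.values()), f_exp=expected)
is_srm = p_value < 1e-3   # strict alpha, as recommended for production
```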
conversion_ztest
Two-proportion z-test
Compares conversion rates between treated (D=1) and control (D=0) groups. Returns p-value, absolute/relative differences, and their confidence intervals.
Functions
- conversion_ztest – Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
conversion_ztest
Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- ci_method ('newcombe', 'wald_unpooled', or 'wald_pooled') – Method for calculating the confidence interval for the absolute difference. "newcombe" is the most robust default for conversion rates.
- se_for_test ('pooled' or 'unpooled') – Method for calculating the standard error for the z-test p-value. "pooled" (score test) is generally preferred for testing equality of proportions.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value from the z-test
- absolute_difference: Difference in conversion rates (treated - control)
- absolute_ci: Tuple (lower, upper) for the absolute difference CI
- relative_difference: Percentage change relative to control rate
- relative_ci: Tuple (lower, upper) for the relative difference CI (delta method)
Raises
ValueError – If treatment/outcome are missing, treatment is not binary, outcome is not binary, groups are empty, or alpha is outside (0, 1).
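The pooled z-test that this function wraps can be sketched in a few lines; this is an illustrative stand-in on raw counts, with the CausalData plumbing and CI methods omitted.

```python
import numpy as np
from scipy.stats import norm

def ztest_sketch(conv_t, n_t, conv_c, n_c):
    """Two-proportion z-test with pooled (score) standard error."""
    p_t, p_c = conv_t / n_t, conv_c / n_c

    # Pooled SE for H0: p_t == p_c.
    p_pool = (conv_t + conv_c) / (n_t + n_c)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))

    z = (p_t - p_c) / se
    return {"p_value": 2 * norm.sf(abs(z)),
            "absolute_difference": p_t - p_c,
            "relative_difference": 100 * (p_t / p_c - 1)}
```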
dgp
Functions
- classic_rct_gamma_26 – Generate a pre-configured classic RCT dataset with a gamma outcome.
- generate_classic_rct_26 – Generate a pre-configured classic RCT dataset with 3 binary confounders.
classic_rct_gamma_26
Generate a pre-configured classic RCT dataset with a gamma outcome.
n=10000, split=0.5, mean uplift ~10%.
Includes deterministic user_id and ancillary columns.
Parameters
- seed (int) – Random seed.
- add_pre (bool) – Whether to generate a pre-period covariate ('y_pre').
- beta_y (array-like) – Linear coefficients for confounders in the outcome model.
- outcome_depends_on_x (bool) – Whether to add default effects for confounders if beta_y is None.
- include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
- return_causal_data (bool) – Whether to return a CausalData object.
- n (int) – Number of samples.
- split (float) – Proportion of samples assigned to the treatment group.
- outcome_params (dict) – Gamma outcome parameters, e.g. {"shape": 2.0, "scale": {"A": 15.0, "B": 16.5}}.
- add_ancillary (bool) – Whether to add standard ancillary columns (age, platform, etc.).
- deterministic_ids (bool) – Whether to generate deterministic user IDs.
- **kwargs – Additional arguments passed to classic_rct_gamma.
Returns
CausalData or DataFrame – The generated dataset: a CausalData object if return_causal_data is True, otherwise a DataFrame.
generate_classic_rct_26
Generate a pre-configured classic RCT dataset with 3 binary confounders.
n=10000, split=0.5, outcome is conversion (binary). Baseline control p=0.10
and treatment p=0.11 are set on the log-odds scale (X=0), so marginal rates
and ATE can differ once covariate effects are included. Includes a
deterministic user_id column.
Parameters
- seed (int) – Random seed.
- add_pre (bool) – Whether to generate a pre-period covariate ('y_pre') and include prognostic signal from X.
- beta_y (array-like) – Linear coefficients for confounders in the outcome model.
- outcome_depends_on_x (bool) – Whether to add default effects for confounders if beta_y is None.
- include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
- return_causal_data (bool) – Whether to return a CausalData object.
- n (int) – Number of samples.
- split (float) – Proportion of samples assigned to the treatment group.
- outcome_params (dict) – Binary outcome parameters, e.g. {"p": {"A": 0.10, "B": 0.11}}.
- add_ancillary (bool) – Whether to add standard ancillary columns (age, platform, etc.).
- deterministic_ids (bool) – Whether to generate deterministic user IDs.
- **kwargs – Additional arguments passed to generate_classic_rct.
Returns
CausalData or DataFrame – The generated dataset: a CausalData object if return_causal_data is True, otherwise a DataFrame.
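The kind of DGP described above (Bernoulli assignment, conversion probabilities set on the log-odds scale, binary confounders) can be sketched with plain NumPy. The function name and coefficient values here are assumptions for illustration, not the package's actual implementation.

```python
import numpy as np

def simulate_binary_rct(n=10_000, split=0.5, p_control=0.10, p_treat=0.11,
                        seed=26):
    """Toy binary-outcome RCT: logistic outcome with 3 binary confounders."""
    rng = np.random.default_rng(seed)
    d = (rng.random(n) < split).astype(int)          # treatment indicator
    x = rng.integers(0, 2, size=(n, 3))              # 3 binary confounders

    # Baseline log-odds per arm (X=0), then additive covariate effects,
    # so marginal rates can drift from p_control/p_treat once X matters.
    logit = np.where(d == 1,
                     np.log(p_treat / (1 - p_treat)),
                     np.log(p_control / (1 - p_control)))
    logit = logit + x @ np.array([0.1, -0.1, 0.2])   # assumed beta_y

    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return d, x, y
```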
inference
Inference helpers for the classic RCT scenario.
Modules
- bootstrap_diff_in_means – Bootstrap difference-in-means inference.
- conversion_ztest – Two-proportion z-test
- ttest – T-test inference for the DiffInMeans model
Functions
- bootstrap_diff_means – Bootstrap inference for difference in means between treated and control groups.
bootstrap_diff_in_means
Bootstrap difference-in-means inference.
This module computes the ATE-style difference in means (treated - control) and provides:
- Two-sided p-value using a normal approximation with bootstrap standard error.
- Percentile confidence interval for the absolute difference.
- Relative difference (%) and corresponding CI relative to control mean.
Functions
- bootstrap_diff_means – Bootstrap inference for difference in means between treated and control groups.
bootstrap_diff_means
Bootstrap inference for difference in means between treated and control groups.
This function computes the ATE-style difference in means (treated - control) and provides a two-sided p-value using a normal approximation with bootstrap standard error, a percentile confidence interval for the absolute difference, and relative difference with its corresponding confidence interval.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- n_simul (int) – Number of bootstrap resamples.
- batch_size (int) – Number of bootstrap samples to process per batch.
- seed (int) – Random seed for reproducibility.
- index_dtype (numpy dtype) – Integer dtype for bootstrap indices to reduce memory usage.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value using normal approximation.
- absolute_difference: The absolute difference (treated - control).
- absolute_ci: Tuple of (lower, upper) bounds for the absolute difference CI.
- relative_difference: The relative difference (%) relative to control mean.
- relative_ci: Tuple of (lower, upper) bounds for the relative difference CI (delta method).
Raises
ValueError – If inputs are invalid, treatment is not binary, or groups are empty.
bootstrap_diff_means
Bootstrap inference for difference in means between treated and control groups.
This function computes the ATE-style difference in means (treated - control) and provides a two-sided p-value using a normal approximation with bootstrap standard error, a percentile confidence interval for the absolute difference, and relative difference with its corresponding confidence interval.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- n_simul (int) – Number of bootstrap resamples.
- batch_size (int) – Number of bootstrap samples to process per batch.
- seed (int) – Random seed for reproducibility.
- index_dtype (numpy dtype) – Integer dtype for bootstrap indices to reduce memory usage.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value using normal approximation.
- absolute_difference: The absolute difference (treated - control).
- absolute_ci: Tuple of (lower, upper) bounds for the absolute difference CI.
- relative_difference: The relative difference (%) relative to control mean.
- relative_ci: Tuple of (lower, upper) bounds for the relative difference CI (delta method).
Raises
ValueError – If inputs are invalid, treatment is not binary, or groups are empty.
conversion_ztest
Two-proportion z-test
Compares conversion rates between treated (D=1) and control (D=0) groups. Returns p-value, absolute/relative differences, and their confidence intervals.
Functions
- conversion_ztest – Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
conversion_ztest
Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- ci_method ('newcombe', 'wald_unpooled', or 'wald_pooled') – Method for calculating the confidence interval for the absolute difference. "newcombe" is the most robust default for conversion rates.
- se_for_test ('pooled' or 'unpooled') – Method for calculating the standard error for the z-test p-value. "pooled" (score test) is generally preferred for testing equality of proportions.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value from the z-test
- absolute_difference: Difference in conversion rates (treated - control)
- absolute_ci: Tuple (lower, upper) for the absolute difference CI
- relative_difference: Percentage change relative to control rate
- relative_ci: Tuple (lower, upper) for the relative difference CI (delta method)
Raises
ValueError – If treatment/outcome are missing, treatment is not binary, outcome is not binary, groups are empty, or alpha is outside (0, 1).
ttest
T-test inference for the DiffInMeans model
Functions
- ttest – Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
ttest
Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Welch t-test p-value for H0: E[Y|D=1] - E[Y|D=0] = 0
- absolute_difference: treatment_mean - control_mean
- absolute_ci: (lower, upper) CI for absolute_difference using Welch df
- relative_difference: signed percent change = 100 * (treatment_mean / control_mean - 1)
- relative_se: delta-method SE of relative_difference (percent scale)
- relative_ci: (lower, upper) CI for relative_difference using delta method (+ Satterthwaite df)
Notes
Delta method for relative percent change: r_hat = 100 * (Ybar1/Ybar0 - 1).
With independent groups and the CLT: Var(Ybar1) ≈ s1^2/n1, Var(Ybar0) ≈ s0^2/n0, Cov(Ybar1, Ybar0) ≈ 0.
The gradient of g(a, b) = a/b - 1 is (1/b, -a/b^2), so:
Var(r_hat/100) ≈ (1/Ybar0)^2 * (s1^2/n1) + (Ybar1/Ybar0^2)^2 * (s0^2/n0)
The CI uses the t critical value with Satterthwaite df and falls back to z if the df is invalid. If control_mean is near 0, relative statistics are undefined/unstable and inf/nan sentinels are returned.
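The Welch test plus the delta-method relative CI can be sketched standalone with NumPy/SciPy; this is an illustrative re-derivation on raw arrays, not the package's ttest function.

```python
import numpy as np
from scipy import stats

def welch_with_relative(y1, y0, alpha=0.05):
    """Welch t-test plus delta-method CI for the relative percent change."""
    t_stat, p_value = stats.ttest_ind(y1, y0, equal_var=False)
    m1, m0 = y1.mean(), y0.mean()
    v1 = y1.var(ddof=1) / len(y1)          # Var(Ybar1) estimate
    v0 = y0.var(ddof=1) / len(y0)          # Var(Ybar0) estimate

    # Delta method for r_hat = 100*(m1/m0 - 1); gradient (1/m0, -m1/m0^2).
    rel = 100 * (m1 / m0 - 1)
    rel_se = 100 * np.sqrt(v1 / m0**2 + (m1**2 / m0**4) * v0)

    # Satterthwaite degrees of freedom for the t critical value.
    df = (v1 + v0) ** 2 / (v1**2 / (len(y1) - 1) + v0**2 / (len(y0) - 1))
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return {"p_value": p_value, "relative_difference": rel,
            "relative_se": rel_se,
            "relative_ci": (rel - t_crit * rel_se, rel + t_crit * rel_se)}
```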
model
Classes
- DiffInMeans – Difference-in-means model for CausalData.
DiffInMeans
Difference-in-means model for CausalData. Wraps common RCT inference methods: t-test, bootstrap, and conversion z-test.
Functions
- estimate – Compute the treatment effect using the specified method.
- fit – Fit the model by storing the CausalData object.
data
estimate
Compute the treatment effect using the specified method.
Parameters
- method (str) – The inference method to use, one of:
  - "ttest": Standard independent two-sample t-test.
  - "bootstrap": Bootstrap-based inference for difference in means.
  - "conversion_ztest": Two-proportion z-test for binary outcomes.
- alpha (float) – The significance level for calculating confidence intervals.
- diagnostic_data (bool) – Whether to include diagnostic data in the result.
- **kwargs (Any) – Additional arguments passed to the underlying inference function. For "bootstrap": can pass n_simul, batch_size, seed, index_dtype.
Returns
CausalEstimate – A results object containing effect estimates and inference.
fit
Fit the model by storing the CausalData object.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
Returns
DiffInMeans – The fitted model.
rct_design
Design utilities for randomized experiments: variant assignment, MDE calculation, and SRM checks.
Classes
- SRMResult – Result of a Sample Ratio Mismatch (SRM) check.
Functions
- assign_variants_df – Deterministically assign variants for each row in df based on id_col.
- calculate_mde – Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.
- check_srm – Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
SRMResult
Result of a Sample Ratio Mismatch (SRM) check.
Attributes
- chi2 (float) – The calculated chi-square statistic.
- p_value (float) – The p-value of the test, rounded to 5 decimals.
- expected (dict[Hashable, float]) – Expected counts for each variant.
- observed (dict[Hashable, int]) – Observed counts for each variant.
- alpha (float) – Significance level used for the check.
- is_srm (bool) – True if an SRM was detected (chi-square p-value < alpha), False otherwise.
- warning (str or None) – Warning message if the test assumptions might be violated (e.g., small expected counts).
alpha
chi2
expected
is_srm
observed
p_value
warning
assign_variants_df
Deterministically assign variants for each row in df based on id_col.
Parameters
- df (DataFrame) – Input DataFrame with an identifier column.
- id_col (str) – Column name in df containing entity identifiers (user_id, session_id, etc.).
- experiment_id (str) – Unique identifier for the experiment (versioned for reruns).
- variants (Dict[str, float]) – Mapping from variant name to weight (coverage). Weights must be non-negative and their sum must be in (0, 1]. If the sum is < 1, the remaining mass corresponds to "not in experiment" and the assignment will be None.
- salt (str) – Secret string to de-correlate from other hash uses and make assignments non-gameable.
- layer_id (str) – Identifier for a mutual-exclusivity layer or surface; in effect it acts as another random salt, so assignments are independent across layers.
- variant_col (str) – Name of the output column to store assigned variant labels.
Returns
DataFrame – A copy of df with an extra column variant_col. Entities outside experiment coverage will have None in the variant column.
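The deterministic scheme described above can be sketched per entity: hash (salt, layer_id, experiment_id, id) into [0, 1) and map that point onto cumulative variant weights. The parameter names mirror those above, but the exact hash and the DataFrame handling in the package may differ.

```python
import hashlib

def assign_variant(unit_id, experiment_id, variants, salt="", layer_id=""):
    """Deterministically map one entity id to a variant (or None)."""
    key = f"{salt}:{layer_id}:{experiment_id}:{unit_id}".encode()
    # First 8 bytes of SHA-256 -> uniform point in [0, 1).
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64

    cum = 0.0
    for name, weight in variants.items():
        cum += weight
        if bucket < cum:
            return name
    return None   # outside coverage when the weights sum to < 1
```

Because the hash depends only on the key, repeated calls are stable, and changing salt or layer_id yields an independent assignment.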
calculate_mde
Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.
Parameters
- sample_size (int or tuple of int) – Total sample size or a tuple of (control_size, treatment_size). If a single integer is provided, the sample will be split according to the ratio parameter.
- baseline_rate (float) – Baseline conversion rate (for conversion data) or baseline mean (for continuous data). Required for conversion data.
- variance (float or tuple of float) – Variance of the data. For conversion data, this is calculated from the baseline rate if not provided. For continuous data, this parameter is required. Can be a single float (assumed the same for both groups) or a tuple of (control_variance, treatment_variance).
- alpha (float) – Significance level (Type I error rate).
- power (float) – Statistical power (1 - Type II error rate).
- data_type (str) – Type of data: either 'conversion' for binary/conversion data or 'continuous' for continuous data.
- ratio (float) – Ratio of the sample allocated to the control group if sample_size is a single integer.
Returns
Dict[str, Any] – A dictionary containing:
- 'mde': The minimum detectable effect (absolute)
- 'mde_relative': The minimum detectable effect as a percentage of the baseline (relative)
- 'parameters': The parameters used for the calculation
Examples:
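A minimal sketch of the conversion case. Since the MDE formula depends on p2 = p1 + MDE, this sketch solves it by fixed-point iteration; the function name and the iterative approach are assumptions for illustration, and the package may solve it differently.

```python
import math
from scipy.stats import norm

def mde_conversion(n1, n2, baseline_rate, alpha=0.05, power=0.8):
    """MDE for a two-sample test of proportions (illustrative)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # z_alpha/2 + z_beta
    p1, mde = baseline_rate, 0.0
    for _ in range(25):                              # iterate on p2 = p1 + MDE
        p2 = p1 + mde
        mde = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return {"mde": mde, "mde_relative": 100 * mde / p1}
```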
Notes
For conversion data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))
For continuous data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((σ1²/n1) + (σ2²/n2))
where:
- z_α/2 is the critical value for significance level α
- z_β is the critical value for power
- p1 and p2 are the conversion rates in the control and treatment groups
- σ1² and σ2² are the variances in the control and treatment groups
- n1 and n2 are the sample sizes in the control and treatment groups
check_srm
Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
Parameters
- assignments (Iterable[Hashable] or Series or CausalData or Mapping[Hashable, Number]) – Observed variant assignments. If iterable or Series, elements are labels per unit (user_id, session_id, etc.). If CausalData is provided, the treatment column is used. If a mapping is provided, it is treated as {variant: observed_count} with non-negative integer counts.
- target_allocation (dict[Hashable, Number]) – Mapping {variant: p} describing intended allocation as probabilities.
- alpha (float) – Significance level. Use strict values like 1e-3 or 1e-4 in production.
- min_expected (float) – If any expected count < min_expected, a warning is attached.
- strict_variants (bool) – True: fail if observed variants differ from target keys; False: drop unknown variants and test only on declared ones.
Returns
SRMResult – The result of the SRM check.
Raises
ValueError – If inputs are invalid or empty.
ImportError – If SciPy is required but not installed.
Notes
- Target allocation probabilities must sum to 1 within numerical tolerance.
- is_srm is computed using the unrounded p-value; the returned p_value is rounded to 5 decimals.
- Missing assignments are dropped and reported via warning.
- Requires SciPy for p-value computation.
ttest
T-test inference for the DiffInMeans model
Functions
- ttest – Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
ttest
Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Welch t-test p-value for H0: E[Y|D=1] - E[Y|D=0] = 0
- absolute_difference: treatment_mean - control_mean
- absolute_ci: (lower, upper) CI for absolute_difference using Welch df
- relative_difference: signed percent change = 100 * (treatment_mean / control_mean - 1)
- relative_se: delta-method SE of relative_difference (percent scale)
- relative_ci: (lower, upper) CI for relative_difference using delta method (+ Satterthwaite df)
Notes
Delta method for relative percent change: r_hat = 100 * (Ybar1/Ybar0 - 1).
With independent groups and the CLT: Var(Ybar1) ≈ s1^2/n1, Var(Ybar0) ≈ s0^2/n0, Cov(Ybar1, Ybar0) ≈ 0.
The gradient of g(a, b) = a/b - 1 is (1/b, -a/b^2), so:
Var(r_hat/100) ≈ (1/Ybar0)^2 * (s1^2/n1) + (Ybar1/Ybar0^2)^2 * (s0^2/n0)
The CI uses the t critical value with Satterthwaite df and falls back to z if the df is invalid. If control_mean is near 0, relative statistics are undefined/unstable and inf/nan sentinels are returned.