classic_rct
Modules
- conversion_ztest – Two-proportion z-test
- dgp – Data-generating processes for the classic RCT scenario.
- inference – Inference helpers for the classic RCT scenario.
- model – Difference-in-means model for CausalData.
- rct_design – Design utilities for randomized experiments: variant assignment, MDE calculation, and SRM checks.
- ttest – T-test inference for the DiffInMeans model
Classes
- DiffInMeans – Difference-in-means model for CausalData.
- SRMResult – Result of a Sample Ratio Mismatch (SRM) check.
Functions
- bootstrap_diff_means – Bootstrap inference for difference in means between treated and control groups.
- check_srm – Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
DiffInMeans
Difference-in-means model for CausalData. Wraps common RCT inference methods: t-test, bootstrap, and conversion z-test.
Functions
- estimate – Compute the treatment effect using the specified method.
- fit – Fit the model by storing the CausalData object.
data
estimate
Compute the treatment effect using the specified method.
Parameters
- method (str) – The inference method to use, one of:
  - "ttest": Standard independent two-sample t-test.
  - "bootstrap": Bootstrap-based inference for difference in means.
  - "conversion_ztest": Two-proportion z-test for binary outcomes.
- alpha (float) – The significance level for calculating confidence intervals.
- diagnostic_data (bool) – Whether to include diagnostic data in the result.
- **kwargs (Any) – Additional arguments passed to the underlying inference function. For "bootstrap": can pass n_simul, batch_size, seed, index_dtype.
Returns
CausalEstimate – A results object containing effect estimates and inference.
fit
Fit the model by storing the CausalData object.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
Returns
DiffInMeans – The fitted model.
SRMResult
Result of a Sample Ratio Mismatch (SRM) check.
Attributes
- chi2 (float) – The calculated chi-square statistic.
- p_value (float) – The p-value of the test, rounded to 5 decimals.
- expected (dict[Hashable, float]) – Expected counts for each variant.
- observed (dict[Hashable, int]) – Observed counts for each variant.
- alpha (float) – Significance level used for the check.
- is_srm (bool) – True if an SRM was detected (chi-square p-value < alpha), False otherwise.
- warning (str or None) – Warning message if the test assumptions might be violated (e.g., small expected counts).
alpha
chi2
expected
is_srm
observed
p_value
warning
bootstrap_diff_means
Bootstrap inference for difference in means between treated and control groups.
This function computes the ATE-style difference in means (treated - control) and provides a two-sided p-value using a normal approximation with bootstrap standard error, a percentile confidence interval for the absolute difference, and relative difference with its corresponding confidence interval.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- n_simul (int) – Number of bootstrap resamples.
- batch_size (int) – Number of bootstrap samples to process per batch.
- seed (int) – Random seed for reproducibility.
- index_dtype (numpy dtype) – Integer dtype for bootstrap indices to reduce memory usage.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value using normal approximation.
- absolute_difference: The absolute difference (treated - control).
- absolute_ci: Tuple of (lower, upper) bounds for the absolute difference CI.
- relative_difference: The relative difference (%) relative to control mean.
- relative_ci: Tuple of (lower, upper) bounds for the relative difference CI (delta method).
Raises
ValueError – If inputs are invalid, treatment is not binary, or groups are empty.
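The procedure described above can be sketched in plain NumPy. This is an illustrative stand-in, not the package function: it takes raw outcome arrays rather than a CausalData object, and the helper name is an assumption.

```python
import numpy as np
from scipy import stats

def bootstrap_diff_means_sketch(y_treated, y_control, alpha=0.05,
                                n_simul=2000, seed=0):
    """Illustrative bootstrap difference-in-means (treated - control)."""
    rng = np.random.default_rng(seed)
    diff = y_treated.mean() - y_control.mean()

    # Resample each group independently and record the difference in means.
    t_idx = rng.integers(0, len(y_treated), size=(n_simul, len(y_treated)))
    c_idx = rng.integers(0, len(y_control), size=(n_simul, len(y_control)))
    boot = y_treated[t_idx].mean(axis=1) - y_control[c_idx].mean(axis=1)

    # Two-sided p-value: normal approximation with bootstrap standard error.
    se = boot.std(ddof=1)
    p_value = 2 * stats.norm.sf(abs(diff) / se)

    # Percentile CI for the absolute difference.
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return {"p_value": p_value, "absolute_difference": diff,
            "absolute_ci": (lo, hi)}
```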
check_srm
Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
Parameters
- assignments (Iterable[Hashable] or Series or CausalData or Mapping[Hashable, Number]) – Observed variant assignments. If iterable or Series, elements are labels per unit (user_id, session_id, etc.). If CausalData is provided, the treatment column is used. If a mapping is provided, it is treated as {variant: observed_count} with non-negative integer counts.
- target_allocation (dict[Hashable, Number]) – Mapping {variant: p} describing intended allocation as probabilities.
- alpha (float) – Significance level. Use strict values like 1e-3 or 1e-4 in production.
- min_expected (float) – If any expected count < min_expected, a warning is attached.
- strict_variants (bool) – True: fail if observed variants differ from target keys; False: drop unknown variants and test only on declared ones.
Returns
SRMResult – The result of the SRM check.
Raises
ValueError – If inputs are invalid or empty.
ImportError – If SciPy is required but not installed.
Notes
- Target allocation probabilities must sum to 1 within numerical tolerance.
- is_srm is computed using the unrounded p-value; the returned p_value is rounded to 5 decimals.
- Missing assignments are dropped and reported via warning.
- Requires SciPy for p-value computation.
Examples:
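A minimal sketch of the underlying test (not the package API): a chi-square goodness-of-fit test of observed counts against the intended allocation. The counts below are made-up numbers.

```python
from scipy.stats import chisquare

observed = {"control": 5190, "treatment": 4810}          # assumed counts
target_allocation = {"control": 0.5, "treatment": 0.5}

# Expected counts from the intended allocation probabilities.
total = sum(observed.values())
expected = [target_allocation[v] * total for v in observed]

chi2, p_value = chisquare(list(observed.values()), f_exp=expected)
is_srm = p_value < 1e-3   # strict alpha, as recommended for production
```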
conversion_ztest
Two-proportion z-test
Compares conversion rates between treated (D=1) and control (D=0) groups. Returns p-value, absolute/relative differences, and their confidence intervals.
Functions
- conversion_ztest – Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
conversion_ztest
Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- ci_method ('newcombe', 'wald_unpooled', or 'wald_pooled') – Method for calculating the confidence interval for the absolute difference. "newcombe" is the most robust default for conversion rates.
- se_for_test ('pooled' or 'unpooled') – Method for calculating the standard error for the z-test p-value. "pooled" (score test) is generally preferred for testing equality of proportions.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value from the z-test
- absolute_difference: Difference in conversion rates (treated - control)
- absolute_ci: Tuple (lower, upper) for the absolute difference CI
- relative_difference: Percentage change relative to control rate
- relative_ci: Tuple (lower, upper) for the relative difference CI (delta method)
Raises
ValueError – If treatment/outcome are missing, treatment is not binary, outcome is not binary, groups are empty, or alpha is outside (0, 1).
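The pooled z-test that this function wraps can be sketched in a few lines; this is an illustrative stand-in on raw counts, with the CausalData plumbing and CI methods omitted.

```python
import numpy as np
from scipy.stats import norm

def ztest_sketch(conv_t, n_t, conv_c, n_c):
    """Two-proportion z-test with pooled (score) standard error."""
    p_t, p_c = conv_t / n_t, conv_c / n_c

    # Pooled SE for H0: p_t == p_c.
    p_pool = (conv_t + conv_c) / (n_t + n_c)
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))

    z = (p_t - p_c) / se
    return {"p_value": 2 * norm.sf(abs(z)),
            "absolute_difference": p_t - p_c,
            "relative_difference": 100 * (p_t / p_c - 1)}
```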
dgp
Functions
- classic_rct_gamma_26 – Generate a pre-configured classic RCT dataset with a gamma outcome.
- generate_classic_rct_26 – Generate a pre-configured classic RCT dataset with 3 binary confounders.
classic_rct_gamma_26
Generate a pre-configured classic RCT dataset with a gamma outcome.
n=10000, split=0.5, mean uplift ~10%.
Includes deterministic user_id and ancillary columns.
Parameters
- seed (int) – Random seed.
- add_pre (bool) – Whether to generate a pre-period covariate ('y_pre').
- beta_y (array-like) – Linear coefficients for confounders in the outcome model.
- outcome_depends_on_x (bool) – Whether to add default effects for confounders if beta_y is None.
- include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
- return_causal_data (bool) – Whether to return a CausalData object.
- n (int) – Number of samples.
- split (float) – Proportion of samples assigned to the treatment group.
- outcome_params (dict) – Gamma outcome parameters, e.g. {"shape": 2.0, "scale": {"A": 15.0, "B": 16.5}}.
- add_ancillary (bool) – Whether to add standard ancillary columns (age, platform, etc.).
- deterministic_ids (bool) – Whether to generate deterministic user IDs.
- **kwargs – Additional arguments passed to classic_rct_gamma.
Returns
CausalData or DataFrame – The generated dataset: a CausalData object if return_causal_data is True, otherwise a DataFrame.
generate_classic_rct_26
Generate a pre-configured classic RCT dataset with 3 binary confounders.
n=10000, split=0.5, outcome is conversion (binary). Baseline control p=0.10
and treatment p=0.11 are set on the log-odds scale (X=0), so marginal rates
and ATE can differ once covariate effects are included. Includes a
deterministic user_id column.
Parameters
- seed (int) – Random seed.
- add_pre (bool) – Whether to generate a pre-period covariate ('y_pre') and include prognostic signal from X.
- beta_y (array-like) – Linear coefficients for confounders in the outcome model.
- outcome_depends_on_x (bool) – Whether to add default effects for confounders if beta_y is None.
- include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
- return_causal_data (bool) – Whether to return a CausalData object.
- n (int) – Number of samples.
- split (float) – Proportion of samples assigned to the treatment group.
- outcome_params (dict) – Binary outcome parameters, e.g. {"p": {"A": 0.10, "B": 0.11}}.
- add_ancillary (bool) – Whether to add standard ancillary columns (age, platform, etc.).
- deterministic_ids (bool) – Whether to generate deterministic user IDs.
- **kwargs – Additional arguments passed to generate_classic_rct.
Returns
CausalData or DataFrame – The generated dataset: a CausalData object if return_causal_data is True, otherwise a DataFrame.
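The kind of DGP described above (Bernoulli assignment, conversion probabilities set on the log-odds scale, binary confounders) can be sketched with plain NumPy. The function name and coefficient values here are assumptions for illustration, not the package's actual implementation.

```python
import numpy as np

def simulate_binary_rct(n=10_000, split=0.5, p_control=0.10, p_treat=0.11,
                        seed=26):
    """Toy binary-outcome RCT: logistic outcome with 3 binary confounders."""
    rng = np.random.default_rng(seed)
    d = (rng.random(n) < split).astype(int)          # treatment indicator
    x = rng.integers(0, 2, size=(n, 3))              # 3 binary confounders

    # Baseline log-odds per arm (X=0), then additive covariate effects,
    # so marginal rates can drift from p_control/p_treat once X matters.
    logit = np.where(d == 1,
                     np.log(p_treat / (1 - p_treat)),
                     np.log(p_control / (1 - p_control)))
    logit = logit + x @ np.array([0.1, -0.1, 0.2])   # assumed beta_y

    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return d, x, y
```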
inference
Inference helpers for the classic RCT scenario.
Modules
- bootstrap_diff_in_means – Bootstrap difference-in-means inference.
- conversion_ztest – Two-proportion z-test
- ttest – T-test inference for the DiffInMeans model
Functions
- bootstrap_diff_means – Bootstrap inference for difference in means between treated and control groups.
bootstrap_diff_in_means
Bootstrap difference-in-means inference.
This module computes the ATE-style difference in means (treated - control) and provides:
- Two-sided p-value using a normal approximation with bootstrap standard error.
- Percentile confidence interval for the absolute difference.
- Relative difference (%) and corresponding CI relative to control mean.
Functions
- bootstrap_diff_means – Bootstrap inference for difference in means between treated and control groups.
bootstrap_diff_means
Bootstrap inference for difference in means between treated and control groups.
This function computes the ATE-style difference in means (treated - control) and provides a two-sided p-value using a normal approximation with bootstrap standard error, a percentile confidence interval for the absolute difference, and relative difference with its corresponding confidence interval.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- n_simul (int) – Number of bootstrap resamples.
- batch_size (int) – Number of bootstrap samples to process per batch.
- seed (int) – Random seed for reproducibility.
- index_dtype (numpy dtype) – Integer dtype for bootstrap indices to reduce memory usage.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value using normal approximation.
- absolute_difference: The absolute difference (treated - control).
- absolute_ci: Tuple of (lower, upper) bounds for the absolute difference CI.
- relative_difference: The relative difference (%) relative to control mean.
- relative_ci: Tuple of (lower, upper) bounds for the relative difference CI (delta method).
Raises
ValueError – If inputs are invalid, treatment is not binary, or groups are empty.
bootstrap_diff_means
Bootstrap inference for difference in means between treated and control groups.
This function computes the ATE-style difference in means (treated - control) and provides a two-sided p-value using a normal approximation with bootstrap standard error, a percentile confidence interval for the absolute difference, and relative difference with its corresponding confidence interval.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- n_simul (int) – Number of bootstrap resamples.
- batch_size (int) – Number of bootstrap samples to process per batch.
- seed (int) – Random seed for reproducibility.
- index_dtype (numpy dtype) – Integer dtype for bootstrap indices to reduce memory usage.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value using normal approximation.
- absolute_difference: The absolute difference (treated - control).
- absolute_ci: Tuple of (lower, upper) bounds for the absolute difference CI.
- relative_difference: The relative difference (%) relative to control mean.
- relative_ci: Tuple of (lower, upper) bounds for the relative difference CI (delta method).
Raises
ValueError – If inputs are invalid, treatment is not binary, or groups are empty.
conversion_ztest
Two-proportion z-test
Compares conversion rates between treated (D=1) and control (D=0) groups. Returns p-value, absolute/relative differences, and their confidence intervals.
Functions
- conversion_ztest – Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
conversion_ztest
Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
- alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
- ci_method ('newcombe', 'wald_unpooled', or 'wald_pooled') – Method for calculating the confidence interval for the absolute difference. "newcombe" is the most robust default for conversion rates.
- se_for_test ('pooled' or 'unpooled') – Method for calculating the standard error for the z-test p-value. "pooled" (score test) is generally preferred for testing equality of proportions.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Two-sided p-value from the z-test
- absolute_difference: Difference in conversion rates (treated - control)
- absolute_ci: Tuple (lower, upper) for the absolute difference CI
- relative_difference: Percentage change relative to control rate
- relative_ci: Tuple (lower, upper) for the relative difference CI (delta method)
Raises
ValueError – If treatment/outcome are missing, treatment is not binary, outcome is not binary, groups are empty, or alpha is outside (0, 1).
ttest
T-test inference for the DiffInMeans model
Functions
- ttest – Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
ttest
Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Welch t-test p-value for H0: E[Y|D=1] - E[Y|D=0] = 0
- absolute_difference: treatment_mean - control_mean
- absolute_ci: (lower, upper) CI for absolute_difference using Welch df
- relative_difference: signed percent change = 100 * (treatment_mean / control_mean - 1)
- relative_se: delta-method SE of relative_difference (percent scale)
- relative_ci: (lower, upper) CI for relative_difference using delta method (+ Satterthwaite df)
Notes
Delta method for relative percent change: r_hat = 100 * (Ybar1/Ybar0 - 1).
With independent groups and the CLT: Var(Ybar1) ≈ s1^2/n1, Var(Ybar0) ≈ s0^2/n0, Cov(Ybar1, Ybar0) ≈ 0.
The gradient of g(a, b) = a/b - 1 is (1/b, -a/b^2), so:
Var(r_hat/100) ≈ (1/Ybar0)^2 * (s1^2/n1) + (Ybar1/Ybar0^2)^2 * (s0^2/n0)
The CI uses the t critical value with Satterthwaite df and falls back to z if the df is invalid. If control_mean is near 0, relative statistics are undefined/unstable and inf/nan sentinels are returned.
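The Welch test plus the delta-method relative CI can be sketched standalone with NumPy/SciPy; this is an illustrative re-derivation on raw arrays, not the package's ttest function.

```python
import numpy as np
from scipy import stats

def welch_with_relative(y1, y0, alpha=0.05):
    """Welch t-test plus delta-method CI for the relative percent change."""
    t_stat, p_value = stats.ttest_ind(y1, y0, equal_var=False)
    m1, m0 = y1.mean(), y0.mean()
    v1 = y1.var(ddof=1) / len(y1)          # Var(Ybar1) estimate
    v0 = y0.var(ddof=1) / len(y0)          # Var(Ybar0) estimate

    # Delta method for r_hat = 100*(m1/m0 - 1); gradient (1/m0, -m1/m0^2).
    rel = 100 * (m1 / m0 - 1)
    rel_se = 100 * np.sqrt(v1 / m0**2 + (m1**2 / m0**4) * v0)

    # Satterthwaite degrees of freedom for the t critical value.
    df = (v1 + v0) ** 2 / (v1**2 / (len(y1) - 1) + v0**2 / (len(y0) - 1))
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return {"p_value": p_value, "relative_difference": rel,
            "relative_se": rel_se,
            "relative_ci": (rel - t_crit * rel_se, rel + t_crit * rel_se)}
```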
model
Classes
- DiffInMeans – Difference-in-means model for CausalData.
DiffInMeans
Difference-in-means model for CausalData. Wraps common RCT inference methods: t-test, bootstrap, and conversion z-test.
Functions
- estimate – Compute the treatment effect using the specified method.
- fit – Fit the model by storing the CausalData object.
data
estimate
Compute the treatment effect using the specified method.
Parameters
- method (str) – The inference method to use, one of:
  - "ttest": Standard independent two-sample t-test.
  - "bootstrap": Bootstrap-based inference for difference in means.
  - "conversion_ztest": Two-proportion z-test for binary outcomes.
- alpha (float) – The significance level for calculating confidence intervals.
- diagnostic_data (bool) – Whether to include diagnostic data in the result.
- **kwargs (Any) – Additional arguments passed to the underlying inference function. For "bootstrap": can pass n_simul, batch_size, seed, index_dtype.
Returns
CausalEstimate – A results object containing effect estimates and inference.
fit
Fit the model by storing the CausalData object.
Parameters
- data (CausalData) – The CausalData object containing treatment and outcome variables.
Returns
DiffInMeans – The fitted model.
rct_design
Design utilities for randomized experiments: variant assignment, MDE calculation, and SRM checks.
Classes
- SRMResult – Result of a Sample Ratio Mismatch (SRM) check.
Functions
- assign_variants_df – Deterministically assign variants for each row in df based on id_col.
- calculate_mde – Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.
- check_srm – Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
SRMResult
Result of a Sample Ratio Mismatch (SRM) check.
Attributes
- chi2 (float) – The calculated chi-square statistic.
- p_value (float) – The p-value of the test, rounded to 5 decimals.
- expected (dict[Hashable, float]) – Expected counts for each variant.
- observed (dict[Hashable, int]) – Observed counts for each variant.
- alpha (float) – Significance level used for the check.
- is_srm (bool) – True if an SRM was detected (chi-square p-value < alpha), False otherwise.
- warning (str or None) – Warning message if the test assumptions might be violated (e.g., small expected counts).
alpha
chi2
expected
is_srm
observed
p_value
warning
assign_variants_df
Deterministically assign variants for each row in df based on id_col.
Parameters
- df (DataFrame) – Input DataFrame with an identifier column.
- id_col (str) – Column name in df containing entity identifiers (user_id, session_id, etc.).
- experiment_id (str) – Unique identifier for the experiment (versioned for reruns).
- variants (Dict[str, float]) – Mapping from variant name to weight (coverage). Weights must be non-negative and their sum must be in (0, 1]. If the sum is < 1, the remaining mass corresponds to "not in experiment" and the assignment will be None.
- salt (str) – Secret string to de-correlate from other hash uses and make assignments non-gameable.
- layer_id (str) – Identifier for a mutual-exclusivity layer or surface; in effect it acts as another random salt, so assignments are independent across layers.
- variant_col (str) – Name of the output column to store assigned variant labels.
Returns
DataFrame – A copy of df with an extra column variant_col. Entities outside experiment coverage will have None in the variant column.
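The deterministic scheme described above can be sketched per entity: hash (salt, layer_id, experiment_id, id) into [0, 1) and map that point onto cumulative variant weights. The parameter names mirror those above, but the exact hash and the DataFrame handling in the package may differ.

```python
import hashlib

def assign_variant(unit_id, experiment_id, variants, salt="", layer_id=""):
    """Deterministically map one entity id to a variant (or None)."""
    key = f"{salt}:{layer_id}:{experiment_id}:{unit_id}".encode()
    # First 8 bytes of SHA-256 -> uniform point in [0, 1).
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64

    cum = 0.0
    for name, weight in variants.items():
        cum += weight
        if bucket < cum:
            return name
    return None   # outside coverage when the weights sum to < 1
```

Because the hash depends only on the key, repeated calls are stable, and changing salt or layer_id yields an independent assignment.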
calculate_mde
Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.
Parameters
- sample_size (int or tuple of int) – Total sample size or a tuple of (control_size, treatment_size). If a single integer is provided, the sample will be split according to the ratio parameter.
- baseline_rate (float) – Baseline conversion rate (for conversion data) or baseline mean (for continuous data). Required for conversion data.
- variance (float or tuple of float) – Variance of the data. For conversion data, this is calculated from the baseline rate if not provided. For continuous data, this parameter is required. Can be a single float (assumed the same for both groups) or a tuple of (control_variance, treatment_variance).
- alpha (float) – Significance level (Type I error rate).
- power (float) – Statistical power (1 - Type II error rate).
- data_type (str) – Type of data: either 'conversion' for binary/conversion data or 'continuous' for continuous data.
- ratio (float) – Ratio of the sample allocated to the control group if sample_size is a single integer.
Returns
Dict[str, Any] – A dictionary containing:
- 'mde': The minimum detectable effect (absolute)
- 'mde_relative': The minimum detectable effect as a percentage of the baseline (relative)
- 'parameters': The parameters used for the calculation
Examples:
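A minimal sketch of the conversion case. Since the MDE formula depends on p2 = p1 + MDE, this sketch solves it by fixed-point iteration; the function name and the iterative approach are assumptions for illustration, and the package may solve it differently.

```python
import math
from scipy.stats import norm

def mde_conversion(n1, n2, baseline_rate, alpha=0.05, power=0.8):
    """MDE for a two-sample test of proportions (illustrative)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # z_alpha/2 + z_beta
    p1, mde = baseline_rate, 0.0
    for _ in range(25):                              # iterate on p2 = p1 + MDE
        p2 = p1 + mde
        mde = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return {"mde": mde, "mde_relative": 100 * mde / p1}
```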
Notes
For conversion data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))
For continuous data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((σ1²/n1) + (σ2²/n2))
where:
- z_α/2 is the critical value for significance level α
- z_β is the critical value for power
- p1 and p2 are the conversion rates in the control and treatment groups
- σ1² and σ2² are the variances in the control and treatment groups
- n1 and n2 are the sample sizes in the control and treatment groups
check_srm
Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
Parameters
- assignments (Iterable[Hashable] or Series or CausalData or Mapping[Hashable, Number]) – Observed variant assignments. If iterable or Series, elements are labels per unit (user_id, session_id, etc.). If CausalData is provided, the treatment column is used. If a mapping is provided, it is treated as {variant: observed_count} with non-negative integer counts.
- target_allocation (dict[Hashable, Number]) – Mapping {variant: p} describing intended allocation as probabilities.
- alpha (float) – Significance level. Use strict values like 1e-3 or 1e-4 in production.
- min_expected (float) – If any expected count < min_expected, a warning is attached.
- strict_variants (bool) – True: fail if observed variants differ from target keys; False: drop unknown variants and test only on declared ones.
Returns
SRMResult – The result of the SRM check.
Raises
ValueError – If inputs are invalid or empty.
ImportError – If SciPy is required but not installed.
Notes
- Target allocation probabilities must sum to 1 within numerical tolerance.
- is_srm is computed using the unrounded p-value; the returned p_value is rounded to 5 decimals.
- Missing assignments are dropped and reported via warning.
- Requires SciPy for p-value computation.
ttest
T-test inference for the DiffInMeans model
Functions
- ttest – Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
ttest
Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
Returns
Dict[str, Any] – A dictionary containing:
- p_value: Welch t-test p-value for H0: E[Y|D=1] - E[Y|D=0] = 0
- absolute_difference: treatment_mean - control_mean
- absolute_ci: (lower, upper) CI for absolute_difference using Welch df
- relative_difference: signed percent change = 100 * (treatment_mean / control_mean - 1)
- relative_se: delta-method SE of relative_difference (percent scale)
- relative_ci: (lower, upper) CI for relative_difference using delta method (+ Satterthwaite df)
Notes
Delta method for relative percent change: r_hat = 100 * (Ybar1/Ybar0 - 1).
With independent groups and the CLT: Var(Ybar1) ≈ s1^2/n1, Var(Ybar0) ≈ s0^2/n0, Cov(Ybar1, Ybar0) ≈ 0.
The gradient of g(a, b) = a/b - 1 is (1/b, -a/b^2), so:
Var(r_hat/100) ≈ (1/Ybar0)^2 * (s1^2/n1) + (Ybar1/Ybar0^2)^2 * (s0^2/n0)
The CI uses the t critical value with Satterthwaite df and falls back to z if the df is invalid. If control_mean is near 0, relative statistics are undefined/unstable and inf/nan sentinels are returned.