API Reference

classic_rct

Reference details for classic_rct in causalis.scenarios.

Modules
  • conversion_ztest – Two-proportion z-test.
  • dgp
  • inference – Inference helpers for the classic RCT scenario.
  • model
  • rct_design – Design utilities for randomized experiments.
  • ttest – T-test inference for the DiffInMeans model.
Classes
  • DiffInMeans – Difference-in-means model for CausalData.
  • SRMResult – Result of a Sample Ratio Mismatch (SRM) check.
Functions
  • bootstrap_diff_means – Bootstrap inference for difference in means between treated and control groups.
  • check_srm – Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
DiffInMeans

Difference-in-means model for CausalData. Wraps common RCT inference methods: t-test, bootstrap, and conversion z-test.

Functions
  • estimate – Compute the treatment effect using the specified method.
  • fit – Fit the model by storing the CausalData object.
data
estimate

Compute the treatment effect using the specified method.

Parameters
  • method ('ttest', 'bootstrap', 'conversion_ztest') – The inference method to use:
    • "ttest": standard independent two-sample t-test.
    • "bootstrap": bootstrap-based inference for difference in means.
    • "conversion_ztest": two-proportion z-test for binary outcomes.
  • alpha (float) – The significance level for calculating confidence intervals.
  • diagnostic_data (bool) – Whether to include diagnostic data in the result.
  • **kwargs (Any) – Additional arguments passed to the underlying inference function. For "bootstrap", these can include n_simul, batch_size, seed, and index_dtype.
Returns
  • CausalEstimate – A results object containing effect estimates and inference.
fit

Fit the model by storing the CausalData object.

Parameters
  • data (CausalData) – The CausalData object containing treatment and outcome variables.
Returns
SRMResult

Result of a Sample Ratio Mismatch (SRM) check.

Attributes
  • chi2 (float) – The calculated chi-square statistic.
  • p_value (float) – The p-value of the test, rounded to 5 decimals.
  • expected (dict[Hashable, float]) – Expected counts for each variant.
  • observed (dict[Hashable, int]) – Observed counts for each variant.
  • alpha (float) – Significance level used for the check.
  • is_srm (bool) – True if an SRM was detected (chi-square p-value < alpha), False otherwise.
  • warning (str or None) – Warning message if the test assumptions might be violated (e.g., small expected counts).
alpha
chi2
expected
is_srm
observed
p_value
warning
bootstrap_diff_means

Bootstrap inference for difference in means between treated and control groups.

This function computes the ATE-style difference in means (treated - control) and provides a two-sided p-value using a normal approximation with bootstrap standard error, a percentile confidence interval for the absolute difference, and relative difference with its corresponding confidence interval.

Parameters
  • data (CausalData) – The CausalData object containing treatment and outcome variables.
  • alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
  • n_simul (int) – Number of bootstrap resamples.
  • batch_size (int) – Number of bootstrap samples to process per batch.
  • seed (int) – Random seed for reproducibility.
  • index_dtype (numpy dtype) – Integer dtype for bootstrap indices to reduce memory usage.
Returns
  • Dict[str, Any] – A dictionary containing:
  • p_value: Two-sided p-value using normal approximation.
  • absolute_difference: The absolute difference (treated - control).
  • absolute_ci: Tuple of (lower, upper) bounds for the absolute difference CI.
  • relative_difference: The relative difference (%) relative to control mean.
  • relative_ci: Tuple of (lower, upper) bounds for the relative difference CI (delta method).
Raises
  • ValueError – If inputs are invalid, treatment is not binary, or groups are empty.
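The bootstrap scheme described above can be sketched in a few lines of numpy/scipy. This is an illustrative re-implementation, not the library function: the helper name is hypothetical, and batching, input validation, and the delta-method relative CI are omitted.

```python
# Sketch of percentile-CI bootstrap with a normal-approximation p-value
# computed from the bootstrap standard error (hypothetical helper name).
import numpy as np
from scipy.stats import norm

def bootstrap_diff_means_sketch(y, d, alpha=0.05, n_simul=2000, seed=0):
    rng = np.random.default_rng(seed)
    y1, y0 = y[d == 1], y[d == 0]
    diff = y1.mean() - y0.mean()
    # Resample each group with replacement and recompute the group means.
    b1 = rng.choice(y1, size=(n_simul, y1.size), replace=True).mean(axis=1)
    b0 = rng.choice(y0, size=(n_simul, y0.size), replace=True).mean(axis=1)
    boot = b1 - b0
    se = boot.std(ddof=1)
    p_value = 2 * norm.sf(abs(diff) / se)                    # normal approximation
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])   # percentile CI
    return {"p_value": float(p_value),
            "absolute_difference": float(diff),
            "absolute_ci": (float(lo), float(hi)),
            "relative_difference": float(100 * diff / y0.mean())}
```

The batch_size and index_dtype parameters of the real function exist to keep the (n_simul, n) resampling index matrix from exhausting memory; the sketch materializes it in one shot.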
check_srm

Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.

Parameters
  • assignments (Iterable[Hashable] or Series or CausalData or Mapping[Hashable, Number]) – Observed variant assignments. If iterable or Series, elements are labels per unit (user_id, session_id, etc.). If CausalData is provided, the treatment column is used. If a mapping is provided, it is treated as &#123;variant: observed_count&#125; with non-negative integer counts.
  • target_allocation (dict[Hashable, Number]) – Mapping &#123;variant: p&#125; describing intended allocation as probabilities.
  • alpha (float) – Significance level. Use strict values like 1e-3 or 1e-4 in production.
  • min_expected (float) – If any expected count < min_expected, a warning is attached.
  • strict_variants (bool) – If True, fail when observed variants differ from the target keys; if False, drop unknown variants and test only the declared ones.
Returns
  • SRMResult – The result of the SRM check.
Raises
Notes
  • Target allocation probabilities must sum to 1 within numerical tolerance.
  • is_srm is computed using the unrounded p-value; the returned p_value is rounded to 5 decimals.
  • Missing assignments are dropped and reported via warning.
  • Requires SciPy for p-value computation.

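As a worked example of the chi-square goodness-of-fit check described above, here is a minimal sketch using scipy directly (check_srm wraps this with input validation, warnings, and the SRMResult container); the counts are hypothetical:

```python
# Chi-square goodness-of-fit test against the intended 50/50 allocation.
import numpy as np
from scipy.stats import chisquare

observed = {"control": 5012, "treatment": 4820}   # hypothetical counts
target = {"control": 0.5, "treatment": 0.5}

n = sum(observed.values())
variants = sorted(target)
obs = np.array([observed[v] for v in variants], dtype=float)
exp = np.array([target[v] * n for v in variants])

chi2, p_value = chisquare(f_obs=obs, f_exp=exp)
is_srm = p_value < 1e-3   # strict alpha, as recommended above
```

With these counts the imbalance is not extreme enough to flag at a strict production alpha, which is exactly why the notes recommend 1e-3 or 1e-4 rather than 0.05.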
conversion_ztest

Two-proportion z-test

Compares conversion rates between treated (D=1) and control (D=0) groups. Returns the p-value, absolute and relative differences, and their confidence intervals.

Functions
  • conversion_ztest – Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
conversion_ztest

Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).

Parameters
  • data (CausalData) – The CausalData object containing treatment and outcome variables.
  • alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
  • ci_method ('newcombe', 'wald_unpooled', 'wald_pooled') – Method for calculating the confidence interval for the absolute difference. "newcombe" is the most robust default for conversion rates.
  • se_for_test ('pooled', 'unpooled') – Method for calculating the standard error for the z-test p-value. "pooled" (score test) is generally preferred for testing equality of proportions.
Returns
  • Dict[str, Any] – A dictionary containing:
  • p_value: Two-sided p-value from the z-test
  • absolute_difference: Difference in conversion rates (treated - control)
  • absolute_ci: Tuple (lower, upper) for the absolute difference CI
  • relative_difference: Percentage change relative to control rate
  • relative_ci: Tuple (lower, upper) for the relative difference CI (delta method)
Raises
  • ValueError – If treatment/outcome are missing, treatment is not binary, outcome is not binary, groups are empty, or alpha is outside (0, 1).
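The core of the test can be sketched from summary counts. This is an illustrative re-implementation under the "pooled" se_for_test option described above (the helper name is hypothetical); the Newcombe CI and the delta-method relative CI of the real function are omitted, with a plain Wald unpooled CI shown instead.

```python
# Two-proportion z-test: pooled SE for the test statistic (score test),
# unpooled Wald SE for the absolute-difference CI.
import numpy as np
from scipy.stats import norm

def two_prop_ztest_sketch(x1, n1, x0, n0, alpha=0.05):
    p1, p0 = x1 / n1, x0 / n0
    diff = p1 - p0
    # Pooled SE under H0: p1 == p0.
    p_pool = (x1 + x0) / (n1 + n0)
    se_pooled = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n0))
    p_value = 2 * norm.sf(abs(diff) / se_pooled)
    # Unpooled Wald CI for the absolute difference.
    se_unpooled = np.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z = norm.ppf(1 - alpha / 2)
    return {"p_value": float(p_value),
            "absolute_difference": float(diff),
            "absolute_ci": (float(diff - z * se_unpooled),
                            float(diff + z * se_unpooled)),
            "relative_difference": float(100 * (p1 / p0 - 1))}
```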
dgp
Functions
classic_rct_gamma_26

A pre-configured classic RCT dataset with a gamma outcome. n=10000, split=0.5, mean uplift ~10%. Includes deterministic user_id and ancillary columns.

Parameters
  • seed (int) – Random seed.
  • add_pre (bool) – Whether to generate a pre-period covariate ('y_pre').
  • beta_y (array-like) – Linear coefficients for confounders in the outcome model.
  • outcome_depends_on_x (bool) – Whether to add default effects for confounders if beta_y is None.
  • include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
  • return_causal_data (bool) – Whether to return a CausalData object.
  • n (int) – Number of samples.
  • split (float) – Proportion of samples assigned to the treatment group.
  • outcome_params (dict) – Gamma outcome parameters, e.g. {"shape": 2.0, "scale": {"A": 15.0, "B": 16.5}}.
  • add_ancillary (bool) – Whether to add standard ancillary columns (age, platform, etc.).
  • deterministic_ids (bool) – Whether to generate deterministic user IDs.
  • **kwargs – Additional arguments passed to classic_rct_gamma.
Returns
  • CausalData or DataFrame – The generated dataset (a CausalData object when return_causal_data is True).
generate_classic_rct_26

A pre-configured classic RCT dataset with 3 binary confounders. n=10000, split=0.5, outcome is conversion (binary). Baseline control p=0.10 and treatment p=0.11 are set on the log-odds scale (X=0), so marginal rates and ATE can differ once covariate effects are included. Includes a deterministic user_id column.

Parameters
  • seed (int) – Random seed.
  • add_pre (bool) – Whether to generate a pre-period covariate ('y_pre') and include prognostic signal from X.
  • beta_y (array-like) – Linear coefficients for confounders in the outcome model.
  • outcome_depends_on_x (bool) – Whether to add default effects for confounders if beta_y is None.
  • include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
  • return_causal_data (bool) – Whether to return a CausalData object.
  • n (int) – Number of samples.
  • split (float) – Proportion of samples assigned to the treatment group.
  • outcome_params (dict) – Binary outcome parameters, e.g. {"p": {"A": 0.10, "B": 0.11}}.
  • add_ancillary (bool) – Whether to add standard ancillary columns (age, platform, etc.).
  • deterministic_ids (bool) – Whether to generate deterministic user IDs.
  • **kwargs – Additional arguments passed to generate_classic_rct.
Returns
  • CausalData or DataFrame – The generated dataset (a CausalData object when return_causal_data is True).
inference

Inference helpers for the classic RCT scenario.

Modules
Functions
  • bootstrap_diff_means – Bootstrap inference for difference in means between treated and control groups.
bootstrap_diff_in_means

Bootstrap difference-in-means inference.

This module computes the ATE-style difference in means (treated - control) and provides:

  • Two-sided p-value using a normal approximation with bootstrap standard error.
  • Percentile confidence interval for the absolute difference.
  • Relative difference (%) and corresponding CI relative to control mean.
Functions
  • bootstrap_diff_means – Bootstrap inference for difference in means between treated and control groups.
bootstrap_diff_means

Bootstrap inference for difference in means between treated and control groups.

This function computes the ATE-style difference in means (treated - control) and provides a two-sided p-value using a normal approximation with bootstrap standard error, a percentile confidence interval for the absolute difference, and relative difference with its corresponding confidence interval.

Parameters
  • data (CausalData) – The CausalData object containing treatment and outcome variables.
  • alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
  • n_simul (int) – Number of bootstrap resamples.
  • batch_size (int) – Number of bootstrap samples to process per batch.
  • seed (int) – Random seed for reproducibility.
  • index_dtype (numpy dtype) – Integer dtype for bootstrap indices to reduce memory usage.
Returns
  • Dict[str, Any] – A dictionary containing:
  • p_value: Two-sided p-value using normal approximation.
  • absolute_difference: The absolute difference (treated - control).
  • absolute_ci: Tuple of (lower, upper) bounds for the absolute difference CI.
  • relative_difference: The relative difference (%) relative to control mean.
  • relative_ci: Tuple of (lower, upper) bounds for the relative difference CI (delta method).
Raises
  • ValueError – If inputs are invalid, treatment is not binary, or groups are empty.
bootstrap_diff_means

Bootstrap inference for difference in means between treated and control groups.

This function computes the ATE-style difference in means (treated - control) and provides a two-sided p-value using a normal approximation with bootstrap standard error, a percentile confidence interval for the absolute difference, and relative difference with its corresponding confidence interval.

Parameters
  • data (CausalData) – The CausalData object containing treatment and outcome variables.
  • alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
  • n_simul (int) – Number of bootstrap resamples.
  • batch_size (int) – Number of bootstrap samples to process per batch.
  • seed (int) – Random seed for reproducibility.
  • index_dtype (numpy dtype) – Integer dtype for bootstrap indices to reduce memory usage.
Returns
  • Dict[str, Any] – A dictionary containing:
  • p_value: Two-sided p-value using normal approximation.
  • absolute_difference: The absolute difference (treated - control).
  • absolute_ci: Tuple of (lower, upper) bounds for the absolute difference CI.
  • relative_difference: The relative difference (%) relative to control mean.
  • relative_ci: Tuple of (lower, upper) bounds for the relative difference CI (delta method).
Raises
  • ValueError – If inputs are invalid, treatment is not binary, or groups are empty.
conversion_ztest

Two-proportion z-test

Compares conversion rates between treated (D=1) and control (D=0) groups. Returns the p-value, absolute and relative differences, and their confidence intervals.

Functions
  • conversion_ztest – Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).
conversion_ztest

Perform a two-proportion z-test on a CausalData object with a binary outcome (conversion).

Parameters
  • data (CausalData) – The CausalData object containing treatment and outcome variables.
  • alpha (float) – The significance level for calculating confidence intervals (between 0 and 1).
  • ci_method ('newcombe', 'wald_unpooled', 'wald_pooled') – Method for calculating the confidence interval for the absolute difference. "newcombe" is the most robust default for conversion rates.
  • se_for_test ('pooled', 'unpooled') – Method for calculating the standard error for the z-test p-value. "pooled" (score test) is generally preferred for testing equality of proportions.
Returns
  • Dict[str, Any] – A dictionary containing:
  • p_value: Two-sided p-value from the z-test
  • absolute_difference: Difference in conversion rates (treated - control)
  • absolute_ci: Tuple (lower, upper) for the absolute difference CI
  • relative_difference: Percentage change relative to control rate
  • relative_ci: Tuple (lower, upper) for the relative difference CI (delta method)
Raises
  • ValueError – If treatment/outcome are missing, treatment is not binary, outcome is not binary, groups are empty, or alpha is outside (0, 1).
ttest

T-test inference for the DiffInMeans model

Functions
  • ttest – Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
ttest

Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.

Returns
  • Dict[str, Any] – A dictionary containing:
  • p_value: Welch t-test p-value for H0: E[Y|D=1] - E[Y|D=0] = 0
  • absolute_difference: treatment_mean - control_mean
  • absolute_ci: (lower, upper) CI for absolute_difference using Welch df
  • relative_difference: signed percent change = 100 * (treatment_mean / control_mean - 1)
  • relative_se: delta-method SE of relative_difference (percent scale)
  • relative_ci: (lower, upper) CI for relative_difference using delta method (+ Satterthwaite df)
Notes

Delta method for the relative percent change: r_hat = 100 * (Ybar1/Ybar0 - 1).

With independent groups and the CLT:

  • Var(Ybar1) ≈ s1^2/n1
  • Var(Ybar0) ≈ s0^2/n0
  • Cov(Ybar1, Ybar0) ≈ 0

The gradient of g(a, b) = a/b - 1 is (1/b, -a/b^2), so:

Var(r_hat/100) ≈ (1/Ybar0)^2 * (s1^2/n1) + (Ybar1/Ybar0^2)^2 * (s0^2/n0)

The CI uses a t critical value with Satterthwaite df and falls back to z if the df is invalid. If control_mean is near 0, the relative statistics are undefined/unstable and inf/nan sentinels are returned.
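The notes above can be reproduced numerically. This is a simplified sketch (hypothetical helper name): scipy's Welch t-test supplies the p-value, and the delta-method relative CI uses a z critical value instead of the Satterthwaite-df t critical value of the real function.

```python
# Welch t-test plus delta-method CI for the relative percent change.
import numpy as np
from scipy.stats import ttest_ind, norm

def welch_with_relative_sketch(y1, y0, alpha=0.05):
    t_res = ttest_ind(y1, y0, equal_var=False)        # Welch (unequal variances)
    m1, m0 = y1.mean(), y0.mean()
    v1 = y1.var(ddof=1) / y1.size                     # Var(Ybar1) estimate
    v0 = y0.var(ddof=1) / y0.size                     # Var(Ybar0) estimate
    rel = 100 * (m1 / m0 - 1)
    # Var(r_hat/100) ≈ (1/m0)^2 * v1 + (m1/m0^2)^2 * v0
    rel_se = 100 * np.sqrt(v1 / m0**2 + (m1 / m0**2) ** 2 * v0)
    z = norm.ppf(1 - alpha / 2)
    return {"p_value": float(t_res.pvalue),
            "absolute_difference": float(m1 - m0),
            "relative_difference": float(rel),
            "relative_ci": (float(rel - z * rel_se), float(rel + z * rel_se))}
```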

model
Classes
  • DiffInMeans – Difference-in-means model for CausalData.
DiffInMeans

Difference-in-means model for CausalData. Wraps common RCT inference methods: t-test, bootstrap, and conversion z-test.

Functions
  • estimate – Compute the treatment effect using the specified method.
  • fit – Fit the model by storing the CausalData object.
data
estimate

Compute the treatment effect using the specified method.

Parameters
  • method ('ttest', 'bootstrap', 'conversion_ztest') – The inference method to use:
    • "ttest": standard independent two-sample t-test.
    • "bootstrap": bootstrap-based inference for difference in means.
    • "conversion_ztest": two-proportion z-test for binary outcomes.
  • alpha (float) – The significance level for calculating confidence intervals.
  • diagnostic_data (bool) – Whether to include diagnostic data in the result.
  • **kwargs (Any) – Additional arguments passed to the underlying inference function. For "bootstrap", these can include n_simul, batch_size, seed, and index_dtype.
Returns
  • CausalEstimate – A results object containing effect estimates and inference.
fit

Fit the model by storing the CausalData object.

Parameters
  • data (CausalData) – The CausalData object containing treatment and outcome variables.
Returns
rct_design

Design utilities for randomized experiments: variant assignment, MDE calculation, and SRM checks.

Classes
  • SRMResult – Result of a Sample Ratio Mismatch (SRM) check.
Functions
  • assign_variants_df – Deterministically assign variants for each row in df based on id_col.
  • calculate_mde – Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.
  • check_srm – Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
SRMResult

Result of a Sample Ratio Mismatch (SRM) check.

Attributes
  • chi2 (float) – The calculated chi-square statistic.
  • p_value (float) – The p-value of the test, rounded to 5 decimals.
  • expected (dict[Hashable, float]) – Expected counts for each variant.
  • observed (dict[Hashable, int]) – Observed counts for each variant.
  • alpha (float) – Significance level used for the check.
  • is_srm (bool) – True if an SRM was detected (chi-square p-value < alpha), False otherwise.
  • warning (str or None) – Warning message if the test assumptions might be violated (e.g., small expected counts).
alpha
chi2
expected
is_srm
observed
p_value
warning
assign_variants_df

Deterministically assign variants for each row in df based on id_col.

Parameters
  • df (DataFrame) – Input DataFrame with an identifier column.
  • id_col (str) – Column name in df containing entity identifiers (user_id, session_id, etc.).
  • experiment_id (str) – Unique identifier for the experiment (versioned for reruns).
  • variants (Dict[str, float]) – Mapping from variant name to weight (coverage). Weights must be non-negative and their sum must be in (0, 1]. If the sum is < 1, the remaining mass corresponds to "not in experiment" and the assignment will be None.
  • salt (str) – Secret string to de-correlate from other hash uses and make assignments non-gameable.
  • layer_id (str) – Identifier for a mutual-exclusivity layer or surface. Acts as an additional hash input, so assignments in different layers are de-correlated.
  • variant_col (str) – Name of output column to store assigned variant labels.
Returns
  • DataFrame – A copy of df with an extra column variant_col. Entities outside experiment coverage will have None in the variant column.
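The deterministic bucketing technique behind this function can be sketched as follows. This illustrates hash-based assignment under the parameters above, not the library's exact hash or bucketing scheme; the helper name and its internals are hypothetical.

```python
# Deterministic assignment: hash (salt, experiment_id, id) to a uniform
# point in [0, 1), then walk the cumulative variant weights.
import hashlib
import pandas as pd

def assign_variants_sketch(df, id_col, experiment_id, variants,
                           salt="", variant_col="variant"):
    def bucket(uid):
        key = f"{salt}:{experiment_id}:{uid}".encode()
        # SHA-256 digest interpreted as a uniform draw in [0, 1).
        u = int(hashlib.sha256(key).hexdigest(), 16) / 2**256
        acc = 0.0
        for name, weight in variants.items():
            acc += weight
            if u < acc:
                return name
        return None   # outside experiment coverage (weights sum < 1)
    out = df.copy()
    out[variant_col] = out[id_col].map(bucket)
    return out
```

Because the hash depends only on (salt, experiment_id, id), the same unit always lands in the same variant across reruns, which is the property the docstring calls "deterministic".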
calculate_mde

Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.

Parameters
  • sample_size (int or tuple of int) – Total sample size or a tuple of (control_size, treatment_size). If a single integer is provided, the sample will be split according to the ratio parameter.
  • baseline_rate (float) – Baseline conversion rate (for conversion data) or baseline mean (for continuous data). Required for conversion data.
  • variance (float or tuple of float) – Variance of the data. For conversion data, this is calculated from the baseline rate if not provided. For continuous data, this parameter is required. Can be a single float (assumed the same for both groups) or a tuple of (control_variance, treatment_variance).
  • alpha (float) – Significance level (Type I error rate).
  • power (float) – Statistical power (1 - Type II error rate).
  • data_type (str) – Type of data. Either 'conversion' for binary/conversion data or 'continuous' for continuous data.
  • ratio (float) – Ratio of the sample allocated to the control group if sample_size is a single integer.
Returns
  • Dict[str, Any] – A dictionary containing:
  • 'mde': The minimum detectable effect (absolute)
  • 'mde_relative': The minimum detectable effect as a percentage of the baseline (relative)
  • 'parameters': The parameters used for the calculation

Notes

For conversion data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))

For continuous data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((σ1²/n1) + (σ2²/n2))

where:

  • z_α/2 is the critical value for significance level α
  • z_β is the critical value for power
  • p1 and p2 are the conversion rates in the control and treatment groups
  • σ1² and σ2² are the variances in the control and treatment groups
  • n1 and n2 are the sample sizes in the control and treatment groups
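The conversion-data formula above can be evaluated directly. This is a numeric sketch with a hypothetical helper name; like many power calculators, it approximates p2 ≈ p1 in the variance term (the exact MDE solves a fixed point, since p2 depends on the MDE itself).

```python
# MDE = (z_{alpha/2} + z_beta) * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2), with p2 ≈ p1.
import numpy as np
from scipy.stats import norm

def mde_conversion_sketch(n1, n2, baseline_rate, alpha=0.05, power=0.8):
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for significance level
    z_beta = norm.ppf(power)            # critical value for power
    p = baseline_rate
    se = np.sqrt(p * (1 - p) / n1 + p * (1 - p) / n2)
    mde = (z_alpha + z_beta) * se
    return {"mde": float(mde), "mde_relative": float(100 * mde / p)}
```

For example, with 5,000 users per arm and a 10% baseline rate, the smallest reliably detectable lift is roughly 1.7 percentage points.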
check_srm

Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.

Parameters
  • assignments (Iterable[Hashable] or Series or CausalData or Mapping[Hashable, Number]) – Observed variant assignments. If iterable or Series, elements are labels per unit (user_id, session_id, etc.). If CausalData is provided, the treatment column is used. If a mapping is provided, it is treated as &#123;variant: observed_count&#125; with non-negative integer counts.
  • target_allocation (dict[Hashable, Number]) – Mapping &#123;variant: p&#125; describing intended allocation as probabilities.
  • alpha (float) – Significance level. Use strict values like 1e-3 or 1e-4 in production.
  • min_expected (float) – If any expected count < min_expected, a warning is attached.
  • strict_variants (bool) – If True, fail when observed variants differ from the target keys; if False, drop unknown variants and test only the declared ones.
Returns
  • SRMResult – The result of the SRM check.
Raises
Notes
  • Target allocation probabilities must sum to 1 within numerical tolerance.
  • is_srm is computed using the unrounded p-value; the returned p_value is rounded to 5 decimals.
  • Missing assignments are dropped and reported via warning.
  • Requires SciPy for p-value computation.

ttest

T-test inference for the DiffInMeans model

Functions
  • ttest – Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.
ttest

Perform a Welch two-sample t-test comparing outcomes between treated (D=1) and control (D=0) groups.

Returns
  • Dict[str, Any] – A dictionary containing:
  • p_value: Welch t-test p-value for H0: E[Y|D=1] - E[Y|D=0] = 0
  • absolute_difference: treatment_mean - control_mean
  • absolute_ci: (lower, upper) CI for absolute_difference using Welch df
  • relative_difference: signed percent change = 100 * (treatment_mean / control_mean - 1)
  • relative_se: delta-method SE of relative_difference (percent scale)
  • relative_ci: (lower, upper) CI for relative_difference using delta method (+ Satterthwaite df)
Notes

Delta method for the relative percent change: r_hat = 100 * (Ybar1/Ybar0 - 1).

With independent groups and the CLT:

  • Var(Ybar1) ≈ s1^2/n1
  • Var(Ybar0) ≈ s0^2/n0
  • Cov(Ybar1, Ybar0) ≈ 0

The gradient of g(a, b) = a/b - 1 is (1/b, -a/b^2), so:

Var(r_hat/100) ≈ (1/Ybar0)^2 * (s1^2/n1) + (Ybar1/Ybar0^2)^2 * (s0^2/n0)

The CI uses a t critical value with Satterthwaite df and falls back to z if the df is invalid. If control_mean is near 0, the relative statistics are undefined/unstable and inf/nan sentinels are returned.