Causalis

`rct_design`

Design module with utilities for randomized controlled trial (RCT) experimental design.

Modules

mde – Utility functions for calculating the Minimum Detectable Effect (MDE) of an experimental design.
split – Split (assignment) utilities for randomized controlled experiments.

Classes

SRMResult – Result of a Sample Ratio Mismatch (SRM) check.

Functions

assign_variants_df – Deterministically assign variants for each row in df based on id_col.
calculate_mde – Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.
check_srm – Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.

`SRMResult`

Result of a Sample Ratio Mismatch (SRM) check.

Attributes

chi2 (float) – The calculated chi-square statistic.
p_value (float) – The p-value of the test, rounded to 5 decimals.
expected (dict[Hashable, float]) – Expected counts for each variant.
observed (dict[Hashable, int]) – Observed counts for each variant.
alpha (float) – Significance level used for the check.
is_srm (bool) – True if an SRM was detected (chi-square p-value < alpha), False otherwise.
warning (str or None) – Warning message if the test assumptions might be violated (e.g., small expected counts).
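
For readers who want to mirror the result object in their own code, the attribute list above can be sketched as a frozen dataclass. This is a hypothetical stand-in (`SRMResultSketch` is not part of the package); the real `SRMResult` may define different defaults, field order, or helper methods.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Optional


@dataclass(frozen=True)
class SRMResultSketch:
    """Hypothetical mirror of SRMResult; field names follow the attribute list."""

    chi2: float                       # chi-square statistic
    p_value: float                    # p-value rounded to 5 decimals
    expected: Dict[Hashable, float]   # expected count per variant
    observed: Dict[Hashable, int]     # observed count per variant
    alpha: float                      # significance level used for the check
    is_srm: bool                      # True when the (unrounded) p-value < alpha
    warning: Optional[str] = None     # set when test assumptions may not hold
```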

`assign_variants_df`

Deterministically assign variants for each row in df based on id_col.

Parameters

df (DataFrame) – Input DataFrame with an identifier column.
id_col (str) – Column name in df containing entity identifiers (user_id, session_id, etc.).
experiment_id (str) – Unique identifier for the experiment (versioned for reruns).
variants (Dict[str, float]) – Mapping from variant name to weight (coverage). Weights must be non-negative and their sum must be in (0, 1]. If the sum is < 1, the remaining mass corresponds to "not in experiment" and the assignment will be None.
salt (str) – Secret string used to de-correlate assignments from other uses of the hash and to make them non-gameable.
layer_id (str) – Identifier of the mutual-exclusivity layer or surface. Because it is part of the hashed key, it also acts as an additional salt.
variant_col (str) – Name of output column to store assigned variant labels.

Returns

DataFrame – A copy of df with an extra column variant_col. Entities outside experiment coverage will have None in the variant column.
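
The coverage rule for `variants` can be illustrated with a minimal sketch that maps a uniform number u in [0, 1) onto cumulative weights. The helper name `pick_variant` and the floating-point tolerance are assumptions for illustration; only the weight semantics (non-negative weights, sum in (0, 1], leftover mass meaning "not in experiment") come from the description above.

```python
from typing import Dict, Optional


def pick_variant(u: float, variants: Dict[str, float]) -> Optional[str]:
    """Map u in [0, 1) to a variant using cumulative weights (hypothetical helper)."""
    if any(w < 0 for w in variants.values()):
        raise ValueError("weights must be non-negative")
    total = sum(variants.values())
    # Small tolerance for floating-point sums that land just above 1.
    if not 0 < total <= 1 + 1e-9:
        raise ValueError("sum of weights must be in (0, 1]")
    cumulative = 0.0
    for name, weight in variants.items():  # insertion order defines the buckets
        cumulative += weight
        if u < cumulative:
            return name
    return None  # u fell into the uncovered mass: "not in experiment"


weights = {"control": 0.45, "treatment": 0.45}  # 10% of entities stay out
print(pick_variant(0.10, weights))  # control
print(pick_variant(0.50, weights))  # treatment
print(pick_variant(0.95, weights))  # None
```

In this sketch the key order of `variants` is part of the assignment contract: reordering the mapping for a running experiment would move entities between arms.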

`calculate_mde`

Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.

Parameters

sample_size (int or tuple of int) – Total sample size or a tuple of (control_size, treatment_size). If a single integer is provided, the sample will be split according to the ratio parameter.
baseline_rate (float) – Baseline conversion rate (for conversion data) or baseline mean (for continuous data). Required for conversion data.
variance (float or tuple of float) – Variance of the data. For conversion data, it is derived from the baseline rate if not provided. For continuous data, this parameter is required. Can be a single float (assumed equal for both groups) or a tuple of (control_variance, treatment_variance).
alpha (float) – Significance level (Type I error rate).
power (float) – Statistical power (1 - Type II error rate).
data_type (str) – Type of data: either 'conversion' for binary/conversion data or 'continuous' for continuous data.
ratio (float) – Fraction of the total sample allocated to the control group when sample_size is a single integer.

Returns

Dict[str, Any] – A dictionary containing:
'mde': The minimum detectable effect (absolute)
'mde_relative': The minimum detectable effect as a percentage of the baseline (relative)
'parameters': The parameters used for the calculation

Examples:

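
The snippet below is a hypothetical re-implementation (`mde_sketch`), not the package's own example. It applies the formulas from the Notes and assumes defaults of alpha=0.05, power=0.8 and an even split; it also assumes that, for conversion data with `variance` omitted, both arms use the baseline variance p(1 - p).

```python
import math
from typing import Any, Dict, Optional, Tuple, Union

from scipy.stats import norm


def mde_sketch(
    sample_size: Union[int, Tuple[int, int]],
    baseline_rate: Optional[float] = None,
    variance: Optional[Union[float, Tuple[float, float]]] = None,
    alpha: float = 0.05,  # assumed default
    power: float = 0.8,   # assumed default
    data_type: str = "conversion",
    ratio: float = 0.5,   # assumed default: equal split
) -> Dict[str, Any]:
    """Hypothetical re-implementation of the MDE formulas in the Notes."""
    if data_type not in ("conversion", "continuous"):
        raise ValueError("data_type must be 'conversion' or 'continuous'")
    if data_type == "conversion" and baseline_rate is None:
        raise ValueError("baseline_rate is required for conversion data")

    # A single total is split into (control, treatment) using `ratio`.
    if isinstance(sample_size, int):
        n1 = round(sample_size * ratio)
        n2 = sample_size - n1
    else:
        n1, n2 = sample_size

    if variance is None:
        if data_type == "continuous":
            raise ValueError("variance is required for continuous data")
        # Assumption: both arms use the baseline Bernoulli variance p * (1 - p).
        variance = baseline_rate * (1 - baseline_rate)
    v1, v2 = variance if isinstance(variance, tuple) else (variance, variance)

    # Two-sided z_{alpha/2} plus z_beta for the requested power.
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    mde = z * math.sqrt(v1 / n1 + v2 / n2)
    # Relative MDE as a percentage of the baseline, when a baseline is known.
    mde_relative = 100 * mde / baseline_rate if baseline_rate else None
    return {
        "mde": mde,
        "mde_relative": mde_relative,
        "parameters": {"n1": n1, "n2": n2, "alpha": alpha, "power": power, "data_type": data_type},
    }


res = mde_sketch(10_000, baseline_rate=0.10)
print(round(res["mde"], 4), round(res["mde_relative"], 1))  # 0.0168 16.8
```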

Notes

For conversion data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))

For continuous data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((σ1²/n1) + (σ2²/n2))

where:

z_α/2 is the two-sided critical value of the standard normal distribution for significance level α
z_β is the standard normal quantile corresponding to the target power 1 − β
p1 and p2 are the conversion rates in the control and treatment groups
σ1² and σ2² are the variances in the control and treatment groups
n1 and n2 are the sample sizes in the control and treatment groups
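
To make the (z_α/2 + z_β) factor concrete, the critical values can be computed with SciPy's standard normal quantile function `scipy.stats.norm.ppf`. This is an illustrative calculation, independent of how `calculate_mde` obtains them internally.

```python
from scipy.stats import norm

alpha = 0.05
z_alpha_2 = norm.ppf(1 - alpha / 2)  # two-sided critical value, ≈ 1.960

for power in (0.80, 0.90):
    z_beta = norm.ppf(power)         # ≈ 0.842 at 80% power, ≈ 1.282 at 90%
    multiplier = z_alpha_2 + z_beta  # the (z_α/2 + z_β) factor in both formulas
    print(f"power={power:.2f} z_beta={z_beta:.3f} multiplier={multiplier:.3f}")
# power=0.80 z_beta=0.842 multiplier=2.802
# power=0.90 z_beta=1.282 multiplier=3.242
```

For the same sample sizes and variances, raising power from 80% to 90% multiplies the MDE by about 3.242 / 2.802 ≈ 1.16.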

`check_srm`

Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.

Parameters

assignments (Iterable[Hashable] or Series or CausalData or Mapping[Hashable, Number]) – Observed variant assignments. If iterable or Series, elements are labels per unit (user_id, session_id, etc.). If CausalData is provided, the treatment column is used. If a mapping is provided, it is treated as {variant: observed_count} with non-negative integer counts.
target_allocation (dict[Hashable, Number]) – Mapping {variant: p} describing intended allocation as probabilities.
alpha (float) – Significance level. Use strict values like 1e-3 or 1e-4 in production.
min_expected (float) – If any expected count is below min_expected, a warning is attached to the result.
strict_variants (bool) – If True, fail when the observed variants differ from the target keys. If False, drop unknown variants and test only on the declared ones.

Returns

SRMResult – The result of the SRM check.

Raises

ValueError – If inputs are invalid or empty.
ImportError – If scipy is required but not installed.

Notes

Target allocation probabilities must sum to 1 within numerical tolerance.
is_srm is computed using the unrounded p-value; the returned p_value is rounded to 5 decimals.
Missing assignments are dropped and reported in the warning attribute.
Requires SciPy for p-value computation.
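
The input handling described in the parameters and Notes (per-unit labels, dropped missing values, `strict_variants`, allocation summing to 1) can be sketched as follows. `normalize_srm_inputs` and its `tol` argument are hypothetical; the sketch handles only per-unit labels (not Series, CausalData, or count mappings) and assumes strict mode rejects labels absent from `target_allocation`, which may be narrower than the real check.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Iterable, Mapping, Optional, Tuple


def normalize_srm_inputs(
    labels: Iterable[Hashable],
    target_allocation: Mapping[Hashable, float],
    strict_variants: bool = True,
    tol: float = 1e-6,  # assumed tolerance for "sums to 1"
) -> Tuple[Dict[Hashable, int], Optional[str]]:
    """Hypothetical helper: per-unit labels -> observed counts per declared variant."""
    probs = list(target_allocation.values())
    if not probs or any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=tol):
        raise ValueError("target_allocation must be non-empty, non-negative and sum to 1")

    # Count labels, dropping missing ones (None or NaN) but remembering how many.
    counts: Counter = Counter()
    n_missing = 0
    for label in labels:
        if label is None or (isinstance(label, float) and math.isnan(label)):
            n_missing += 1
        else:
            counts[label] += 1

    # Assumption: strict mode rejects labels that are not declared in the target.
    unknown = set(counts) - set(target_allocation)
    if unknown and strict_variants:
        raise ValueError(f"undeclared variants observed: {sorted(map(str, unknown))}")

    # Keep only declared variants; a declared but unseen variant gets count 0.
    observed = {v: counts[v] for v in target_allocation}
    if sum(observed.values()) == 0:
        raise ValueError("no assignments left to test")
    warning = f"dropped {n_missing} missing assignment(s)" if n_missing else None
    return observed, warning


observed, warning = normalize_srm_inputs(
    ["control", "treatment", None, "control", "holdout"],
    {"control": 0.5, "treatment": 0.5},
    strict_variants=False,
)
print(observed)  # {'control': 2, 'treatment': 1}
print(warning)   # dropped 1 missing assignment(s)
```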

Examples:

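
A minimal sketch of the chi-square goodness-of-fit step, assuming counts have already been normalized into a `{variant: count}` mapping. It uses `scipy.stats.chisquare` and returns a plain dict with the `SRMResult` field names; `srm_chi2_sketch` and its defaults are hypothetical, not the package API.

```python
from typing import Dict, Hashable, Mapping

from scipy.stats import chisquare


def srm_chi2_sketch(
    observed: Mapping[Hashable, int],
    target_allocation: Mapping[Hashable, float],
    alpha: float = 1e-3,        # assumed default
    min_expected: float = 5.0,  # assumed default
) -> Dict[str, object]:
    """Hypothetical chi-square goodness-of-fit SRM check on {variant: count}."""
    variants = list(target_allocation)
    total = sum(observed[v] for v in variants)
    p_sum = sum(target_allocation.values())  # renormalize tiny rounding drift
    expected = {v: total * target_allocation[v] / p_sum for v in variants}
    stat, p = chisquare(
        f_obs=[observed[v] for v in variants],
        f_exp=[expected[v] for v in variants],
    )
    warning = None
    if min(expected.values()) < min_expected:
        warning = f"expected counts below {min_expected}; chi-square may be unreliable"
    return {
        "chi2": float(stat),
        "p_value": round(float(p), 5),  # rounded for reporting only
        "expected": expected,
        "observed": {v: observed[v] for v in variants},
        "alpha": alpha,
        "is_srm": bool(p < alpha),      # decided on the unrounded p-value
        "warning": warning,
    }


half = {"control": 0.5, "treatment": 0.5}
ok = srm_chi2_sketch({"control": 5_050, "treatment": 4_950}, half)
bad = srm_chi2_sketch({"control": 5_300, "treatment": 4_700}, half)
print(ok["chi2"], ok["p_value"], ok["is_srm"])     # 1.0 0.31731 False
print(bad["chi2"], bad["p_value"], bad["is_srm"])  # 36.0 0.0 True
```

The second call shows why the decision uses the unrounded p-value: with 5,300 vs 4,700 units the p-value is on the order of 1e-9, which rounds to 0.0 at 5 decimals while still being flagged.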

`mde`

Utility functions for calculating the Minimum Detectable Effect (MDE) of an experimental design.

Functions

calculate_mde – Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.

`split`

Split (assignment) utilities for randomized controlled experiments.

This module provides deterministic assignment of variants to entities based on hashing a composite key (salt | layer_id | experiment_id | entity_id) into the unit interval and mapping it to cumulative variant weights.

The implementation mirrors the reference notebook in docs/cases/rct_design.ipynb.
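
The hashing step can be sketched with the standard library. The `|` separator, the SHA-256 digest and the 53-bit scaling are assumptions made for this illustration, and `unit_interval_hash` is a hypothetical helper; the module's actual hash function and key encoding may differ.

```python
import hashlib


def unit_interval_hash(salt: str, layer_id: str, experiment_id: str, entity_id: object) -> float:
    """Hypothetical helper: hash 'salt|layer_id|experiment_id|entity_id' into [0, 1)."""
    key = f"{salt}|{layer_id}|{experiment_id}|{entity_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the top 53 bits so the result is exactly representable and strictly < 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


u = unit_interval_hash("s3cr3t", "checkout", "exp_v1", 42)
assert u == unit_interval_hash("s3cr3t", "checkout", "exp_v1", 42)  # deterministic

# Across many entities the values should be close to uniform on [0, 1).
draws = [unit_interval_hash("s3cr3t", "checkout", "exp_v1", i) for i in range(20_000)]
print(sum(d < 0.5 for d in draws) / len(draws))  # close to 0.5
```

The resulting value would then be matched against cumulative variant weights. Changing any component of the key, for example bumping experiment_id for a rerun, effectively reshuffles every entity's position in [0, 1).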

Functions

assign_variants_df – Deterministically assign variants for each row in df based on id_col.