rct_design

Reference details for rct_design in causalis.shared.

rct_design

Design module for experimental design utilities.

Modules
  • mde – Utility functions for calculating Minimum Detectable Effect (MDE) for experimental design.
  • split – Split (assignment) utilities for randomized controlled experiments.
Classes
  • SRMResult – Result of a Sample Ratio Mismatch (SRM) check.
Functions
  • assign_variants_df – Deterministically assign variants for each row in df based on id_col.
  • calculate_mde – Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.
  • check_srm – Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.
SRMResult

Result of a Sample Ratio Mismatch (SRM) check.

Attributes
  • chi2 (float) – The calculated chi-square statistic.
  • p_value (float) – The p-value of the test, rounded to 5 decimals.
  • expected (dict[Hashable, float]) – Expected counts for each variant.
  • observed (dict[Hashable, int]) – Observed counts for each variant.
  • alpha (float) – Significance level used for the check.
  • is_srm (bool) – True if an SRM was detected (chi-square p-value < alpha), False otherwise.
  • warning (str or None) – Warning message if the test assumptions might be violated (e.g., small expected counts).
assign_variants_df

Deterministically assign variants for each row in df based on id_col.

Parameters
  • df (DataFrame) – Input DataFrame with an identifier column.
  • id_col (str) – Column name in df containing entity identifiers (user_id, session_id, etc.).
  • experiment_id (str) – Unique identifier for the experiment (versioned for reruns).
  • variants (Dict[str, float]) – Mapping from variant name to weight (coverage). Weights must be non-negative and their sum must be in (0, 1]. If the sum is < 1, the remaining mass corresponds to "not in experiment" and the assignment will be None.
  • salt (str) – Secret string to de-correlate from other hash uses and make assignments non-gameable.
  • layer_id (str) – Identifier for the mutual-exclusivity layer or surface. It is included in the hashed composite key alongside salt and experiment_id.
  • variant_col (str) – Name of output column to store assigned variant labels.
Returns
  • DataFrame – A copy of df with an extra column variant_col. Entities outside experiment coverage will have None in the variant column.
calculate_mde

Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.

Parameters
  • sample_size (int or tuple of int) – Total sample size or a tuple of (control_size, treatment_size). If a single integer is provided, the sample will be split according to the ratio parameter.
  • baseline_rate (float) – Baseline conversion rate (for conversion data) or baseline mean (for continuous data). Required for conversion data.
  • variance (float or tuple of float) – Variance of the data. For conversion data, this is calculated from the baseline rate if not provided. For continuous data, this parameter is required. Can be a single float (assumed to be the same for both groups) or a tuple of (control_variance, treatment_variance).
  • alpha (float) – Significance level (Type I error rate).
  • power (float) – Statistical power (1 - Type II error rate).
  • data_type (str) – Type of data. Either 'conversion' for binary/conversion data or 'continuous' for continuous data.
  • ratio (float) – Ratio of the sample allocated to the control group if sample_size is a single integer.
Returns
  • Dict[str, Any] – A dictionary containing:
      • 'mde': The minimum detectable effect (absolute)
      • 'mde_relative': The minimum detectable effect as a percentage of the baseline (relative)
      • 'parameters': The parameters used for the calculation

Notes

For conversion data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))

For continuous data, the MDE is calculated using the formula: MDE = (z_α/2 + z_β) * sqrt((σ1²/n1) + (σ2²/n2))

where:

  • z_α/2 is the critical value for significance level α
  • z_β is the critical value for power
  • p1 and p2 are the conversion rates in the control and treatment groups
  • σ1² and σ2² are the variances in the control and treatment groups
  • n1 and n2 are the sample sizes in the control and treatment groups
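
The two formulas above share one shape: a sum of critical values times the standard error of the difference. The following is a minimal standard-library sketch of that arithmetic, not the library's calculate_mde. The name mde_sketch is hypothetical, a two-sided test is assumed, and the conversion case assumes both p1 and p2 are taken at the baseline rate.

```python
from math import sqrt
from statistics import NormalDist


def mde_sketch(n1, n2, var1, var2, alpha=0.05, power=0.8):
    """Absolute MDE from per-group sample sizes and variances (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, two-sided
    z_beta = NormalDist().inv_cdf(power)           # z_beta
    return (z_alpha + z_beta) * sqrt(var1 / n1 + var2 / n2)


# Continuous data: variances are supplied directly.
mde_cont = mde_sketch(n1=5000, n2=5000, var1=4.0, var2=4.0)

# Conversion data: variance is p * (1 - p); using the baseline rate for
# both groups is an assumption of this sketch.
p = 0.10
mde_conv = mde_sketch(5000, 5000, p * (1 - p), p * (1 - p))
mde_conv_ratio = mde_conv / p  # MDE relative to the baseline, as a fraction

print(round(mde_cont, 4), round(mde_conv, 4), round(mde_conv_ratio, 3))
```

Because the sample sizes enter only through var/n, doubling both groups shrinks the MDE by a factor of sqrt(2).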
check_srm

Check Sample Ratio Mismatch (SRM) for an RCT via a chi-square goodness-of-fit test.

Parameters
  • assignments (Iterable[Hashable] or Series or CausalData or Mapping[Hashable, Number]) – Observed variant assignments. If iterable or Series, elements are labels per unit (user_id, session_id, etc.). If CausalData is provided, the treatment column is used. If a mapping is provided, it is treated as {variant: observed_count} with non-negative integer counts.
  • target_allocation (dict[Hashable, Number]) – Mapping {variant: p} describing intended allocation as probabilities.
  • alpha (float) – Significance level. Use strict values like 1e-3 or 1e-4 in production.
  • min_expected (float) – If any expected count < min_expected, a warning is attached.
  • strict_variants (bool) – How to handle observed variants that differ from the target keys:
      • True: fail if observed variants differ from target keys.
      • False: drop unknown variants and test only on declared ones.
Returns
  • SRMResult – Result of the SRM check.
Raises
Notes
  • Target allocation probabilities must sum to 1 within numerical tolerance.
  • is_srm is computed using the unrounded p-value; the returned p_value is rounded to 5 decimals.
  • Missing assignments are dropped and reported via warning.
  • Requires SciPy for p-value computation.
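
To make the test concrete, here is a minimal standard-library sketch of a chi-square goodness-of-fit check. It is not the library's check_srm: the name srm_sketch is hypothetical, it handles exactly two variants (one degree of freedom, where the p-value has a closed form via erfc), and it silently drops labels outside the target allocation, roughly in the spirit of strict_variants=False.

```python
from collections import Counter
from math import erfc, sqrt


def srm_sketch(labels, target_allocation, alpha=1e-3):
    """Two-variant SRM check; returns (chi2, p_value, is_srm). Hypothetical helper."""
    if len(target_allocation) != 2:
        raise ValueError("this sketch handles exactly two variants")
    if abs(sum(target_allocation.values()) - 1.0) > 1e-9:
        raise ValueError("target allocation must sum to 1")
    # Count only declared variants; unknown labels are ignored.
    observed = Counter(x for x in labels if x in target_allocation)
    n = sum(observed.values())
    if n == 0:
        raise ValueError("no observations for the declared variants")
    expected = {v: n * p for v, p in target_allocation.items()}
    chi2 = sum((observed[v] - expected[v]) ** 2 / expected[v] for v in expected)
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha


labels = ["control"] * 5200 + ["treatment"] * 4800
chi2, p_value, is_srm = srm_sketch(labels, {"control": 0.5, "treatment": 0.5})
print(chi2, p_value, is_srm)
```

For more than two variants the survival function of the chi-square distribution is needed, which is where SciPy comes in.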

mde

Utility functions for calculating Minimum Detectable Effect (MDE) for experimental design.

Functions
  • calculate_mde – Calculate the Minimum Detectable Effect (MDE) for conversion or continuous data.
split

Split (assignment) utilities for randomized controlled experiments.

This module provides deterministic assignment of variants to entities based on hashing a composite key (salt | layer_id | experiment_id | entity_id) into the unit interval and mapping it to cumulative variant weights.

The implementation mirrors the reference notebook in docs/cases/rct_design.ipynb.
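
The hashing scheme described above can be sketched with the standard library. This is an illustration under stated assumptions, not the module's code: assign_sketch is a hypothetical name, and the choice of SHA-256, the "|" delimiter, and the use of the top 53 bits of the digest are all assumptions made here.

```python
import hashlib


def assign_sketch(entity_id, experiment_id, variants, salt="", layer_id=""):
    """Map one entity to a variant name, or None if outside coverage."""
    # Composite key in the order given in the module description.
    key = f"{salt}|{layer_id}|{experiment_id}|{entity_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep the top 53 bits so the quotient is an exact float in [0, 1).
    u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    cumulative = 0.0
    for name, weight in variants.items():
        cumulative += weight
        if u < cumulative:
            return name
    return None  # weights sum to < 1 and u fell in the uncovered mass


variants = {"control": 0.5, "treatment": 0.5}
labels = [assign_sketch(f"user-{i}", "exp-v1", variants, salt="s") for i in range(10000)]
print(labels.count("control"), labels.count("treatment"))
```

Because the result depends only on the key, the same entity receives the same variant on every call, and changing experiment_id, salt, or layer_id reshuffles the assignment.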

Functions
  • assign_variants_df – Deterministically assign variants for each row in df based on id_col.