generate_rct

Reference details for generate_rct in causalis.data_contracts.

Generate an RCT dataset with randomized treatment assignment.

Uses CausalDatasetGenerator internally and ensures that treatment assignment is independent of the confounders X. It is designed for benchmarking variance reduction techniques such as CUPED.
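To illustrate the kind of benchmark this dataset is meant for, here is a minimal CUPED sketch on synthetic data built directly with NumPy. It does not call generate_rct; the sample size, coefficients, and the helper diff_in_means are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Pre-period covariate and a randomized assignment that ignores it.
y_pre = rng.normal(0.0, 1.0, size=n)
d = rng.binomial(1, 0.5, size=n)

# Outcome correlated with y_pre, plus an assumed true effect of 0.1.
true_effect = 0.1
y = 0.8 * y_pre + true_effect * d + rng.normal(0.0, 0.6, size=n)

# CUPED adjustment: theta = Cov(y, y_pre) / Var(y_pre).
theta = np.cov(y, y_pre)[0, 1] / np.var(y_pre, ddof=1)
y_cuped = y - theta * (y_pre - y_pre.mean())


def diff_in_means(outcome, treatment):
    """Difference in mean outcome between treated and control units."""
    return outcome[treatment == 1].mean() - outcome[treatment == 0].mean()


naive_estimate = diff_in_means(y, d)
cuped_estimate = diff_in_means(y_cuped, d)

# Ratio below 1 means the adjusted outcome has lower variance.
variance_ratio = np.var(y_cuped, ddof=1) / np.var(y, ddof=1)
print(naive_estimate, cuped_estimate, variance_ratio)
```

Because assignment is randomized, both estimators target the same effect; the adjusted one is simply less noisy when y_pre is predictive of the outcome.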

Notes on effect scale

How outcome_params maps into the structural effect:

  • outcome_type="normal": treatment shifts the mean by (mean["B"] - mean["A"]) on the outcome scale.
  • outcome_type="binary": treatment shifts the log-odds by (logit(p_B) - logit(p_A)).
  • outcome_type="poisson" or "gamma": treatment shifts the log-mean by log(lam_B / lam_A).
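The mapping above can be written out as a small standalone helper. This is a sketch of the arithmetic only, not code from the library; the function name structural_effect and its arguments a and b (the group A and group B parameter values) are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def structural_effect(outcome_type, a, b):
    """Effect of B versus A on the scale described in the notes above.

    a, b: group-level parameters (means, probabilities, or rates).
    """
    if outcome_type == "normal":
        return b - a                    # shift on the outcome scale
    if outcome_type == "binary":
        return logit(b) - logit(a)      # shift in log-odds
    if outcome_type in ("poisson", "gamma"):
        return math.log(b / a)          # shift in log-mean
    raise ValueError(f"unknown outcome_type: {outcome_type!r}")


normal_effect = structural_effect("normal", 10.0, 10.5)
binary_effect = structural_effect("binary", 0.10, 0.12)
poisson_effect = structural_effect("poisson", 1.0, 1.1)
print(normal_effect, binary_effect, poisson_effect)
```

Note that for the binary case a two-percentage-point lift from 0.10 to 0.12 is not an effect of 0.02 on the structural scale, since the shift is measured in log-odds.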

Ancillary columns (if add_ancillary=True) are generated from baseline confounders X only, avoiding outcome leakage and post-treatment adjustment issues.

Parameters
  • n (int) – Number of samples to generate.
  • split (float) – Proportion of samples assigned to the treatment group.
  • random_state (int) – Random seed for reproducibility.
  • outcome_type ('binary', 'normal', 'poisson', or 'gamma') – Distribution family of the outcome.
  • outcome_params (dict) – Parameters defining baseline rates/means and treatment effects, for example {"p": {"A": 0.1, "B": 0.12}} for binary, or {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}} for poisson/gamma.
  • confounder_specs (list of dict) – Schema for confounder distributions.
  • k (int) – Number of confounders if specs not provided.
  • x_sampler (callable) – Custom sampler for confounders.
  • add_ancillary (bool) – Whether to add descriptive columns like 'age', 'platform', etc.
  • deterministic_ids (bool) – Whether to generate deterministic user IDs.
  • add_pre (bool) – Whether to generate a pre-period covariate (y_pre).
  • pre_name (str) – Name of the pre-period covariate column.
  • pre_corr (float) – Target correlation between y_pre and the outcome Y in the control group.
  • prognostic_scale (float) – Scale of the prognostic signal derived from confounders.
  • include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'm', etc.
  • return_causal_data (bool) – Whether to return a CausalData object instead of a pandas.DataFrame.
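To make the meaning of pre_corr concrete, the sketch below shows one common way to build a covariate with a chosen correlation to an existing variable: mix the standardized variable with independent noise. This is an assumption for illustration and not necessarily how generate_rct constructs y_pre; the control-group outcome here is simulated, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000
pre_corr = 0.7  # target correlation, mirroring the pre_corr parameter

# Hypothetical control-group outcome.
y_control = rng.normal(5.0, 2.0, size=n)

# Standardize, then mix with independent unit-variance noise so that
# Corr(y_pre, y_control) is approximately pre_corr.
z = (y_control - y_control.mean()) / y_control.std()
noise = rng.normal(size=n)
y_pre = pre_corr * z + np.sqrt(1.0 - pre_corr**2) * noise

achieved = np.corrcoef(y_pre, y_control)[0, 1]
print(achieved)
```

The achieved sample correlation will differ slightly from the target because of sampling noise, which is why the parameter is described as a target rather than an exact value.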
Returns