causalis.dgp.causaldata.functional.generate_rct
Generate an RCT dataset with randomized treatment assignment.
Uses CausalDatasetGenerator internally, ensuring treatment is independent of X.
Specifically designed for benchmarking variance reduction techniques like CUPED.
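To illustrate what such a benchmark measures, here is a minimal NumPy-only sketch of CUPED on hand-simulated data. It does not call this library; the data-generating process, the variable names (`d`, `y`, `y_pre`, `rho`), and the effect size of 0.1 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
rho = 0.7  # assumed correlation between pre-period covariate and outcome

# Randomized assignment: independent of every covariate by construction.
d = rng.binomial(1, 0.5, size=n)

# Pre-period covariate and an outcome that shares its signal.
y_pre = rng.normal(size=n)
noise = rng.normal(size=n)
y = rho * y_pre + np.sqrt(1 - rho**2) * noise + 0.1 * d  # assumed true effect: 0.1

# CUPED adjustment: theta = cov(y, y_pre) / var(y_pre).
theta = np.cov(y, y_pre)[0, 1] / np.var(y_pre, ddof=1)
y_cuped = y - theta * (y_pre - y_pre.mean())


def diff_in_means(outcome, treat):
    """Difference in mean outcome between treated and control units."""
    return outcome[treat == 1].mean() - outcome[treat == 0].mean()


ate_raw = diff_in_means(y, d)
ate_cuped = diff_in_means(y_cuped, d)

# Share of outcome variance left after the adjustment.
var_ratio = np.var(y_cuped, ddof=1) / np.var(y, ddof=1)
print(f"raw={ate_raw:.3f} cuped={ate_cuped:.3f} variance ratio={var_ratio:.2f}")
```

Under these assumptions the adjusted outcome keeps roughly 1 - rho^2 of the original variance, so both estimators target the same effect while the CUPED estimate is less noisy.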
Notes on effect scale
How outcome_params maps into the structural effect:
outcome_type="normal": treatment shifts the mean by (mean["B"] - mean["A"]) on the outcome scale.
outcome_type="binary": treatment shifts the log-odds by (logit(p_B) - logit(p_A)).
outcome_type="poisson" or "gamma": treatment shifts the log-mean by log(lam_B / lam_A).
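The three mappings above can be checked by hand. The sketch below uses only the standard library and computes each structural effect from example values; the numbers for the normal and poisson/gamma cases (`mean`, `lam_a`, `lam_b`) are hypothetical, and the code reproduces the formulas rather than the library's internals.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


# normal: shift of the mean on the outcome scale (hypothetical means).
mean = {"A": 0.0, "B": 0.2}
effect_normal = mean["B"] - mean["A"]

# binary: shift of the log-odds.
p_a, p_b = 0.10, 0.12
effect_binary = logit(p_b) - logit(p_a)  # about 0.205

# poisson / gamma: shift of the log-mean (hypothetical rates).
lam_a, lam_b = 1.0, 1.1
effect_log_mean = math.log(lam_b / lam_a)  # about 0.0953

print(effect_normal, round(effect_binary, 4), round(effect_log_mean, 4))
```

Note that a 2-percentage-point lift in a binary outcome corresponds to about 0.205 on the log-odds scale, so effects are not directly comparable across outcome types.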
Ancillary columns (if add_ancillary=True) are generated from baseline confounders X only, avoiding outcome leakage and post-treatment adjustment issues.
Parameters
- n : int, default=20_000
Number of samples to generate.
- split : float, default=0.5
Proportion of samples assigned to the treatment group.
- random_state : int, optional
Random seed for reproducibility.
- outcome_type : {"binary", "normal", "poisson", "gamma"}, default="binary"
Distribution family of the outcome.
- outcome_params : dict, optional
Parameters defining baseline rates/means and treatment effects, e.g. {"p": {"A": 0.1, "B": 0.12}} for binary, or {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}} for poisson/gamma.
- confounder_specs : list of dict, optional
Schema for confounder distributions.
- k : int, default=0
Number of confounders if specs not provided.
- x_sampler : callable, optional
Custom sampler for confounders.
- add_ancillary : bool, default=True
Whether to add descriptive columns like 'age', 'platform', etc.
- deterministic_ids : bool, default=False
Whether to generate deterministic user IDs.
- add_pre : bool, default=True
Whether to generate a pre-period covariate (y_pre).
- pre_name : str, default="y_pre"
Name of the pre-period covariate column.
- pre_corr : float, default=0.7
Target correlation between y_pre and the outcome Y in the control group.
- prognostic_scale : float, default=1.0
Scale of the prognostic signal derived from confounders.
- include_oracle : bool, default=True
Whether to include oracle ground-truth columns like 'cate', 'm', etc.
- return_causal_data : bool, default=False
Whether to return a CausalData object instead of a pandas.DataFrame.
Returns
Synthetic RCT dataset: a pandas.DataFrame by default, or a CausalData object if return_causal_data=True.
Examples
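A minimal usage sketch, assuming the package is installed and the function is importable from the module path shown in the title. The keyword arguments come from the parameter list above; `rct_kwargs` and `make_rct` are hypothetical helper names introduced only for this example, and no output column names beyond `y_pre` are assumed.

```python
# Keyword arguments taken from the documented parameters.
rct_kwargs = {
    "n": 5_000,
    "split": 0.5,
    "random_state": 42,
    "outcome_type": "binary",
    "outcome_params": {"p": {"A": 0.10, "B": 0.12}},
    "add_pre": True,
    "pre_name": "y_pre",
    "pre_corr": 0.7,
}


def make_rct(**overrides):
    """Call generate_rct with the settings above (hypothetical wrapper).

    The import is deferred so this module loads even where causalis
    is not installed.
    """
    from causalis.dgp.causaldata.functional import generate_rct

    return generate_rct(**{**rct_kwargs, **overrides})
```

Calling `make_rct()` should return a pandas.DataFrame, since return_causal_data defaults to False; `make_rct(return_causal_data=True)` would request a CausalData object instead.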