Submodule

causalis.dgp.causaldata.functional

functional

Submodule causalis.dgp.causaldata.functional with no child pages and 7 documented members.

Functions

generate_rct, generate_classic_rct, classic_rct_gamma, obs_linear_effect, make_cuped_tweedie, generate_cuped_binary, make_gold_linear

function

causalis.dgp.causaldata.functional.generate_rct

generate_rct

Generate an RCT dataset with randomized treatment assignment.

Uses CausalDatasetGenerator internally, ensuring treatment is independent of X. Specifically designed for benchmarking variance reduction techniques like CUPED.
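To make the CUPED use case concrete, the following is a minimal standard-library sketch of the adjustment itself. It does not call causalis; the synthetic data, the correlation value, and the helper names (mean, var, cov, theta_cuped) are assumptions made for illustration only.

```python
import random

random.seed(0)
n = 2000
rho = 0.7  # assumed correlation between the pre-period covariate and the outcome

# Synthetic pre-period covariate and outcome with correlation of roughly rho.
x_pre = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [rho * x + (1 - rho**2) ** 0.5 * random.gauss(0.0, 1.0) for x in x_pre]


def mean(v):
    return sum(v) / len(v)


def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


# CUPED coefficient: the OLS slope of the outcome on the pre-period covariate.
theta_cuped = cov(y, x_pre) / var(x_pre)
x_bar = mean(x_pre)

# Adjusted outcome: same mean as y, lower variance when x_pre is predictive.
y_cuped = [yi - theta_cuped * (xi - x_bar) for yi, xi in zip(y, x_pre)]

var_raw = var(y)
var_cuped = var(y_cuped)
print(f"variance before: {var_raw:.3f}, after CUPED: {var_cuped:.3f}")
```

The stronger the correlation between the pre-period covariate and the outcome, the larger the variance reduction, which is why pre_corr is a useful knob when benchmarking.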

Notes on effect scale

How outcome_params maps into the structural effect:

outcome_type="normal": treatment shifts the mean by (mean["B"] - mean["A"]) on the outcome scale.
outcome_type="binary": treatment shifts the log-odds by (logit(p_B) - logit(p_A)).
outcome_type="poisson" or "gamma": treatment shifts the log-mean by log(lam_B / lam_A).
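The three mappings above can be written out as plain arithmetic. The helper names below (mean_shift, log_odds_shift, log_mean_shift) are hypothetical and not part of causalis; they take arm-level values directly rather than an outcome_params dict.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def mean_shift(mean_a, mean_b):
    """Normal outcome: additive shift on the outcome scale."""
    return mean_b - mean_a


def log_odds_shift(p_a, p_b):
    """Binary outcome: shift on the log-odds scale."""
    return logit(p_b) - logit(p_a)


def log_mean_shift(lam_a, lam_b):
    """Poisson or gamma outcome: shift on the log-mean scale."""
    return math.log(lam_b / lam_a)


print(mean_shift(10.0, 10.5))
print(log_odds_shift(0.10, 0.12))
print(log_mean_shift(1.0, 1.1))
```

Note that the same relative lift corresponds to different structural effects depending on the family, so effects are only comparable within one outcome_type.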

Ancillary columns (if add_ancillary=True) are generated from baseline confounders X only, avoiding outcome leakage and post-treatment adjustment issues.

Parameters

n (int, default=20_000): Number of samples to generate.
split (float, default=0.5): Proportion of samples assigned to the treatment group.
random_state (int, optional): Random seed for reproducibility.
outcome_type ({"binary", "normal", "poisson", "gamma"}, default="binary"): Distribution family of the outcome.
outcome_params (dict, optional): Parameters defining baseline rates/means and treatment effects, e.g. {"p": {"A": 0.1, "B": 0.12}} for binary, or {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}} for poisson/gamma.
confounder_specs (list of dict, optional): Schema for confounder distributions.
k (int, default=0): Number of confounders if specs not provided.
x_sampler (callable, optional): Custom sampler for confounders.
add_ancillary (bool, default=True): Whether to add descriptive columns like 'age', 'platform', etc.
deterministic_ids (bool, default=False): Whether to generate deterministic user IDs.
add_pre (bool, default=True): Whether to generate a pre-period covariate (y_pre).
pre_name (str, default="y_pre"): Name of the pre-period covariate column.
pre_corr (float, default=0.7): Target correlation between y_pre and the outcome Y in the control group.
prognostic_scale (float, default=1.0): Scale of the prognostic signal derived from confounders.
include_oracle (bool, default=True): Whether to include oracle ground-truth columns like 'cate', 'm', etc.
return_causal_data (bool, default=False): Whether to return a CausalData object instead of a pandas.DataFrame.

Returns

pandas.DataFrame or CausalData

Synthetic RCT dataset.

Examples
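A hedged usage sketch, built only from the parameters documented above. The import path follows the canonical name of this symbol; the output columns are not listed on this page, so the example only inspects the first rows. The import is guarded so the snippet still runs when causalis is not installed.

```python
# Keyword arguments taken from the documented parameter list.
rct_kwargs = dict(
    n=2_000,
    split=0.5,
    random_state=7,
    outcome_type="binary",
    outcome_params={"p": {"A": 0.10, "B": 0.12}},
    add_pre=True,
    pre_corr=0.7,
    return_causal_data=False,  # ask for a pandas.DataFrame
)

try:
    from causalis.dgp.causaldata.functional import generate_rct
except ImportError:
    generate_rct = None  # causalis is not available in this environment

df = None
if generate_rct is not None:
    df = generate_rct(**rct_kwargs)
    print(df.head())
```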

function

causalis.dgp.causaldata.functional.generate_classic_rct

generate_classic_rct

Generate a classic RCT dataset with three binary confounders: platform_ios, country_usa, and source_paid.

Parameters

n (int, default=10_000): Number of samples to generate.
split (float, default=0.5): Proportion of samples assigned to the treatment group.
random_state (int, optional): Random seed for reproducibility.
outcome_params (dict, optional): Parameters defining baseline rates/means and treatment effects, e.g. {"p": {"A": 0.1, "B": 0.15}} for binary.
add_pre (bool, default=False): Whether to generate a pre-period covariate (y_pre).
beta_y (array-like, optional): Linear coefficients for confounders in the outcome model.
outcome_depends_on_x (bool, default=True): Whether to add default effects for confounders if beta_y is None.
prognostic_scale (float, default=1.0): Scale of nonlinear prognostic signal (passed to generate_rct).
pre_corr (float, default=0.7): Target correlation for y_pre (passed to generate_rct).
return_causal_data (bool, default=False): Whether to return a CausalData object instead of a pandas.DataFrame.
add_ancillary (bool, default=False): Whether to add standard ancillary columns (age, platform, etc.).
deterministic_ids (bool, default=False): Whether to generate deterministic user IDs.
include_oracle (bool, default=True): Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
**kwargs: Additional arguments passed to generate_rct.

Returns

pandas.DataFrame or CausalData

Synthetic classic RCT dataset.

function

causalis.dgp.causaldata.functional.classic_rct_gamma

classic_rct_gamma

Generate a classic RCT dataset with three binary confounders and a gamma outcome.

The gamma outcome uses a log-mean link, so treatment effects are multiplicative on the mean scale. The default parameters are chosen to resemble a skewed real-world metric (e.g., spend or revenue).
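As an illustration of the multiplicative effect, the sketch below uses the example values from the outcome_params entry in the parameter list (shape 2.0, scale 15.0 for A and 16.5 for B). It samples with Python's random.gammavariate rather than causalis, so it shows only the arithmetic of the log-mean link, not the library's internals.

```python
import math
import random

random.seed(1)

shape = 2.0
scale = {"A": 15.0, "B": 16.5}

# Gamma mean = shape * scale, so the arm means are 30.0 and 33.0.
mean_a = shape * scale["A"]
mean_b = shape * scale["B"]

ratio = mean_b / mean_a            # multiplicative effect on the mean scale
log_mean_effect = math.log(ratio)  # additive effect on the log-mean scale

# Monte Carlo check with the standard library's gamma sampler.
n = 20_000
draws_a = [random.gammavariate(shape, scale["A"]) for _ in range(n)]
draws_b = [random.gammavariate(shape, scale["B"]) for _ in range(n)]
sample_ratio = (sum(draws_b) / n) / (sum(draws_a) / n)

print(f"analytic ratio: {ratio:.3f}, log-mean effect: {log_mean_effect:.4f}")
print(f"sample ratio:   {sample_ratio:.3f}")
```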

Parameters

n (int, default=10_000): Number of samples to generate.
split (float, default=0.5): Proportion of samples assigned to the treatment group.
random_state (int, optional): Random seed for reproducibility.
outcome_params (dict, optional): Gamma parameters, e.g. {"shape": 2.0, "scale": {"A": 15.0, "B": 16.5}}. Mean = shape * scale.
add_pre (bool, default=False): Whether to generate a pre-period covariate (y_pre).
beta_y (array-like, optional): Linear coefficients for confounders in the log-mean outcome model.
outcome_depends_on_x (bool, default=True): Whether to add default effects for confounders if beta_y is None.
prognostic_scale (float, default=1.0): Scale of nonlinear prognostic signal.
pre_corr (float, default=0.7): Target correlation of y_pre with the post-period outcome in the control group.
add_ancillary (bool, default=True): Whether to add standard ancillary columns (age, platform, etc.).
deterministic_ids (bool, default=False): Whether to generate deterministic user IDs.
include_oracle (bool, default=True): Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
return_causal_data (bool, default=False): Whether to return a CausalData object instead of a pandas.DataFrame.
**kwargs: Additional arguments passed to generate_rct (e.g., pre_name, g_y, use_prognostic).

Returns

pandas.DataFrame or CausalData

Synthetic classic RCT dataset with gamma outcome.

function

causalis.dgp.causaldata.functional.obs_linear_effect

obs_linear_effect

Generate an observational dataset with linear effects of confounders and a constant treatment effect.

Parameters

n (int, default=10_000): Number of samples to generate.
theta (float, default=1.0): Constant treatment effect.
outcome_type ({"continuous", "binary", "poisson", "gamma"}, default="continuous"): Family of the outcome distribution.
sigma_y (float, default=1.0): Noise level for continuous outcomes.
target_d_rate (float, optional): Target treatment prevalence (propensity mean).
confounder_specs (list of dict, optional): Schema for confounder distributions.
beta_y (array-like, optional): Linear coefficients for confounders in the outcome model.
beta_d (array-like, optional): Linear coefficients for confounders in the treatment model.
random_state (int, optional): Random seed for reproducibility.
k (int, default=0): Number of confounders if specs not provided.
x_sampler (callable, optional): Custom sampler for confounders.
include_oracle (bool, default=True): Whether to include oracle ground-truth columns like 'cate', 'm', etc.
add_ancillary (bool, default=False): If True, adds standard ancillary columns (age, platform, etc.).
deterministic_ids (bool, default=False): If True, generates deterministic user IDs.

Returns

pandas.DataFrame

Synthetic observational dataset.

Notes

This helper is a lightweight observational benchmark:

treatment is not randomized unless beta_d is zero and target_d_rate forces a near-constant propensity;
oracle columns such as m and cate are available when include_oracle=True;
the treatment effect is constant on the structural link scale, so heterogeneity only enters through the outcome family transformation.

Examples
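The last note can be illustrated without the library. The sketch below assumes a binary outcome with a logit link (an assumption, since this page does not name the link used for outcome_type="binary" here) and shows that a constant theta on the link scale produces different risk differences at different baselines.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


theta = 1.0  # constant treatment effect on the (assumed) logit scale

risk_differences = []
link_differences = []
for baseline_logit in (-3.0, -1.5, 0.0, 1.5):
    p0 = sigmoid(baseline_logit)          # untreated outcome probability
    p1 = sigmoid(baseline_logit + theta)  # treated outcome probability
    risk_differences.append(p1 - p0)
    link_differences.append(logit(p1) - logit(p0))
    print(f"baseline p0={p0:.3f}  p1={p1:.3f}  risk difference={p1 - p0:.3f}")
```

On the link scale every unit has the same effect, while on the probability scale the effect varies with the baseline, which is the only source of heterogeneity in this setup.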

function

causalis.dgp.causaldata.functional.make_cuped_tweedie

make_cuped_tweedie

Tweedie-like DGP with mixed marginals and structured HTE. Features many zeros and a heavy right tail. Suitable for CUPED benchmarking.
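For intuition about the zero-heavy, right-skewed shape, here is a standard-library sketch of a compound Poisson-gamma draw, a common way to obtain Tweedie-type outcomes. It is not the generator used by make_cuped_tweedie; the parameter values, the helper names, and the choice to apply the log-uplift by rescaling the gamma scale are all assumptions for illustration.

```python
import math
import random

random.seed(3)


def poisson_draw(lam):
    """Poisson sample via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, prod = 0, random.random()
    while prod > threshold:
        count += 1
        prod *= random.random()
    return count


def compound_poisson_gamma(lam, shape, scale):
    """Sum of a Poisson number of gamma draws; exactly zero when the count is zero."""
    count = poisson_draw(lam)
    return sum(random.gammavariate(shape, scale) for _ in range(count))


n = 5000
lam, shape, scale = 0.5, 0.8, 10.0  # hypothetical values
theta_log = 0.2                     # log-uplift, as in the documented default
uplift = math.exp(theta_log)        # multiplicative effect on the mean

control = [compound_poisson_gamma(lam, shape, scale) for _ in range(n)]
treated = [compound_poisson_gamma(lam, shape, scale * uplift) for _ in range(n)]

zero_share = sum(1 for v in control if v == 0) / n
mean_control = sum(control) / n
print(f"zero share: {zero_share:.2f}, mean: {mean_control:.2f}, max: {max(control):.1f}")
print(f"treated mean: {sum(treated) / n:.2f} (expected multiplier {uplift:.3f})")
```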

Parameters

n (int, default=10000): Number of samples to generate.
seed (int, default=42): Random seed.
add_pre (bool, default=True): Whether to add a pre-period covariate 'y_pre'.
pre_name (str, default="y_pre"): Name of the pre-period covariate column.
pre_target_corr (float, default=0.6): Target correlation between y_pre and post-outcome y in control group.
pre_spec (PreCorrSpec, optional): Detailed specification for pre-period calibration (transform, method, etc.). If provided, pre_target_corr is ignored in favor of pre_spec.target_corr.
include_oracle (bool, default=False): Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
return_causal_data (bool, default=True): Whether to return a CausalData object.
theta_log (float, default=0.2): The log-uplift theta parameter for the treatment effect.

Returns

pandas.DataFrame or CausalData

function

causalis.dgp.causaldata.functional.generate_cuped_binary

generate_cuped_binary

Binary CUPED-oriented DGP with richer confounders and structured HTE.

Designed for CUPED benchmarking with randomized treatment and a calibrated pre-period covariate while preserving exact oracle cate under include_oracle.
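One generic way to build a pre-period covariate with a chosen correlation is to blend the standardized outcome with independent noise. The sketch below shows that construction for a binary outcome; it is an assumption for illustration and not necessarily the calibration that generate_cuped_binary or PreCorrSpec performs.

```python
import random

random.seed(5)
n = 5000
target_corr = 0.65  # matches the documented pre_target_corr default

# Stand-in for a control-group binary outcome (hypothetical 20% base rate).
y_control = [1 if random.random() < 0.2 else 0 for _ in range(n)]


def mean(v):
    return sum(v) / len(v)


def standardize(v):
    m = mean(v)
    sd = (sum((a - m) ** 2 for a in v) / len(v)) ** 0.5
    return [(a - m) / sd for a in v]


def pearson(u, v):
    zu, zv = standardize(u), standardize(v)
    return sum(a * b for a, b in zip(zu, zv)) / len(zu)


# Blend: target_corr * signal + sqrt(1 - target_corr^2) * independent noise.
z = standardize(y_control)
noise_weight = (1 - target_corr**2) ** 0.5
y_pre = [target_corr * zi + noise_weight * random.gauss(0.0, 1.0) for zi in z]

achieved = pearson(y_pre, y_control)
print(f"target: {target_corr:.2f}, achieved: {achieved:.3f}")
```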

Parameters

n (int, default=10000): Number of samples to generate.
seed (int, default=42): Random seed.
add_pre (bool, default=True): Whether to add a pre-period covariate.
pre_name (str, default="y_pre"): Name of the pre-period covariate column.
pre_target_corr (float, default=0.65): Target correlation between y_pre and post-outcome y in the control group.
pre_spec (PreCorrSpec, optional): Detailed specification for pre-period calibration. If provided, pre_target_corr is ignored in favor of pre_spec.target_corr.
include_oracle (bool, default=True): Whether to include oracle columns like m, g0, g1, cate.
return_causal_data (bool, default=True): Whether to return a CausalData object.
theta_logit (float, default=0.38): Baseline log-odds uplift scale for heterogeneous treatment effects.

Returns

pandas.DataFrame or CausalData

function

causalis.dgp.causaldata.functional.make_gold_linear

make_gold_linear

A standard linear benchmark with moderate confounding. Based on the benchmark scenario in docs/research/dgp_benchmarking.ipynb.
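Since this page documents no parameters for make_gold_linear, the following is only a generic standard-library sketch of what a linear benchmark with confounding looks like. The data-generating process, coefficients, and helper names are assumptions, not the library's scenario. It compares a naive difference in means with a covariate-adjusted estimate obtained by residualizing both outcome and treatment on the confounder (the Frisch-Waugh-Lovell approach).

```python
import math
import random

random.seed(11)
n = 5000
theta = 1.0  # true constant treatment effect (hypothetical)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean(v):
    return sum(v) / len(v)


def slope(u, v):
    """OLS slope from regressing v on u with an intercept."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sum((a - mu) ** 2 for a in u)
    return num / den


def residualize(v, u):
    """Residuals of v after removing its linear dependence on u."""
    coef = slope(u, v)
    mu, mv = mean(u), mean(v)
    return [(vi - mv) - coef * (ui - mu) for ui, vi in zip(u, v)]


# One confounder x drives both treatment assignment and the outcome.
x = [random.gauss(0.0, 1.0) for _ in range(n)]
d = [1 if random.random() < sigmoid(0.8 * xi) else 0 for xi in x]
y = [theta * di + 1.0 * xi + random.gauss(0.0, 1.0) for di, xi in zip(d, x)]

treated = [yi for yi, di in zip(y, d) if di == 1]
control = [yi for yi, di in zip(y, d) if di == 0]
naive = mean(treated) - mean(control)

# Coefficient on d in the regression of y on d and x.
adjusted = slope(residualize(d, x), residualize(y, x))

print(f"true effect: {theta:.2f}, naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

The naive estimate absorbs the confounder's effect, while the adjusted one should land close to the true value, which is the kind of gap such a benchmark is meant to expose.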
