Submodule
causalis.dgp.causaldata.functional

functional

Submodule causalis.dgp.causaldata.functional with no child pages and 7 documented members.

Functions

function
causalis.dgp.causaldata.functional.generate_rct

generate_rct

Generate an RCT dataset with randomized treatment assignment.

Uses CausalDatasetGenerator internally, ensuring treatment is independent of X. Specifically designed for benchmarking variance reduction techniques like CUPED.

Notes on effect scale

How outcome_params maps into the structural effect:

  • outcome_type="normal": treatment shifts the mean by (mean["B"] - mean["A"]) on the outcome scale.

  • outcome_type="binary": treatment shifts the log-odds by (logit(p_B) - logit(p_A)).

  • outcome_type="poisson" or "gamma": treatment shifts the log-mean by log(lam_B / lam_A).
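The mapping in the list above can be checked by hand. The sketch below is not causalis code: structural_shift is a hypothetical helper, and a and b stand for the per-arm values (mean, p, or lam) that would be read from outcome_params.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability in (0, 1)."""
    return math.log(p / (1.0 - p))


def structural_shift(outcome_type: str, a: float, b: float) -> float:
    """Treatment shift on the link scale for arms A and B.

    Hypothetical helper for illustration only; not part of causalis.
    """
    if outcome_type == "normal":
        return b - a                    # identity link: difference in means
    if outcome_type == "binary":
        return logit(b) - logit(a)      # logit link: difference in log-odds
    if outcome_type in ("poisson", "gamma"):
        return math.log(b / a)          # log link: log of the mean ratio
    raise ValueError(f"unknown outcome_type: {outcome_type!r}")


shift_binary = structural_shift("binary", 0.10, 0.12)
shift_gamma = structural_shift("gamma", 1.0, 1.1)
print(f"binary log-odds shift: {shift_binary:.4f}")
print(f"gamma log-mean shift:  {shift_gamma:.4f}")
```

A two-point lift from 0.10 to 0.12 is therefore a shift of roughly 0.2 on the log-odds scale, which is the quantity the binary case works with.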

Ancillary columns (if add_ancillary=True) are generated from baseline confounders X only, avoiding outcome leakage and post-treatment adjustment issues.

Parameters

n : int, default=20_000

Number of samples to generate.

split : float, default=0.5

Proportion of samples assigned to the treatment group.

random_state : int, optional

Random seed for reproducibility.

outcome_type : {"binary", "normal", "poisson", "gamma"}, default="binary"

Distribution family of the outcome.

outcome_params : dict, optional

Parameters defining baseline rates/means and treatment effects, e.g. {"p": {"A": 0.1, "B": 0.12}} for binary, or {"shape": 2.0, "scale": {"A": 1.0, "B": 1.1}} for poisson/gamma.

confounder_specs : list of dict, optional

Schema for confounder distributions.

k : int, default=0

Number of confounders if specs not provided.

x_sampler : callable, optional

Custom sampler for confounders.

add_ancillary : bool, default=True

Whether to add descriptive columns like 'age', 'platform', etc.

deterministic_ids : bool, default=False

Whether to generate deterministic user IDs.

add_pre : bool, default=True

Whether to generate a pre-period covariate (y_pre).

pre_name : str, default="y_pre"

Name of the pre-period covariate column.

pre_corr : float, default=0.7

Target correlation between y_pre and the outcome Y in the control group.

prognostic_scale : float, default=1.0

Scale of the prognostic signal derived from confounders.

include_oracle : bool, default=True

Whether to include oracle ground-truth columns like 'cate', 'm', etc.

return_causal_data : bool, default=False

Whether to return a CausalData object instead of a pandas.DataFrame.

Returns

pandas.DataFrame or CausalData

Synthetic RCT dataset.

Examples
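The sketch below does not call causalis. It is a minimal standard-library simulation, under assumed values, of why a pre-period covariate correlated with the outcome helps CUPED: rho plays the role of pre_corr, tau is an assumed additive effect, and the outcome is Gaussian for simplicity.

```python
import random
import statistics

rng = random.Random(0)
n, rho, tau = 20_000, 0.7, 0.1   # assumed values; rho mirrors pre_corr

y_pre, d, y = [], [], []
for _ in range(n):
    pre = rng.gauss(0.0, 1.0)
    treat = 1 if rng.random() < 0.5 else 0      # randomized, independent of pre
    noise = rng.gauss(0.0, 1.0)
    y_pre.append(pre)
    d.append(treat)
    # In the control group, corr(y, y_pre) equals rho by construction.
    y.append(rho * pre + (1 - rho**2) ** 0.5 * noise + tau * treat)

# CUPED coefficient: theta = cov(y, y_pre) / var(y_pre)
mean_pre, mean_y = statistics.fmean(y_pre), statistics.fmean(y)
cov = sum((a - mean_pre) * (b - mean_y) for a, b in zip(y_pre, y)) / (n - 1)
theta = cov / statistics.variance(y_pre)
y_adj = [b - theta * (a - mean_pre) for a, b in zip(y_pre, y)]


def diff_in_means(values):
    treated = [v for v, t in zip(values, d) if t == 1]
    control = [v for v, t in zip(values, d) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def control_variance(values):
    return statistics.variance([v for v, t in zip(values, d) if t == 0])


variance_ratio = control_variance(y_adj) / control_variance(y)
print(f"raw estimate:      {diff_in_means(y):.3f}")
print(f"CUPED estimate:    {diff_in_means(y_adj):.3f}")
print(f"variance ratio:    {variance_ratio:.3f} (theory: {1 - rho**2:.2f})")
```

With a correlation of 0.7 the adjusted outcome keeps about 1 - 0.7² = 0.51 of the original variance, which is why pre_corr is the main knob when benchmarking variance reduction.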

function
causalis.dgp.causaldata.functional.generate_classic_rct

generate_classic_rct

Generate a classic RCT dataset with three binary confounders: platform_ios, country_usa, and source_paid.

Parameters

n : int, default=10_000

Number of samples to generate.

split : float, default=0.5

Proportion of samples assigned to the treatment group.

random_state : int, optional

Random seed for reproducibility.

outcome_params : dict, optional

Parameters defining baseline rates/means and treatment effects, e.g. {"p": {"A": 0.1, "B": 0.15}} for binary.

add_pre : bool, default=False

Whether to generate a pre-period covariate (y_pre).

beta_y : array-like, optional

Linear coefficients for confounders in the outcome model.

outcome_depends_on_x : bool, default=True

Whether to add default effects for confounders if beta_y is None.

prognostic_scale : float, default=1.0

Scale of nonlinear prognostic signal (passed to generate_rct).

pre_corr : float, default=0.7

Target correlation for y_pre (passed to generate_rct).

return_causal_data : bool, default=False

Whether to return a CausalData object instead of a pandas.DataFrame.

add_ancillary : bool, default=False

Whether to add standard ancillary columns (age, platform, etc.).

deterministic_ids : bool, default=False

Whether to generate deterministic user IDs.

include_oracle : bool, default=True

Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.

**kwargs

Additional arguments passed to generate_rct.

Returns

pandas.DataFrame or CausalData

Synthetic classic RCT dataset.
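As an illustration of what randomized assignment implies for the three binary confounders, the standard-library sketch below draws the treatment flag without looking at the covariates and then checks balance. It does not use causalis: the prevalences in rates and the column name d are assumptions made for this example only.

```python
import random
import statistics

rng = random.Random(0)
n, split = 10_000, 0.5
names = ("platform_ios", "country_usa", "source_paid")
# Assumed prevalences, chosen only for illustration.
rates = {"platform_ios": 0.5, "country_usa": 0.6, "source_paid": 0.3}

rows = []
for _ in range(n):
    row = {name: int(rng.random() < rates[name]) for name in names}
    row["d"] = int(rng.random() < split)   # assignment ignores the confounders
    rows.append(row)


def balance_gap(name: str) -> float:
    """Difference in confounder prevalence between treated and control rows."""
    treated = [r[name] for r in rows if r["d"] == 1]
    control = [r[name] for r in rows if r["d"] == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


gaps = {name: balance_gap(name) for name in names}
treated_share = sum(r["d"] for r in rows) / n
for name, gap in gaps.items():
    print(f"{name:>13}: gap = {gap:+.4f}")
print(f"treated share: {treated_share:.3f}")
```

Gaps close to zero are what one would expect from any correctly randomized dataset, so the same check is a reasonable sanity test on generated data.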

function
causalis.dgp.causaldata.functional.classic_rct_gamma

classic_rct_gamma

Generate a classic RCT dataset with three binary confounders and a gamma outcome.

The gamma outcome uses a log-mean link, so treatment effects are multiplicative on the mean scale. The default parameters are chosen to resemble a skewed real-world metric (e.g., spend or revenue).
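To make the multiplicative reading concrete, the sketch below works through the example values {"shape": 2.0, "scale": {"A": 15.0, "B": 16.5}} using only the standard library. It is not the library's sampler: random.gammavariate is used here because it takes the same shape and scale parametrization, with mean equal to shape times scale.

```python
import math
import random
import statistics

shape = 2.0
scale = {"A": 15.0, "B": 16.5}

# Mean = shape * scale, so the arm means are 30.0 and 33.0.
mean = {arm: shape * s for arm, s in scale.items()}
mean_ratio = mean["B"] / mean["A"]          # multiplicative effect on the mean
log_mean_shift = math.log(mean_ratio)       # additive effect on the log-mean

rng = random.Random(0)
draws = {
    arm: [rng.gammavariate(shape, s) for _ in range(50_000)]
    for arm, s in scale.items()
}
sample_ratio = statistics.fmean(draws["B"]) / statistics.fmean(draws["A"])

print(f"theoretical means: {mean}")
print(f"mean ratio:        {mean_ratio:.3f}")
print(f"log-mean shift:    {log_mean_shift:.4f}")
print(f"sampled ratio:     {sample_ratio:.3f}")
print(f"control median vs mean: {statistics.median(draws['A']):.1f} vs "
      f"{statistics.fmean(draws['A']):.1f}")
```

The median sitting below the mean reflects the right skew that makes this family a plausible stand-in for spend or revenue metrics.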

Parameters

n : int, default=10_000

Number of samples to generate.

split : float, default=0.5

Proportion of samples assigned to the treatment group.

random_state : int, optional

Random seed for reproducibility.

outcome_params : dict, optional

Gamma parameters, e.g. {"shape": 2.0, "scale": {"A": 15.0, "B": 16.5}}. Mean = shape * scale.

add_pre : bool, default=False

Whether to generate a pre-period covariate (y_pre).

beta_y : array-like, optional

Linear coefficients for confounders in the log-mean outcome model.

outcome_depends_on_x : bool, default=True

Whether to add default effects for confounders if beta_y is None.

prognostic_scale : float, default=1.0

Scale of nonlinear prognostic signal.

pre_corr : float, default=0.7

Target correlation for y_pre with post-outcome in control group.

add_ancillary : bool, default=True

Whether to add standard ancillary columns (age, platform, etc.).

deterministic_ids : bool, default=False

Whether to generate deterministic user IDs.

include_oracle : bool, default=True

Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.

return_causal_data : bool, default=False

Whether to return a CausalData object instead of a pandas.DataFrame.

**kwargs

Additional arguments passed to generate_rct (e.g., pre_name, g_y, use_prognostic).

Returns

pandas.DataFrame or CausalData

Synthetic classic RCT dataset with gamma outcome.

function
causalis.dgp.causaldata.functional.obs_linear_effect

obs_linear_effect

Generate an observational dataset with linear effects of confounders and a constant treatment effect.

Parameters

n : int, default=10_000

Number of samples to generate.

theta : float, default=1.0

Constant treatment effect.

outcome_type : {"continuous", "binary", "poisson", "gamma"}, default="continuous"

Family of the outcome distribution.

sigma_y : float, default=1.0

Noise level for continuous outcomes.

target_d_rate : float, optional

Target treatment prevalence (propensity mean).

confounder_specs : list of dict, optional

Schema for confounder distributions.

beta_y : array-like, optional

Linear coefficients for confounders in the outcome model.

beta_d : array-like, optional

Linear coefficients for confounders in the treatment model.

random_state : int, optional

Random seed for reproducibility.

k : int, default=0

Number of confounders if specs not provided.

x_sampler : callable, optional

Custom sampler for confounders.

include_oracle : bool, default=True

Whether to include oracle ground-truth columns like 'cate', 'm', etc.

add_ancillary : bool, default=False

If True, adds standard ancillary columns (age, platform, etc.).

deterministic_ids : bool, default=False

If True, generates deterministic user IDs.

Returns

pandas.DataFrame

Synthetic observational dataset.

Notes

This helper is a lightweight observational benchmark:

  • treatment is not randomized unless beta_d is zero and target_d_rate forces a near-constant propensity;

  • oracle columns such as m and cate are available when include_oracle=True;

  • the treatment effect is constant on the structural link scale, so heterogeneity only enters through the outcome family transformation.

Examples
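The sketch below is a standard-library illustration of the setting described in the notes, not the library's generator. It assumes a single Gaussian confounder x, a logistic treatment model with coefficient beta_d, an intercept found by bisection so that the mean propensity matches target_d_rate, and a continuous outcome with constant effect theta. All numeric values are assumptions.

```python
import math
import random
import statistics


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
n, theta, target_d_rate = 20_000, 1.0, 0.3
beta_y, beta_d = 1.0, 1.0            # assumed coefficients for one confounder
x = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Bisection on the intercept: mean propensity is increasing in alpha.
lo, hi = -10.0, 10.0
for _ in range(60):
    mid = (lo + hi) / 2
    mean_prop = sum(sigmoid(mid + beta_d * xi) for xi in x) / n
    if mean_prop < target_d_rate:
        lo = mid
    else:
        hi = mid
alpha = (lo + hi) / 2

propensity = [sigmoid(alpha + beta_d * xi) for xi in x]
d = [1 if rng.random() < p else 0 for p in propensity]
y = [theta * di + beta_y * xi + rng.gauss(0.0, 1.0) for di, xi in zip(d, x)]


def diff_in_means(values):
    treated = [v for v, t in zip(values, d) if t == 1]
    control = [v for v, t in zip(values, d) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


naive = diff_in_means(y)
# Oracle adjustment: subtract the known confounder contribution before comparing.
oracle_adjusted = diff_in_means([yi - beta_y * xi for yi, xi in zip(y, x)])

print(f"mean propensity: {sum(propensity) / n:.4f}")
print(f"naive estimate:  {naive:.3f}")
print(f"oracle-adjusted: {oracle_adjusted:.3f} (theta = {theta})")
```

Because x raises both the chance of treatment and the outcome, the naive difference in means overstates theta, while removing the confounder's contribution brings the estimate back near the true value.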

function
causalis.dgp.causaldata.functional.make_cuped_tweedie

make_cuped_tweedie

Tweedie-like DGP with mixed marginals and structured HTE. Features many zeros and a heavy right tail. Suitable for CUPED benchmarking.

Parameters

n : int, default=10000

Number of samples to generate.

seed : int, default=42

Random seed.

add_pre : bool, default=True

Whether to add a pre-period covariate 'y_pre'.

pre_name : str, default="y_pre"

Name of the pre-period covariate column.

pre_target_corr : float, default=0.6

Target correlation between y_pre and post-outcome y in control group.

pre_spec : PreCorrSpec, optional

Detailed specification for pre-period calibration (transform, method, etc.). If provided, pre_target_corr is ignored in favor of pre_spec.target_corr.

include_oracle : bool, default=False

Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.

return_causal_data : bool, default=True

Whether to return a CausalData object.

theta_log : float, default=0.2

The log-uplift theta parameter for the treatment effect.

Returns

pd.DataFrame or CausalData
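For intuition about a log-uplift on a zero-heavy, right-skewed metric, the sketch below simulates a simple zero-inflated gamma outcome with the standard library. This mixture is an assumption made for illustration and is not the DGP used by make_cuped_tweedie; the zero share, gamma shape, and base scale are invented values, and only theta_log = 0.2 is taken from the default above.

```python
import math
import random
import statistics

theta_log = 0.2
relative_lift = math.exp(theta_log) - 1.0   # log-uplift as a relative change

rng = random.Random(42)


def draw(treated: bool) -> float:
    """One outcome from an assumed zero-inflated gamma mixture."""
    if rng.random() < 0.7:                  # assumed share of exact zeros
        return 0.0
    scale = 10.0 * (math.exp(theta_log) if treated else 1.0)
    return rng.gammavariate(0.5, scale)     # shape < 1 gives a strong right skew


control = [draw(False) for _ in range(100_000)]
treated = [draw(True) for _ in range(100_000)]

zero_share = control.count(0.0) / len(control)
mean_ratio = statistics.fmean(treated) / statistics.fmean(control)

print(f"relative lift implied by theta_log: {relative_lift:.3f}")
print(f"share of zeros in control:          {zero_share:.3f}")
print(f"sampled mean ratio:                 {mean_ratio:.3f}")
```

Under this assumed mixture a log-uplift of 0.2 corresponds to roughly a 22% lift in the mean, yet the sampled ratio is noisy even with 100,000 rows per arm, which is the situation variance reduction is meant to help with.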

function
causalis.dgp.causaldata.functional.generate_cuped_binary

generate_cuped_binary

Binary CUPED-oriented DGP with richer confounders and structured HTE.

Designed for CUPED benchmarking: treatment is randomized and the pre-period covariate is calibrated, while the oracle cate stays exact when include_oracle is enabled.

Parameters

n : int, default=10000

Number of samples to generate.

seed : int, default=42

Random seed.

add_pre : bool, default=True

Whether to add a pre-period covariate.

pre_name : str, default="y_pre"

Name of the pre-period covariate column.

pre_target_corr : float, default=0.65

Target correlation between y_pre and post-outcome y in the control group.

pre_spec : PreCorrSpec, optional

Detailed specification for pre-period calibration. If provided, pre_target_corr is ignored in favor of pre_spec.target_corr.

include_oracle : bool, default=True

Whether to include oracle columns like m, g0, g1, cate.

return_causal_data : bool, default=True

Whether to return a CausalData object.

theta_logit : float, default=0.38

Baseline log-odds uplift scale for heterogeneous treatment effects.

Returns

pd.DataFrame or CausalData
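A log-odds uplift translates into different absolute lifts depending on the baseline rate. The sketch below shows this with a constant uplift equal to the default theta_logit; it is a simplification, since the generator's structured HTE is not reproduced here, and treated_rate is a hypothetical helper rather than a causalis function.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


theta_logit = 0.38


def treated_rate(p0: float, uplift: float = theta_logit) -> float:
    """Treated conversion rate after adding `uplift` to the baseline log-odds."""
    return sigmoid(logit(p0) + uplift)


# Absolute lift (risk difference) at three assumed baseline rates.
risk_diff = {p0: treated_rate(p0) - p0 for p0 in (0.05, 0.20, 0.50)}
for p0, diff in risk_diff.items():
    print(f"baseline {p0:.2f} -> treated {treated_rate(p0):.4f} (lift {diff:+.4f})")
```

The same uplift of 0.38 moves a 5% baseline by about two points and a 50% baseline by about nine, so effects on the probability scale vary across units even when the log-odds shift is constant.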

function
causalis.dgp.causaldata.functional.make_gold_linear

make_gold_linear

A standard linear benchmark with moderate confounding. Based on the benchmark scenario in docs/research/dgp_benchmarking.ipynb.
