Submodule
causalis.dgp.causaldata.base

base

Submodule causalis.dgp.causaldata.base with no child pages and 36 documented members.

Classes

class
causalis.dgp.causaldata.base.CausalDatasetGenerator

CausalDatasetGenerator

Generate synthetic causal inference datasets with controllable confounding, treatment prevalence, noise, and (optionally) heterogeneous treatment effects.

Data model (high level)

  • Confounders X ∈ R^k are drawn from user-specified distributions.

  • Binary treatment D is assigned by a logistic model: D ~ Bernoulli( sigmoid(alpha_d + f_d(X) + u_strength_d * U) ), where f_d(X) = (X @ beta_d + g_d(X)) * propensity_sharpness, and U ~ N(0,1) is an optional unobserved confounder.

  • Outcome Y depends on treatment and confounders, with the link determined by outcome_type:

      outcome_type = “continuous”: Y = alpha_y + f_y(X) + u_strength_y * U + D * tau(X) + ε, ε ~ N(0, sigma_y^2)
      outcome_type = “binary”: logit P(Y=1|D,X,U) = alpha_y + f_y(X) + u_strength_y * U + D * tau(X)
      outcome_type = “poisson”: log E[Y|D,X,U] = alpha_y + f_y(X) + u_strength_y * U + D * tau(X)
      outcome_type = “gamma”: log E[Y|D,X,U] = alpha_y + f_y(X) + u_strength_y * U + D * tau(X)

    where f_y(X) = X @ beta_y + g_y(X), and tau(X) is either the constant theta or a user-supplied function.
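The structural equations above can be sketched in plain NumPy. This is an illustration only, not the library's implementation: the function name `simulate`, the coefficient values, and the choice g_d = g_y = 0 are assumptions made for the example, and only the continuous outcome is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simulate(n=1000, k=3, theta=1.0, alpha_d=0.0, alpha_y=0.0, sigma_y=1.0,
             u_strength_d=0.0, u_strength_y=0.0, propensity_sharpness=1.0, seed=0):
    """Hypothetical re-implementation of the continuous-outcome data model."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, k))        # confounders, independent N(0, 1)
    U = rng.normal(size=n)             # unobserved confounder U ~ N(0, 1)
    beta_d = np.full(k, 0.5)           # illustrative coefficients (assumed)
    beta_y = np.full(k, 0.3)

    # Treatment: D ~ Bernoulli(sigmoid(alpha_d + f_d(X) + u_strength_d * U))
    f_d = (X @ beta_d) * propensity_sharpness   # g_d(X) = 0 in this sketch
    m_obs = sigmoid(alpha_d + f_d + u_strength_d * U)
    D = rng.binomial(1, m_obs)

    # Outcome: Y = alpha_y + f_y(X) + u_strength_y * U + D * tau(X) + eps
    f_y = X @ beta_y                            # g_y(X) = 0 in this sketch
    eps = rng.normal(scale=sigma_y, size=n)
    Y = alpha_y + f_y + u_strength_y * U + D * theta + eps
    return X, D, Y, m_obs

X, D, Y, m_obs = simulate()
```

Raising `propensity_sharpness` in this sketch pushes `m_obs` toward 0 or 1, which is one way to see why the parameter controls how hard positivity is to satisfy.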

Returned columns

  • y: outcome

  • d: binary treatment (0/1)

  • x1..xk (or user-provided names)

  • m: true propensity P(D=1 | X) marginalized over U

  • m_obs: realized propensity P(D=1 | X, U)

  • tau_link: tau(X) on the structural (link) scale

  • g0: E[Y | X, D=0] on the natural outcome scale marginalized over U

  • g1: E[Y | X, D=1] on the natural outcome scale marginalized over U

  • cate: g1 - g0 (conditional average treatment effect on the natural outcome scale)

Notes on effect scale:

  • For “continuous”, theta (or tau(X)) is an additive mean difference, so tau_link == cate.

  • For “binary”, tau acts on the log-odds scale. cate is reported as a risk difference.

  • For “poisson” and “gamma”, tau acts on the log-mean scale. cate is reported on the mean scale.
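A small worked example may make the scale distinction concrete. It applies the same structural effect tau = 1.0 at three baseline values of the linear predictor and computes the natural-scale difference under each link. The baseline values are arbitrary, and U is ignored (u_strength_y = 0) so no marginalization is needed; this is a sketch, not the library's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

tau = 1.0                            # structural effect on the link scale
base = np.array([-2.0, 0.0, 2.0])    # alpha_y + f_y(X) at three illustrative points

# Identity link ("continuous"): the natural-scale difference equals tau.
cate_continuous = (base + tau) - base

# Logit link ("binary"): a risk difference that depends on the baseline.
cate_binary = sigmoid(base + tau) - sigmoid(base)

# Log link ("poisson", "gamma"): a mean difference equal to exp(base) * (exp(tau) - 1).
cate_log = np.exp(base + tau) - np.exp(base)

print(cate_continuous, cate_binary, cate_log)
```

Under the logit and log links a constant tau therefore produces a cate that varies with X, which is why tau_link and cate are reported as separate columns.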

Parameters

theta : float, default=1.0

Constant treatment effect used if tau is None.

tau : callable, optional

Function tau(X) -> array-like shape (n,) for heterogeneous effects.

beta_y : array-like, optional

Linear coefficients of confounders in the outcome baseline f_y(X).

beta_d : array-like, optional

Linear coefficients of confounders in the treatment score f_d(X).

g_y : callable, optional

Nonlinear/additive function g_y(X) -> (n,) added to the outcome baseline.

g_d : callable, optional

Nonlinear/additive function g_d(X) -> (n,) added to the treatment score.

alpha_y : float, default=0.0

Outcome intercept (natural scale for continuous; log-odds for binary; log-mean for Poisson/Gamma).

alpha_d : float, default=0.0

Treatment intercept (log-odds). If target_d_rate is set, alpha_d is auto-calibrated.

sigma_y : float, default=1.0

Std. dev. of the Gaussian noise for continuous outcomes.

outcome_type : {“continuous”, “binary”, “poisson”, “gamma”, “tweedie”}, default=“continuous”

Outcome family and link as defined above.

confounder_specs : list of dict, optional

Schema for generating confounders. See _gaussian_copula for details.

k : int, default=5

Number of confounders when confounder_specs is None; in that case they are drawn as independent N(0,1).

x_sampler : callable, optional

Custom sampler (n, k, seed) -> X ndarray of shape (n,k). Overrides confounder_specs.

use_copula : bool, default=False

If True and confounder_specs is provided, use a Gaussian copula for X.

copula_corr : array-like, optional

Correlation matrix for copula.

target_d_rate : float, optional

Target treatment prevalence (propensity mean). Calibrates alpha_d.

u_strength_d : float, default=0.0

Strength of the unobserved confounder U in treatment assignment.

u_strength_y : float, default=0.0

Strength of the unobserved confounder U in the outcome.

propensity_sharpness : float, default=1.0

Scales the X-driven treatment score to adjust positivity difficulty.

seed : int, optional

Random seed for reproducibility.
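The target_d_rate parameter is described as calibrating alpha_d to hit a target treatment prevalence. The source does not say how the calibration is done; one plausible approach, sketched here under that assumption, is a one-dimensional root search for the intercept that makes the mean propensity equal the target. The helper name `calibrate_intercept` and the simulated scores are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def calibrate_intercept(score, target_rate, lo=-20.0, hi=20.0, tol=1e-10):
    """Bisection for alpha such that mean(sigmoid(alpha + score)) ~= target_rate.

    Works because the mean propensity is strictly increasing in alpha.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sigmoid(mid + score).mean() < target_rate:
            lo = mid          # prevalence too low: raise the intercept
        else:
            hi = mid          # prevalence too high: lower the intercept
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
score = 1.5 * rng.normal(size=5000)    # stand-in for f_d(X) + u_strength_d * U
alpha_d = calibrate_intercept(score, target_rate=0.2)
achieved = sigmoid(alpha_d + score).mean()
```

Because the search matches the mean propensity on a finite sample, the realized share of treated units in any one draw will still fluctuate around the target.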

Attributes

rng : numpy.random.Generator

Internal RNG seeded from seed.

Notes

Oracle outputs are reported on the natural outcome scale:

  • m is the treatment propensity marginalized over latent noise.

  • g0 and g1 are mean potential outcomes on the observed outcome scale.

  • cate is always g1 - g0 on that same natural scale, even when the structural treatment effect is specified on a link scale such as log-odds or log-mean.

Examples
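As a hedged illustration of the callable parameters documented above, the functions below follow the stated signatures: x_sampler takes (n, k, seed) and returns an array of shape (n, k), while tau and g_d map X to an array of shape (n,). The function bodies are invented for this sketch and are not taken from the library.

```python
import numpy as np

def x_sampler(n, k, seed):
    """Custom sampler: equicorrelated Gaussian confounders of shape (n, k)."""
    rng = np.random.default_rng(seed)
    corr = np.full((k, k), 0.3)        # illustrative pairwise correlation
    np.fill_diagonal(corr, 1.0)
    return rng.multivariate_normal(np.zeros(k), corr, size=n)

def tau(X):
    """Heterogeneous effect on the link scale, shape (n,)."""
    return 1.0 + 0.5 * X[:, 0]

def g_d(X):
    """Nonlinear term added to the treatment score, shape (n,)."""
    return 0.5 * np.tanh(X[:, 0] * X[:, 1])

X = x_sampler(n=200, k=4, seed=42)
print(X.shape, tau(X).shape, g_d(X).shape)
```

Callables like these would be supplied through the x_sampler, tau, and g_d parameters of CausalDatasetGenerator.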

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.theta

theta

Value: 1.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.tau

tau

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.beta_y

beta_y

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.beta_d

beta_d

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.g_y

g_y

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.g_d

g_d

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.alpha_y

alpha_y

Value: 0.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.alpha_d

alpha_d

Value: 0.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.sigma_y

sigma_y

Value: 1.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.outcome_type

outcome_type

Value: 'continuous'

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.confounder_specs

confounder_specs

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.k

k

Value: 5

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.x_sampler

x_sampler

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.use_copula

use_copula

Value: False

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.copula_corr

copula_corr

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.target_d_rate

target_d_rate

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.u_strength_d

u_strength_d

Value: 0.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.u_strength_y

u_strength_y

Value: 0.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.propensity_sharpness

propensity_sharpness

Value: 1.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.score_bounding

score_bounding

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.alpha_zi

alpha_zi

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.beta_zi

beta_zi

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.g_zi

g_zi

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.u_strength_zi

u_strength_zi

Value: 0.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.tau_zi

tau_zi

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.pos_dist

pos_dist

Value: 'gamma'

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.gamma_shape

gamma_shape

Value: 2.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.lognormal_sigma

lognormal_sigma

Value: 1.0

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.include_oracle

include_oracle

Value: True

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.seed

seed

Value: None

attribute
causalis.dgp.causaldata.base.CausalDatasetGenerator.rng

rng

Value: 'field(...)'

method
causalis.dgp.causaldata.base.CausalDatasetGenerator.__post_init__

__post_init__

Initialize RNG and validate configuration.

method
causalis.dgp.causaldata.base.CausalDatasetGenerator.generate

generate

Draw a synthetic dataset of size n.

Parameters

n : int

Number of samples to generate.

U : numpy.ndarray, optional

Unobserved confounder. If None, generated from N(0,1).

Returns

pandas.DataFrame

The generated dataset with outcome ‘y’, treatment ‘d’, confounders, and oracle ground-truth columns.

method
causalis.dgp.causaldata.base.CausalDatasetGenerator.to_causal_data

to_causal_data

Generate a dataset and convert it to a CausalData object.

Parameters

n : int

Number of samples to generate.

confounders : str or list of str, optional

List of confounder column names to include. If None, automatically detects numeric confounders.

Returns

CausalData

A CausalData object containing the generated dataset.

method
causalis.dgp.causaldata.base.CausalDatasetGenerator.oracle_nuisance

oracle_nuisance

Return nuisance functions (m(x), g0(x), g1(x)) compatible with IRM.

Parameters

num_quad : int, default=21

Number of quadrature points for marginalizing over U.

Returns

dict

Dictionary of callables mapping X to nuisance values.
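To illustrate what marginalizing over U with num_quad quadrature points can look like, the sketch below integrates the realized propensity sigmoid(score + u_strength * U) against the N(0,1) density using Gauss-Hermite quadrature. The source does not state which quadrature rule the library uses, so the rule chosen here and the helper name `marginal_propensity` are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def marginal_propensity(score, u_strength, num_quad=21):
    """Approximate E_U[sigmoid(score + u_strength * U)] for U ~ N(0, 1)."""
    # Nodes and weights for the weight function exp(-u^2 / 2).
    nodes, weights = hermegauss(num_quad)
    # Rescale so the weights sum to 1, i.e. integrate against the N(0, 1) density.
    weights = weights / np.sqrt(2.0 * np.pi)
    score = np.asarray(score, dtype=float)
    vals = sigmoid(score[..., None] + u_strength * nodes)   # shape (..., num_quad)
    return vals @ weights

score = np.array([-1.0, 0.0, 1.0])     # illustrative X-driven scores
m = marginal_propensity(score, u_strength=0.8)
```

With u_strength = 0 the integrand no longer depends on U, so the result reduces to sigmoid(score); in terms of the returned columns, that is the case where m and m_obs coincide.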
