Submodule

causalis.dgp.causaldata.base

base

Submodule causalis.dgp.causaldata.base with no child pages and 36 documented members.


Classes


class

causalis.dgp.causaldata.base.CausalDatasetGenerator

CausalDatasetGenerator

Generate synthetic causal inference datasets with controllable confounding, treatment prevalence, noise, and (optionally) heterogeneous treatment effects.

Data model (high level)

Confounders X ∈ R^k are drawn from user-specified distributions.
Binary treatment D is assigned by a logistic model: D ~ Bernoulli( sigmoid(alpha_d + f_d(X) + u_strength_d * U) ), where f_d(X) = (X @ beta_d + g_d(X)) * propensity_sharpness, and U ~ N(0,1) is an optional unobserved confounder.
Outcome Y depends on treatment and confounders, with the link determined by outcome_type:
outcome_type = “continuous”: Y = alpha_y + f_y(X) + u_strength_y * U + D * tau(X) + ε, ε ~ N(0, sigma_y^2)
outcome_type = “binary”: logit P(Y=1|D,X,U) = alpha_y + f_y(X) + u_strength_y * U + D * tau(X)
outcome_type = “poisson”: log E[Y|D,X,U] = alpha_y + f_y(X) + u_strength_y * U + D * tau(X)
outcome_type = “gamma”: log E[Y|D,X,U] = alpha_y + f_y(X) + u_strength_y * U + D * tau(X)
where f_y(X) = X @ beta_y + g_y(X), and tau(X) is either the constant theta or a user-supplied function.
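The data model above can be sketched in a few lines of NumPy for the continuous case. This is an illustration of the equations only, not the library's implementation: the function name simulate_continuous, the coefficient values for beta_d and beta_y, and the choice g_d = g_y = 0 are assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate_continuous(n, theta=1.0, alpha_d=0.0, alpha_y=0.0, sigma_y=1.0,
                        u_strength_d=0.0, u_strength_y=0.0,
                        propensity_sharpness=1.0, k=5, seed=0):
    """Hypothetical re-implementation of the continuous-outcome data model."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, k))        # independent N(0,1) confounders
    U = rng.normal(size=n)             # unobserved confounder
    beta_d = np.full(k, 0.5)           # illustrative coefficients (assumption)
    beta_y = np.full(k, 0.3)           # illustrative coefficients (assumption)

    # Treatment score: f_d(X) = (X @ beta_d + g_d(X)) * propensity_sharpness, g_d = 0
    f_d = (X @ beta_d) * propensity_sharpness
    m_obs = sigmoid(alpha_d + f_d + u_strength_d * U)
    D = rng.binomial(1, m_obs)         # D ~ Bernoulli(m_obs)

    # Outcome: Y = alpha_y + f_y(X) + u_strength_y * U + D * theta + eps, g_y = 0
    f_y = X @ beta_y
    eps = rng.normal(scale=sigma_y, size=n)
    Y = alpha_y + f_y + u_strength_y * U + D * theta + eps
    return X, D, Y, m_obs


X, D, Y, m_obs = simulate_continuous(n=1000)
print("treated share:", D.mean())
```

With u_strength_d = u_strength_y = 0 there is no hidden confounding, so a linear regression of Y on D and X should recover theta up to sampling noise.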

Returned columns

y: outcome
d: binary treatment (0/1)
x1..xk (or user-provided names)
m: true propensity P(D=1 | X), marginalized over U
m_obs: realized propensity P(D=1 | X, U)
tau_link: tau(X) on the structural (link) scale
g0: E[Y | X, D=0] on the natural outcome scale, marginalized over U
g1: E[Y | X, D=1] on the natural outcome scale, marginalized over U
cate: g1 - g0 (conditional average treatment effect on the natural outcome scale)

Notes on effect scale:

For “continuous”, theta (or tau(X)) is an additive mean difference, so tau_link == cate.
For “binary”, tau acts on the log-odds scale. cate is reported as a risk difference.
For “poisson” and “gamma”, tau acts on the log-mean scale. cate is reported on the mean scale.
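The difference between the link scale and the natural scale can be made concrete with a small calculation. The sketch below assumes u_strength_y = 0, so no marginalization over U is needed; the helper name cate_natural_scale and its baseline argument (standing for alpha_y + f_y(x)) are introduced here for illustration only.

```python
import math


def sigmoid(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def cate_natural_scale(outcome_type, baseline, tau):
    """Return g1 - g0 for one unit, given a link-scale baseline and effect tau.

    Assumes u_strength_y = 0, so g0 and g1 follow directly from the inverse link.
    """
    if outcome_type == "continuous":
        g0, g1 = baseline, baseline + tau                      # identity link
    elif outcome_type == "binary":
        g0, g1 = sigmoid(baseline), sigmoid(baseline + tau)    # logit link
    elif outcome_type in ("poisson", "gamma"):
        g0, g1 = math.exp(baseline), math.exp(baseline + tau)  # log link
    else:
        raise ValueError(f"unsupported outcome_type: {outcome_type!r}")
    return g1 - g0


for kind in ("continuous", "binary", "poisson"):
    print(kind, round(cate_natural_scale(kind, baseline=0.0, tau=1.0), 3))
# continuous 1.0
# binary 0.231
# poisson 1.718
```

For the nonlinear links, the same tau yields a different cate at a different baseline, which is why a constant theta still produces a unit-varying cate for binary, Poisson, and gamma outcomes.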

Parameters

theta (float, default=1.0): Constant treatment effect used if tau is None.
tau (callable, optional): Function tau(X) -> array-like of shape (n,) for heterogeneous effects.
beta_y (array-like, optional): Linear coefficients of confounders in the outcome baseline f_y(X).
beta_d (array-like, optional): Linear coefficients of confounders in the treatment score f_d(X).
g_y (callable, optional): Nonlinear/additive function g_y(X) -> (n,) added to the outcome baseline.
g_d (callable, optional): Nonlinear/additive function g_d(X) -> (n,) added to the treatment score.
alpha_y (float, default=0.0): Outcome intercept (natural scale for continuous; log-odds for binary; log-mean for Poisson/Gamma).
alpha_d (float, default=0.0): Treatment intercept (log-odds). If target_d_rate is set, alpha_d is auto-calibrated.
sigma_y (float, default=1.0): Std. dev. of the Gaussian noise for continuous outcomes.
outcome_type ({“continuous”, “binary”, “poisson”, “gamma”, “tweedie”}, default=“continuous”): Outcome family and link as defined above.
confounder_specs (list of dict, optional): Schema for generating confounders. See _gaussian_copula for details.
k (int, default=5): Number of confounders when confounder_specs is None. The confounders then default to independent N(0,1) draws.
x_sampler (callable, optional): Custom sampler (n, k, seed) -> X ndarray of shape (n, k). Overrides confounder_specs.
use_copula (bool, default=False): If True and confounder_specs is provided, use a Gaussian copula for X.
copula_corr (array-like, optional): Correlation matrix for the copula.
target_d_rate (float, optional): Target treatment prevalence (propensity mean). Calibrates alpha_d.
u_strength_d (float, default=0.0): Strength of the unobserved confounder U in treatment assignment.
u_strength_y (float, default=0.0): Strength of the unobserved confounder U in the outcome.
propensity_sharpness (float, default=1.0): Scales the X-driven treatment score to adjust positivity difficulty.
seed (int, optional): Random seed for reproducibility.
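The parameter list states that target_d_rate calibrates alpha_d but does not say how. One plausible approach is a root search on the intercept, since the mean propensity increases monotonically in alpha_d. The sketch below uses bisection; the function name calibrate_alpha_d, the search bounds, and the tolerance are assumptions, and the library may use a different procedure.

```python
import numpy as np


def calibrate_alpha_d(score, target_rate, lo=-50.0, hi=50.0, tol=1e-10):
    """Find alpha such that mean(sigmoid(alpha + score)) is close to target_rate.

    `score` holds the non-intercept part of the treatment score for a sample,
    e.g. f_d(X) + u_strength_d * U. Hypothetical helper, not the library's code.
    """
    if not 0.0 < target_rate < 1.0:
        raise ValueError("target_rate must lie strictly between 0 and 1")
    score = np.asarray(score, dtype=float)

    def mean_propensity(alpha):
        return float(np.mean(1.0 / (1.0 + np.exp(-(alpha + score)))))

    # Bisection: the mean propensity is strictly increasing in alpha.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_propensity(mid) < target_rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = np.random.default_rng(0)
score = rng.normal(size=5000)
alpha = calibrate_alpha_d(score, target_rate=0.2)
print("calibrated intercept:", alpha)
```

Calibrating on a sample matches the target prevalence for that sample; the realized share of treated units still varies from draw to draw.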

Attributes

rng (numpy.random.Generator): Internal RNG seeded from seed.

Notes

Oracle outputs are reported on the natural outcome scale:

m is the treatment propensity marginalized over latent noise.
g0 and g1 are mean potential outcomes on the observed outcome scale.
cate is always g1 - g0 on that same natural scale, even when the structural treatment effect is specified on a link scale such as log-odds or log-mean.

Examples
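A minimal sketch of user-supplied callables that match the signatures listed under Parameters: tau(X) -> (n,), g_d(X) -> (n,), and x_sampler(n, k, seed) -> (n, k). The function bodies are illustrative choices, and the commented constructor call at the end is a hypothetical hand-off based on the documented parameter names; it is not executed here.

```python
import numpy as np


def tau_fn(X):
    """Heterogeneous effect on the link scale: larger for units with large x1."""
    X = np.asarray(X, dtype=float)
    return 1.0 + 0.5 * X[:, 0]


def g_d_fn(X):
    """Nonlinear term added to the treatment score: an x1 * x2 interaction."""
    X = np.asarray(X, dtype=float)
    return 0.3 * X[:, 0] * X[:, 1]


def x_sampler(n, k, seed):
    """Custom confounder sampler returning an (n, k) array of Uniform(-1, 1) draws."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(n, k))


X = x_sampler(100, 3, 0)
print(X.shape, tau_fn(X).shape, g_d_fn(X).shape)

# Hypothetical hand-off to the generator (assumes keyword construction works
# with the documented parameter names; not run in this sketch):
# from causalis.dgp.causaldata.base import CausalDatasetGenerator
# gen = CausalDatasetGenerator(tau=tau_fn, g_d=g_d_fn, x_sampler=x_sampler, k=3, seed=0)
# df = gen.generate(1000)
```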


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.theta

theta

Value: 1.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.tau

tau

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.beta_y

beta_y

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.beta_d

beta_d

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.g_y

g_y

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.g_d

g_d

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.alpha_y

alpha_y

Value: 0.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.alpha_d

alpha_d

Value: 0.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.sigma_y

sigma_y

Value: 1.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.outcome_type

outcome_type

Value: 'continuous'


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.confounder_specs

confounder_specs

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.k

k

Value: 5


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.x_sampler

x_sampler

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.use_copula

use_copula

Value: False


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.copula_corr

copula_corr

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.target_d_rate

target_d_rate

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.u_strength_d

u_strength_d

Value: 0.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.u_strength_y

u_strength_y

Value: 0.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.propensity_sharpness

propensity_sharpness

Value: 1.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.score_bounding

score_bounding

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.alpha_zi

alpha_zi

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.beta_zi

beta_zi

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.g_zi

g_zi

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.u_strength_zi

u_strength_zi

Value: 0.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.tau_zi

tau_zi

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.pos_dist

pos_dist

Value: 'gamma'


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.gamma_shape

gamma_shape

Value: 2.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.lognormal_sigma

lognormal_sigma

Value: 1.0


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.include_oracle

include_oracle

Value: True


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.seed

seed

Value: None


attribute

causalis.dgp.causaldata.base.CausalDatasetGenerator.rng

rng

Value: 'field(...)'


method

causalis.dgp.causaldata.base.CausalDatasetGenerator.__post_init__

__post_init__

Initialize RNG and validate configuration.


method

causalis.dgp.causaldata.base.CausalDatasetGenerator.generate

generate

Draw a synthetic dataset of size n.

Parameters

n (int): Number of samples to generate.
U (numpy.ndarray, optional): Unobserved confounder. If None, generated from N(0,1).

Returns

pandas.DataFrame

The generated dataset with outcome ‘y’, treatment ‘d’, confounders, and oracle ground-truth columns.
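The optional U argument matters because U enters both the treatment score and the outcome. The NumPy sketch below reproduces that mechanism in isolation (no observed confounders, continuous outcome) to show how a naive difference in means drifts away from theta once u_strength_d and u_strength_y are both nonzero. The function naive_difference is hypothetical and is not part of the library.

```python
import numpy as np


def naive_difference(n, U=None, theta=1.0, u_strength_d=1.0,
                     u_strength_y=1.0, seed=0):
    """Difference in mean outcomes between treated and control units.

    Mirrors the documented convention: if U is None it is drawn from N(0,1),
    otherwise the supplied array of length n is used as the hidden confounder.
    """
    rng = np.random.default_rng(seed)
    U = rng.normal(size=n) if U is None else np.asarray(U, dtype=float)

    # Treatment depends on U only (no observed confounders in this sketch).
    p = 1.0 / (1.0 + np.exp(-(u_strength_d * U)))
    D = rng.binomial(1, p)

    # Continuous outcome with a constant effect theta and unit Gaussian noise.
    Y = u_strength_y * U + theta * D + rng.normal(size=n)
    return Y[D == 1].mean() - Y[D == 0].mean()


print("no hidden confounding:  ", naive_difference(50_000, u_strength_d=0.0, u_strength_y=0.0))
print("with hidden confounding:", naive_difference(50_000, u_strength_d=1.0, u_strength_y=1.0))
```

In the second call, treated units have systematically higher U, so the naive contrast overstates theta. This is the kind of bias the oracle columns allow a benchmark to quantify.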


method

causalis.dgp.causaldata.base.CausalDatasetGenerator.to_causal_data

to_causal_data

Generate a dataset and convert it to a CausalData object.

Parameters

n (int): Number of samples to generate.
confounders (str or list of str, optional): Confounder column name(s) to include. If None, numeric confounders are detected automatically.

Returns

CausalData

A CausalData object containing the generated dataset.


method

causalis.dgp.causaldata.base.CausalDatasetGenerator.oracle_nuisance

oracle_nuisance

Return nuisance functions (m(x), g0(x), g1(x)) compatible with IRM.

Parameters

num_quad (int, default=21): Number of quadrature points for marginalizing over U.

Returns

dict

Dictionary of callables mapping X to nuisance values.
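Marginalizing over U ~ N(0,1) amounts to a one-dimensional Gaussian integral, which quadrature handles well. The sketch below computes m(x) = E_U[sigmoid(score(x) + u_strength_d * U)] with a Gauss-Hermite rule. The page does not say which quadrature rule the library uses, so the rule and the function name marginal_propensity are assumptions.

```python
import numpy as np


def marginal_propensity(score, u_strength_d, num_quad=21):
    """Approximate E_U[sigmoid(score + u_strength_d * U)] for U ~ N(0,1).

    `score` is alpha_d + f_d(x) for one or more units. Uses the probabilists'
    Gauss-Hermite rule, whose weight function is exp(-u**2 / 2).
    """
    nodes, weights = np.polynomial.hermite_e.hermegauss(num_quad)
    weights = weights / weights.sum()   # raw weights sum to sqrt(2*pi); normalize to 1

    score = np.atleast_1d(np.asarray(score, dtype=float))
    z = score[:, None] + u_strength_d * nodes[None, :]   # shape (n, num_quad)
    return (1.0 / (1.0 + np.exp(-z))) @ weights          # shape (n,)


scores = np.array([-1.0, 0.0, 1.0])
print("m with hidden confounder:", marginal_propensity(scores, u_strength_d=1.0))
print("m without:               ", marginal_propensity(scores, u_strength_d=0.0))
```

When u_strength_d is 0 the result reduces to sigmoid(score), and a score of 0 always gives 0.5 by symmetry. For nonzero u_strength_d the marginal propensity is pulled toward 0.5 relative to sigmoid(score).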
