causalis.dgp.causaldata.base.CausalDatasetGenerator
Generate synthetic causal inference datasets with controllable confounding, treatment prevalence, noise, and (optionally) heterogeneous treatment effects.
Data model (high level)
Confounders X ∈ R^k are drawn from user-specified distributions.
Binary treatment D (written T in the outcome equations and oracle columns) is assigned by a logistic model: D ~ Bernoulli(sigmoid(alpha_d + f_d(X) + u_strength_d * U)), where f_d(X) = (X @ beta_d + g_d(X)) * propensity_sharpness and U ~ N(0, 1) is an optional unobserved confounder.
Outcome Y depends on treatment and confounders, with the link determined by outcome_type:
outcome_type = "continuous": Y = alpha_y + f_y(X) + u_strength_y * U + T * tau(X) + ε, ε ~ N(0, sigma_y^2)
outcome_type = "binary": logit P(Y=1|T,X,U) = alpha_y + f_y(X) + u_strength_y * U + T * tau(X)
outcome_type = "poisson": log E[Y|T,X,U] = alpha_y + f_y(X) + u_strength_y * U + T * tau(X)
outcome_type = "gamma": log E[Y|T,X,U] = alpha_y + f_y(X) + u_strength_y * U + T * tau(X)
where f_y(X) = X @ beta_y + g_y(X), and tau(X) is either the constant theta or a user-supplied function.
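The data model above can be sketched directly in NumPy for the continuous case. This is an illustrative re-implementation of the stated equations, not the library's code: the function name simulate, the coefficient values, and the choice g_d = g_y = 0 with a constant effect theta are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def simulate(n=5000, k=3, theta=1.0, alpha_d=0.0, alpha_y=0.0, sigma_y=1.0,
             u_strength_d=0.0, u_strength_y=0.0, propensity_sharpness=1.0,
             seed=0):
    """Hypothetical helper: draw (X, d, y, m_obs) from the continuous model."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, k))      # independent N(0, 1) confounders
    U = rng.normal(size=n)           # unobserved confounder
    beta_d = np.full(k, 0.5)         # illustrative coefficients (assumed)
    beta_y = np.full(k, 1.0)

    # Treatment: D ~ Bernoulli(sigmoid(alpha_d + f_d(X) + u_strength_d * U))
    f_d = (X @ beta_d) * propensity_sharpness   # g_d(X) = 0 in this sketch
    m_obs = sigmoid(alpha_d + f_d + u_strength_d * U)
    d = rng.binomial(1, m_obs)

    # Outcome: Y = alpha_y + f_y(X) + u_strength_y * U + D * theta + eps
    f_y = X @ beta_y                             # g_y(X) = 0 in this sketch
    eps = rng.normal(scale=sigma_y, size=n)
    y = alpha_y + f_y + u_strength_y * U + d * theta + eps
    return X, d, y, m_obs

X, d, y, m_obs = simulate()
print("treated share:", d.mean())
```

With u_strength_d = u_strength_y = 0 there is no hidden confounding, so a linear regression of y on d and X should recover theta up to sampling noise.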
Returned columns
y: outcome
d: binary treatment (0/1)
x1..xk (or user-provided names)
m: true propensity P(T=1 | X) marginalized over U
m_obs: realized propensity P(T=1 | X, U)
tau_link: tau(X) on the structural (link) scale
g0: E[Y | X, T=0] on the natural outcome scale marginalized over U
g1: E[Y | X, T=1] on the natural outcome scale marginalized over U
cate: g1 - g0 (conditional average treatment effect on the natural outcome scale)
Notes on effect scale:
For "continuous", theta (or tau(X)) is an additive mean difference, so tau_link == cate.
For "binary", tau acts on the log-odds scale. cate is reported as a risk difference.
For "poisson" and "gamma", tau acts on the log-mean scale. cate is reported on the mean scale.
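To make the scale distinction concrete, the sketch below converts a link-scale effect into g0, g1, and cate for each outcome type. It assumes u_strength_y = 0, so no marginalization over U is needed; the helper name cate_from_link and the example values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cate_from_link(eta0, tau_link, outcome_type):
    """Return (g0, g1, cate) on the natural outcome scale.

    eta0 is the baseline linear predictor alpha_y + f_y(X);
    u_strength_y = 0 is assumed, so U plays no role here.
    """
    if outcome_type == "continuous":
        g0, g1 = eta0, eta0 + tau_link                    # identity link
    elif outcome_type == "binary":
        g0, g1 = sigmoid(eta0), sigmoid(eta0 + tau_link)  # inverse logit
    elif outcome_type in ("poisson", "gamma"):
        g0, g1 = np.exp(eta0), np.exp(eta0 + tau_link)    # inverse log link
    else:
        raise ValueError(f"unsupported outcome_type: {outcome_type!r}")
    return g0, g1, g1 - g0

eta0 = np.array([-1.0, 0.0, 1.0])   # three example baseline predictors
tau_link = 0.5                      # same link-scale effect for every unit
for kind in ("continuous", "binary", "poisson"):
    _, _, cate = cate_from_link(eta0, tau_link, kind)
    print(kind, np.round(cate, 3))
```

Only the continuous case gives a cate that is constant across units. For the binary and count families the same tau_link yields a different natural-scale effect at each baseline level.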
Parameters
- theta : float, default=1.0
Constant treatment effect used if tau is None.
- tau : callable, optional
Function tau(X) -> array-like of shape (n,) for heterogeneous effects.
- beta_y : array-like, optional
Linear coefficients of confounders in the outcome baseline f_y(X).
- beta_d : array-like, optional
Linear coefficients of confounders in the treatment score f_d(X).
- g_y : callable, optional
Nonlinear/additive function g_y(X) -> (n,) added to the outcome baseline.
- g_d : callable, optional
Nonlinear/additive function g_d(X) -> (n,) added to the treatment score.
- alpha_y : float, default=0.0
Outcome intercept (natural scale for continuous; log-odds for binary; log-mean for Poisson/Gamma).
- alpha_d : float, default=0.0
Treatment intercept (log-odds). If target_d_rate is set, alpha_d is auto-calibrated.
- sigma_y : float, default=1.0
Std. dev. of the Gaussian noise for continuous outcomes.
- outcome_type : {"continuous", "binary", "poisson", "gamma", "tweedie"}, default="continuous"
Outcome family and link as defined above.
- confounder_specs : list of dict, optional
Schema for generating confounders. See _gaussian_copula for details.
- k : int, default=5
Number of confounders when confounder_specs is None. Defaults to independent N(0,1).
- x_sampler : callable, optional
Custom sampler (n, k, seed) -> X ndarray of shape (n, k). Overrides confounder_specs.
- use_copula : bool, default=False
If True and confounder_specs is provided, use a Gaussian copula for X.
- copula_corr : array-like, optional
Correlation matrix for the copula.
- target_d_rate : float, optional
Target treatment prevalence (propensity mean). Calibrates alpha_d.
- u_strength_d : float, default=0.0
Strength of the unobserved confounder U in treatment assignment.
- u_strength_y : float, default=0.0
Strength of the unobserved confounder U in the outcome.
- propensity_sharpness : float, default=1.0
Scales the X-driven treatment score to adjust positivity difficulty. Larger values push propensities toward 0 or 1, which weakens overlap.
- seed : int, optional
Random seed for reproducibility.
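The target_d_rate option amounts to solving for the intercept alpha_d at which the mean propensity equals the target. How the library performs this calibration is not stated here; the sketch below uses plain bisection, which works because the mean of sigmoid(alpha_d + score) increases monotonically in alpha_d. The function name calibrate_alpha_d, the bracket [-20, 20], and the coefficient values are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def calibrate_alpha_d(score, target_d_rate, lo=-20.0, hi=20.0,
                      tol=1e-8, max_iter=200):
    """Bisection for alpha_d such that mean(sigmoid(alpha_d + score)) hits the target.

    score is the X-driven (and optionally U-driven) part of the treatment logit.
    Assumes the solution lies inside [lo, hi].
    """
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if sigmoid(mid + score).mean() < target_d_rate:
            lo = mid      # prevalence too low: raise the intercept
        else:
            hi = mid      # prevalence too high (or equal): lower it
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 5))
beta_d = np.array([0.8, -0.5, 0.3, 0.0, 0.2])   # illustrative coefficients
score = X @ beta_d
alpha_d = calibrate_alpha_d(score, target_d_rate=0.2)
print("alpha_d:", alpha_d, "mean propensity:", sigmoid(alpha_d + score).mean())
```

Calibrating on the sample rather than analytically keeps the sketch independent of the confounder distribution, at the cost of matching the target only for the drawn X.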
Attributes
- rng : numpy.random.Generator
Internal RNG seeded from seed.
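As a brief illustration of what seeding a numpy.random.Generator provides, two generators built from the same seed produce identical draws. The use of np.random.default_rng here is an assumption about how such a generator would typically be constructed, not a statement about the class internals.

```python
import numpy as np

# Two independent generators created from the same seed.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

draw_a = rng_a.normal(size=5)
draw_b = rng_b.normal(size=5)

# Identical seeds give identical streams, so the draws match exactly.
same = np.array_equal(draw_a, draw_b)
print("identical draws:", same)
```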
Notes
Oracle outputs are reported on the natural outcome scale:
m is the treatment propensity marginalized over latent noise.
g0 and g1 are mean potential outcomes on the observed outcome scale.
cate is always g1 - g0 on that same natural scale, even when the structural treatment effect is specified on a link scale such as log-odds or log-mean.
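One way to compute a propensity marginalized over U ~ N(0, 1) is Gauss-Hermite quadrature, sketched below. This is only an illustration of the marginalization m = E_U[sigmoid(score + u_strength_d * U)]; the numerical method the library uses is not stated here, and the helper name marginal_propensity and the node count are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def marginal_propensity(score, u_strength_d, n_nodes=40):
    """Approximate E_U[sigmoid(score + u_strength_d * U)] for U ~ N(0, 1).

    hermegauss returns nodes and weights for the weight function exp(-x^2 / 2);
    dividing the weights by their sum turns them into probability weights.
    """
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()
    z = np.asarray(score, dtype=float)[:, None] + u_strength_d * nodes[None, :]
    return sigmoid(z) @ weights

score = np.array([-1.0, 0.0, 1.0])     # alpha_d + f_d(X) for three example units
m = marginal_propensity(score, u_strength_d=1.0)
print("marginal m:   ", np.round(m, 4))
print("no-U baseline:", np.round(sigmoid(score), 4))
```

Averaging over U pulls the propensity toward 0.5 relative to sigmoid(score), which is why m and m_obs differ whenever u_strength_d is nonzero.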
Examples