Submodule

causalis.scenarios.unconfoundedness.model

model

Submodule causalis.scenarios.unconfoundedness.model with no child pages and 15 documented members.


Classes

causalis.scenarios.unconfoundedness.model.IRM

IRM

Bases: sklearn.base.BaseEstimator

Interactive Regression Model (IRM) with cross-fitting using CausalData.

Parameters

data (CausalData): Data container with outcome, binary treatment (0/1), and confounders.
ml_g (estimator): Learner for E[Y|X,D]. If the learner is a classifier and Y is binary, predict_proba is used; otherwise predict() is used.
ml_m (classifier): Learner for E[D|X] (the propensity score). Must support predict_proba(), or a predict() that returns values in (0,1).
n_folds (int, default 5): Number of cross-fitting folds.
n_rep (int, default 1): Number of repetitions of sample splitting. Currently only 1 is supported.
normalize_ipw (bool, default False): Whether to normalize IPW terms within the score. Applied to ATE only. For ATTE, normalization is ignored to preserve the canonical ATTE efficient influence function (EIF).
trimming_rule ({"truncate"}, default "truncate"): Trimming approach for propensity scores.
trimming_threshold (float, default 1e-2): Threshold used for trimming when trimming_rule is "truncate".
weights (Optional[np.ndarray or Dict], default None): Optional weights.
- If an array of shape (n,), it is used as the ATE weights (w), and E[w|X] = w is assumed.
- If a dict, it can contain 'weights' (w) and 'weights_bar' (E[w|X]).
- For ATTE, weights are computed internally (w = D/P(D=1), w_bar = m(X)/P(D=1)).
Note: If weights depend on treatment or outcome, E[w|X] must be provided for correct sensitivity analysis.
relative_baseline_min (float, default 1e-8): Minimum absolute baseline value used for relative effects. If |mu_c| is below this threshold, relative estimates are set to NaN with a warning.
random_state (Optional[int], default None): Random seed for fold creation.
n_jobs (int, default 1): Number of parallel jobs for fold-level cross-fitting. Use -1 to use all available CPUs. Practical guidance:
- Start with n_jobs=1 for stable, low-contention defaults.
- Increase to n_jobs=2, 4, or -1 when cross-fitting is the bottleneck.
- If nuisance learners are already multithreaded (e.g. CatBoost with thread_count=-1), keep n_jobs=1 or set learner threads to 1 to avoid CPU oversubscription.
- On shared machines, prefer a bounded value (for example 2 or 4) instead of -1.
store_diagnostics (bool, default True): Whether to retain raw fit-time arrays and diagnostic-only artifacts on the fitted model. Set to False for a lighter-weight estimator that still supports effect estimation while retaining only immutable outcome and treatment snapshots. In lightweight mode the estimator no longer keeps the confounder matrix, raw propensities, fold assignments, or compact native feature-importance diagnostics in memory after fit(). When enabled, the supported native feature-importance sources are the learner's feature_importances_, coef_, and CatBoost get_feature_importance().
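
The dict form of weights can be illustrated with a small sketch. Only the keys 'weights' and 'weights_bar' come from the parameter description; the confounder, the subgroup rule, and the normalization to mean 1 are hypothetical choices made for illustration, not documented requirements.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(size=n)  # hypothetical single confounder

# Hypothetical subgroup weights that depend on X only: keep units with x > 0,
# scaled so that the sample mean of the weights equals 1.
in_group = (x > 0).astype(float)
w = in_group / in_group.mean()

# Because w is a function of X alone, E[w | X] = w, so both entries coincide.
weights = {"weights": w, "weights_bar": w.copy()}
```

If the weights depended on treatment or outcome instead, the two entries would differ and 'weights_bar' would need to hold an estimate of E[w|X].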

Examples

Notes

The IRM model targets binary-treatment causal effects under unconfoundedness. Let $W = (Y, D, X)$ with $D \in \{0, 1\}$ and define

g_0(d, x) = \mathbb{E}[Y \mid D=d, X=x], \qquad m_0(x) = \mathbb{P}(D=1 \mid X=x).

Under conditional ignorability and overlap,

(Y(0), Y(1)) \perp D \mid X, \qquad 0 < m_0(X) < 1 \ \text{a.s.},

the target functionals are identified as

\theta_0^{ATE} = \mathbb{E}[g_0(1, X) - g_0(0, X)]

and

\theta_0^{ATTE} = \mathbb{E}[g_0(1, X) - g_0(0, X) \mid D=1].

This implementation cross-fits three nuisance objects: $\hat g_1(x) \approx \mathbb{E}[Y \mid D=1, X=x]$, $\hat g_0(x) \approx \mathbb{E}[Y \mid D=0, X=x]$, and $\hat m(x) \approx \mathbb{P}(D=1 \mid X=x)$. Propensities are trimmed via
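
To make the cross-fitting idea concrete, here is a minimal sketch using scikit-learn on simulated data. It is not the library's internal code: the learners, the simulated data, and the choice to fit separate outcome models on the treated and control rows of each training fold are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 2))
m_true = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
D = rng.binomial(1, m_true)
Y = 1.0 * D + X[:, 0] + rng.normal(size=n)  # simulated effect of 1

g1_hat = np.empty(n)
g0_hat = np.empty(n)
m_hat = np.empty(n)

# Each unit's nuisance predictions come from models fit on the other folds.
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    treated = train[D[train] == 1]
    control = train[D[train] == 0]
    g1_hat[test] = LinearRegression().fit(X[treated], Y[treated]).predict(X[test])
    g0_hat[test] = LinearRegression().fit(X[control], Y[control]).predict(X[test])
    m_fit = LogisticRegression().fit(X[train], D[train])
    m_hat[test] = m_fit.predict_proba(X[test])[:, 1]  # column 1 is P(D=1 | X)
```

Every entry of g1_hat, g0_hat, and m_hat is an out-of-fold prediction, which is what keeps the score evaluation separate from nuisance fitting.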

\tilde m(x) = \min\{1-\varepsilon, \max(\hat m(x), \varepsilon)\},

where $\varepsilon =$ trimming_threshold.
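
The truncation rule above is a plain clip into $[\varepsilon, 1-\varepsilon]$. A minimal NumPy sketch follows; the function name trim_propensity and the sample values are hypothetical.

```python
import numpy as np

def trim_propensity(m_hat, eps=1e-2):
    """Clip propensities into [eps, 1 - eps], i.e. min(1 - eps, max(m_hat, eps))."""
    return np.clip(m_hat, eps, 1.0 - eps)

m_hat = np.array([0.001, 0.2, 0.5, 0.995])
m_tilde = trim_propensity(m_hat)
# m_tilde is [0.01, 0.2, 0.5, 0.99]: only the two extreme values change
```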

Estimation solves the sample moment equation

\mathbb{E}_n[\psi_a(W_i; \hat\eta)\theta + \psi_b(W_i; \hat\eta)] = 0,

giving the closed-form estimator

\hat\theta = -\frac{\mathbb{E}_n[\psi_b(W_i; \hat\eta)]} {\mathbb{E}_n[\psi_a(W_i; \hat\eta)]}.
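
As a quick numerical check of the closed form, the sketch below solves the linear moment equation for a handful of made-up score values. The helper name solve_linear_score and the arrays are hypothetical.

```python
import numpy as np

def solve_linear_score(psi_a, psi_b):
    """Solve mean(psi_a * theta + psi_b) = 0 for theta."""
    return -np.mean(psi_b) / np.mean(psi_a)

psi_a = np.array([-1.0, -1.0, -1.0, -1.0])
psi_b = np.array([0.5, 1.5, 2.0, 0.0])

theta_hat = solve_linear_score(psi_a, psi_b)      # 1.0
residual = np.mean(psi_a * theta_hat + psi_b)     # 0.0: the moment equation holds
```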

For both ATE and ATTE, the orthogonal score component used here is

\psi_b = w \, (\hat g_1(X) - \hat g_0(X)) + \bar w \left[ (Y - \hat g_1(X)) \frac{D}{\tilde m(X)} - (Y - \hat g_0(X)) \frac{1-D}{1-\tilde m(X)} \right].

The score derivative differs by estimand:

\psi_a = -1 \quad \text{for ATE}, \qquad \psi_a = -w \quad \text{for ATTE}.

The corresponding weights are

w = \bar w = 1 \quad \text{for unweighted ATE},

while for ATTE this implementation uses normalized treated weights

w_i = \frac{D_i}{\mathbb{E}_n[D]}, \qquad \bar w_i = \frac{\tilde m(X_i)}{\mathbb{E}_n[D]}.
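
The sketch below evaluates the score with both weight choices on simulated data. To keep it short it plugs in the true (oracle) nuisance functions rather than cross-fitted estimates; the data-generating process and the helper names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
m = 1.0 / (1.0 + np.exp(-x))                # true propensity
d = rng.binomial(1, m)
y = (1.0 + x) * d + x + rng.normal(size=n)  # unit effect 1 + x, so ATE = 1

g1 = 1.0 + 2.0 * x                          # oracle E[Y | D=1, X]
g0 = x                                      # oracle E[Y | D=0, X]
m_tilde = np.clip(m, 1e-2, 1 - 1e-2)

def psi_b(w, w_bar):
    """Score component psi_b for given weights w and w_bar."""
    ipw = (y - g1) * d / m_tilde - (y - g0) * (1 - d) / (1 - m_tilde)
    return w * (g1 - g0) + w_bar * ipw

def solve(psi_a, psi_b_vals):
    """Closed-form solution of the linear moment equation."""
    return -psi_b_vals.mean() / psi_a.mean()

# Unweighted ATE: w = w_bar = 1 and psi_a = -1.
ones = np.ones(n)
theta_ate = solve(-ones, psi_b(ones, ones))

# ATTE: normalized treated weights and psi_a = -w.
p_treated = d.mean()
w_atte = d / p_treated
w_bar_atte = m_tilde / p_treated
theta_atte = solve(-w_atte, psi_b(w_atte, w_bar_atte))
```

Because treatment is more likely for large x and the unit effect is 1 + x, theta_atte lands above theta_ate in this simulation, which shows how the two weightings target different populations.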

If normalize_ipw=True, the inverse-probability factors $D / \tilde m(X)$ and $(1-D) / (1-\tilde m(X))$ are additionally stabilized by their sample means (a Hajek-style normalization). This option is applied to ATE only; for ATTE it is intentionally ignored to preserve the canonical ATTE efficient influence function used by the estimator.
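
A minimal sketch of this kind of stabilization, assuming it amounts to dividing each inverse-probability factor by its own sample mean. The exact form used inside the estimator is not shown here, so treat this as an illustration of the Hajek idea only; the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
m_tilde = np.clip(1.0 / (1.0 + np.exp(-x)), 1e-2, 1 - 1e-2)
d = rng.binomial(1, m_tilde)

# Raw inverse-probability factors; their sample means are only close to 1.
h1 = d / m_tilde
h0 = (1 - d) / (1 - m_tilde)

# Assumed Hajek-style stabilization: rescale so each factor averages exactly 1.
h1_norm = h1 / h1.mean()
h0_norm = h0 / h0.mean()
```

The rescaling leaves the relative weighting of units unchanged and only removes the sampling noise in the overall level of the weights.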

Initialization

Initialize the estimator and validate configuration options.
