function
causalis.scenarios.unconfoundedness.refutation.overlap.overlap_validation.run_overlap_diagnostics
Run overlap and calibration diagnostics for an estimated propensity model.
The core overlap object is the propensity score
.. math::
m(X) = \mathbb{P}(D=1 \mid X).
This diagnostic checks whether the estimated propensities stay away from
0 and 1 and whether the implied weights are stable. For example, the ATE
weights are
.. math::
w_i^{(1)} = \frac{D_i}{m(X_i)}, \qquad w_i^{(0)} = \frac{1-D_i}{1-m(X_i)},
so values of :math:`m(X_i)` close to 0 inflate the treated weights and values
close to 1 inflate the control weights, creating high-leverage observations.
The report combines:
- edge mass near `0` and `1`,
- treated/control separation in propensity space (`KS`, `AUC`),
- effective sample size and tail diagnostics for weights,
- calibration summaries such as `ECE`, recalibration slope, and intercept.
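The edge-mass and weight-stability items above can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name ``overlap_sketch``, the cutoff ``eps``, and the use of the Kish formula for effective sample size are assumptions made here for illustration, and the metrics reported by ``run_overlap_diagnostics`` may be defined differently.

```python
import numpy as np


def overlap_sketch(m_hat, d, eps=0.01):
    """Hypothetical helper: edge mass and Kish effective sample size.

    Assumes every propensity lies strictly inside (0, 1) so the
    ATE weights D/m and (1-D)/(1-m) are finite.
    """
    m_hat = np.asarray(m_hat, dtype=float)
    d = np.asarray(d, dtype=float)
    if m_hat.shape != d.shape:
        raise ValueError("m_hat and d must have the same shape")

    # ATE weights from the formula above.
    w1 = d / m_hat
    w0 = (1.0 - d) / (1.0 - m_hat)

    def kish_ess(w):
        # Kish effective sample size: (sum w)^2 / sum w^2 over nonzero weights.
        w = w[w > 0]
        return float(w.sum() ** 2 / (w ** 2).sum())

    return {
        # Share of observations with propensities near each edge.
        "share_below": float(np.mean(m_hat < eps)),
        "share_above": float(np.mean(m_hat > 1.0 - eps)),
        "ess_treated": kish_ess(w1),
        "ess_control": kish_ess(w0),
    }


balanced = overlap_sketch([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])
extreme = overlap_sketch([0.005, 0.5, 0.995, 0.5], [1, 1, 0, 0])
```

With equal propensities the effective sample size of each arm equals its head count. A single treated unit with a propensity near 0 dominates the treated weights and pulls that arm's effective sample size towards 1.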
Parameters
----------
data : CausalData
Dataset used to fit the estimator.
estimate : CausalEstimate
Effect estimate with ``diagnostic_data`` containing propensity-related
arrays such as ``m_hat`` and ``d``.
thresholds : dict, optional
Optional threshold overrides keyed by metric name.
n_bins : int, default 10
Number of bins used for calibration summaries.
use_hajek : bool, optional
Whether to evaluate the normalized (Hajek) IPW identities. If omitted, the
value is inferred from the diagnostic metadata.
return_summary : bool, default True
Include a compact tabular summary in the returned payload.
auc_flip_margin : float, default 0.05
Margin around 0.5 used when flagging reversed treated/control ranking.
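As a rough illustration of the separation metrics and of what ``auc_flip_margin`` guards against, the sketch below computes a two-sample KS statistic and an AUC from propensities and treatment indicators. The helper ``ks_and_auc`` is hypothetical, and the rule ``auc < 0.5 - margin`` is an assumed reading of "reversed ranking", not a confirmed description of the library's check.

```python
import numpy as np


def ks_and_auc(m_hat, d):
    """Hypothetical helper: KS distance and AUC of treated vs. control scores."""
    m_hat = np.asarray(m_hat, dtype=float)
    d = np.asarray(d).astype(bool)
    treated = np.sort(m_hat[d])
    control = np.sort(m_hat[~d])

    # KS: largest gap between the two empirical CDFs on the pooled sample.
    grid = np.concatenate([treated, control])
    cdf_t = np.searchsorted(treated, grid, side="right") / treated.size
    cdf_c = np.searchsorted(control, grid, side="right") / control.size
    ks = float(np.max(np.abs(cdf_t - cdf_c)))

    # AUC: probability that a treated score exceeds a control score,
    # counting ties as one half (pairwise version, fine for small samples).
    diff = treated[:, None] - control[None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return ks, auc


margin = 0.05  # mirrors the documented default of auc_flip_margin
ks, auc = ks_and_auc([0.8, 0.9, 0.1, 0.2], [1, 1, 0, 0])
ks_rev, auc_rev = ks_and_auc([0.8, 0.9, 0.1, 0.2], [0, 0, 1, 1])
flipped = auc_rev < 0.5 - margin  # assumed flagging rule
```

In the second call the treated units receive the lowest scores, so the AUC falls far below 0.5. That pattern usually points to a swapped treatment coding or a propensity array stored as the control probability.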
Returns
-------
Dict[str, Any]
Diagnostic report containing edge-mass, calibration, weight-stability,
and optional summary tables.
Raises
------
ValueError
If required diagnostic arrays are missing or have incompatible shapes.
Examples
--------
>>> from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
>>> from causalis.dgp import obs_linear_26_dataset
>>> from causalis.scenarios.unconfoundedness.model import IRM
>>> data = obs_linear_26_dataset(
... n=1000,
... seed=3141,
... include_oracle=False,
... return_causal_data=True,
... )
>>> irm = IRM(
... data=data,
... ml_g=RandomForestRegressor(
... n_estimators=200,
... max_depth=6,
... min_samples_leaf=5,
... random_state=3141,
... ),
... ml_m=RandomForestClassifier(
... n_estimators=200,
... max_depth=6,
... min_samples_leaf=5,
... random_state=3141,
... ),
... n_folds=3,
... random_state=3141,
... )
>>> estimate = irm.fit().estimate(score="ATE")
>>> report = run_overlap_diagnostics(data, estimate)
>>> report["summary"] # doctest: +SKIP
>>> report["edge_mass"] # doctest: +SKIP
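The calibration summary controlled by ``n_bins`` can be approximated outside the library as follows. This is a minimal sketch assuming equal-width bins on [0, 1] and the usual bin-weighted absolute gap as the definition of ``ECE``. The function name ``expected_calibration_error`` is hypothetical, and the library's binning scheme may differ.

```python
import numpy as np


def expected_calibration_error(m_hat, d, n_bins=10):
    """Hypothetical helper: ECE with equal-width bins on [0, 1]."""
    m_hat = np.asarray(m_hat, dtype=float)
    d = np.asarray(d, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)

    # Bin index in 0..n_bins-1; a score of exactly 1.0 falls in the last bin.
    idx = np.digitize(m_hat, edges[1:-1])

    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # Gap between observed treatment rate and mean predicted
            # propensity, weighted by the share of observations in the bin.
            gap = abs(d[mask].mean() - m_hat[mask].mean())
            ece += mask.mean() * gap
    return float(ece)


calibrated = expected_calibration_error([0.25, 0.25, 0.25, 0.25], [1, 0, 0, 0])
overconfident = expected_calibration_error([0.9, 0.9], [0, 0])
```

A value near zero means the predicted propensities match observed treatment rates within each bin. Larger values indicate that the propensity model is systematically over- or under-confident.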