function
causalis.scenarios.unconfoundedness.refutation.score.score_validation.run_score_diagnostics
Run orthogonality and influence diagnostics for ATE or ATTE scores.
The central object is the per-observation score contribution. For the ATE, the
diagnostic uses
.. math::

    \hat\psi_i =
    w_i(\hat g_1(X_i) - \hat g_0(X_i))
    + \bar w_i
    \left[
        (Y_i - \hat g_1(X_i)) \frac{D_i}{\hat m_i}
        - (Y_i - \hat g_0(X_i)) \frac{1-D_i}{1-\hat m_i}
    \right] - \hat\theta.
Good score behavior means:
- the empirical score average is close to zero,
- finite-basis derivatives with respect to nuisance parts are small,
- the influence distribution is not driven by a tiny number of very large
:math:`|\hat\psi_i|`.
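The score contribution above can be sketched numerically. This is a minimal
NumPy illustration on simulated data, not the library's implementation: the
arrays ``m_hat``, ``g1_hat``, and ``g0_hat`` stand in for the cross-fitted
nuisance predictions that ``run_score_diagnostics`` reads from
``estimate.diagnostic_data``, and the ATE case is shown with
:math:`w_i = \bar w_i = 1`.

```python
import numpy as np

rng = np.random.default_rng(3141)
n = 5000

# Hypothetical stand-ins for cross-fitted nuisance predictions.
X = rng.normal(size=n)
m_hat = 1.0 / (1.0 + np.exp(-0.5 * X))        # propensity predictions m(X)
D = rng.binomial(1, m_hat)                     # treatment indicator
g1_hat = 1.0 + X                               # outcome model under treatment
g0_hat = X                                     # outcome model under control
Y = np.where(D == 1, g1_hat, g0_hat) + rng.normal(size=n)

# Clip propensities away from 0/1, mirroring trimming_threshold.
eps = 0.01
m_clip = np.clip(m_hat, eps, 1.0 - eps)

# Uncentered AIPW score pieces for the ATE (w_i = \bar w_i = 1).
raw = (
    g1_hat - g0_hat
    + D * (Y - g1_hat) / m_clip
    - (1 - D) * (Y - g0_hat) / (1.0 - m_clip)
)
theta_hat = raw.mean()
psi = raw - theta_hat                          # per-observation \hat\psi_i

# In-sample, the empirical score average is zero by construction; the
# diagnostics instead probe derivative terms and the tails of |psi|.
print(np.isclose(psi.mean(), 0.0))
print(np.quantile(np.abs(psi), [0.5, 0.99]))   # influence concentration
```

The last line is the kind of tail summary the influence checks formalize: a
99th percentile of :math:`|\hat\psi_i|` far above the median signals that a
small number of observations drive the estimate.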
Parameters
----------
data : CausalData
Dataset used to fit the estimator.
estimate : CausalEstimate
Effect estimate with ``diagnostic_data`` containing nuisance predictions
and optionally cached score arrays.
trimming_threshold : float, optional
Propensity clipping threshold. If omitted, the value is inferred from
diagnostic or model metadata.
n_basis_funcs : int, optional
Number of simple basis functions used in orthogonality checks. Defaults
to one intercept plus all available confounders.
return_summary : bool, default True
Include a compact summary table in the returned payload.
Returns
-------
Dict[str, Any]
Diagnostic report with orthogonality checks, influence summaries,
optional out-of-sample tests, and a summary table.
Raises
------
ValueError
If required diagnostic arrays are missing or have incompatible shapes.
Examples
--------
>>> from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
>>> from causalis.dgp import obs_linear_26_dataset
>>> from causalis.scenarios.unconfoundedness.model import IRM
>>> data = obs_linear_26_dataset(
... n=1000,
... seed=3141,
... include_oracle=False,
... return_causal_data=True,
... )
>>> irm = IRM(
... data=data,
... ml_g=RandomForestRegressor(
... n_estimators=200,
... max_depth=6,
... min_samples_leaf=5,
... random_state=3141,
... ),
... ml_m=RandomForestClassifier(
... n_estimators=200,
... max_depth=6,
... min_samples_leaf=5,
... random_state=3141,
... ),
... n_folds=3,
... random_state=3141,
... )
>>> estimate = irm.fit().estimate(score="ATE")
>>> report = run_score_diagnostics(data, estimate)
>>> report["summary"] # doctest: +SKIP
>>> report["influence"]["top_influential"].head() # doctest: +SKIP
Canonical target
causalis.scenarios.unconfoundedness.refutation.score.score_validation.run_score_diagnostics