Causalis

`cuped`

Modules

Classes

CUPEDModel – CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.

Functions

cuped_forest_plot – Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.
regression_assumptions_table_from_data – Fit CUPED on CausalData and return the assumptions flag table.
regression_assumptions_table_from_estimate – Build assumptions table from a CUPED estimate.
style_regression_assumptions_table – Return pandas Styler with colored flag cells for notebook display.

`CUPEDModel`

CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.

Fits an outcome regression with pre-treatment covariates (always centered over the full sample, never within treatment groups) implemented as Lin (2013) fully interacted OLS:

    Y ~ 1 + D + X^c + D * X^c,   where X^c = X - mean(X)

The reported effect is the coefficient on D, with the covariance estimator selected by cov_type. Because the covariates are centered over the full sample, this coefficient is the ATE/ITT even when the treatment effect varies with the covariates. The specification is more general than canonical single-theta CUPED, which adjusts the outcome as Y - theta*(X - mean(X)).
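The regression described above can be sketched with NumPy alone. This is a minimal illustration on simulated data, not the library's implementation: the helper name `lin_ate` and the data-generating process are assumptions made for the example, and no robust covariance is computed here.

```python
import numpy as np

def lin_ate(y, d, x):
    """Coefficient on D in a fully interacted OLS with globally centered covariates.

    Hypothetical helper for illustration only; it is not part of Causalis.
    y: (n,) outcome, d: (n,) binary treatment, x: (n,) or (n, k) covariates.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    xc = x - x.mean(axis=0)  # center over the full sample, not per arm
    design = np.column_stack([np.ones_like(y), d, xc, d[:, None] * xc])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1]  # coefficient on D

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(loc=2.0, size=n)
d = rng.integers(0, 2, size=n)
# Heterogeneous effect tau(x) = 1 + 0.5 * x, so the true ATE is 1 + 0.5 * E[x] = 2.
y = 3.0 + 1.5 * x + d * (1.0 + 0.5 * x) + rng.normal(size=n)

tau_hat = lin_ate(y, d, x)
naive = y[d == 1].mean() - y[d == 0].mean()
```

Both `tau_hat` and `naive` target the same ATE in this simulation; the adjusted estimate is typically less noisy because the covariate explains part of the outcome variance.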

Parameters

cov_type (str) – Covariance estimator passed to statsmodels (e.g., "nonrobust", "HC0", "HC1", "HC2", "HC3"). Note: for cluster-randomized designs, use cluster-robust SEs (not implemented here).
alpha (float) – Significance level for confidence intervals.
strict_binary_treatment (bool) – If True, require treatment to be binary {0,1}.
use_t (bool | None) – If bool, passed to statsmodels .fit(..., use_t=use_t) directly. If None, automatic policy is used: for robust HC* covariances, use_t=True when n < use_t_auto_n_threshold, else False. For non-robust covariance, use_t=True.
use_t_auto_n_threshold (int) – Sample-size threshold for automatic use_t selection when use_t=None and covariance is HC* robust.
relative_ci_method ('delta_nocov' | 'bootstrap') – Method for the relative CI of 100 * tau / mu_c.
    "delta_nocov": delta method using robust Var(tau) and Var(mu_c) with Cov(tau, mu_c) set to 0 (a safe fallback that avoids relying on an unsupported hybrid influence-function covariance).
    "bootstrap": percentile bootstrap CI on the relative effect.
relative_ci_bootstrap_draws (int) – Number of bootstrap resamples used when relative_ci_method="bootstrap".
relative_ci_bootstrap_seed (int | None) – RNG seed used for bootstrap relative CI.
covariate_variance_min (float) – Minimum variance threshold for retaining a CUPED covariate. Covariates with variance less than or equal to this threshold are dropped before fitting.
condition_number_warn_threshold (float) – Trigger diagnostics signal when the design matrix condition number exceeds this threshold.
run_regression_checks (bool) – Whether to compute regression diagnostics payload during fit().
check_action ('ignore' | 'raise') – Action used when a diagnostics threshold is violated.
raise_on_yellow (bool) – When check_action="raise", also raise on YELLOW assumption flags.
corr_near_one_tol (float) – Correlation tolerance used to mark near-duplicate centered covariates.
vif_warn_threshold (float) – VIF threshold that triggers a diagnostics signal.
winsor_q (float | None) – Quantile used for winsor sensitivity refit. Set None to disable.
tiny_one_minus_h_tol (float) – Threshold for flagging near-degenerate 1 - leverage terms in HC2/HC3.
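The "delta_nocov" option in the parameter list above can be illustrated with a first-order delta-method calculation. The sketch below assumes a normal critical value and takes the two variances as given inputs; the function name `relative_ci_delta_nocov` is hypothetical, and the library may differ in details such as using a t quantile.

```python
from math import sqrt
from statistics import NormalDist

def relative_ci_delta_nocov(tau, var_tau, mu_c, var_mu_c, alpha=0.05):
    """Delta-method CI for 100 * tau / mu_c with Cov(tau, mu_c) set to 0.

    Hypothetical helper for illustration only.
    """
    if mu_c == 0:
        raise ValueError("mu_c must be non-zero for a relative effect")
    rel = 100.0 * tau / mu_c
    # Gradient of tau / mu_c is (1 / mu_c, -tau / mu_c**2); the cross term is dropped.
    var_ratio = var_tau / mu_c**2 + (tau**2) * var_mu_c / mu_c**4
    se_rel = 100.0 * sqrt(var_ratio)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return rel, rel - z * se_rel, rel + z * se_rel

rel, rel_lo, rel_hi = relative_ci_delta_nocov(
    tau=0.5, var_tau=0.01, mu_c=10.0, var_mu_c=0.04
)
```

Dropping the covariance term keeps the calculation simple, but the interval can be too wide or too narrow when tau and mu_c are strongly correlated; the "bootstrap" option does not rely on this approximation.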

Notes

Validity requires that every covariate be measured before treatment. Post-treatment covariates can bias the estimate.
Covariates are centered over the full sample, never within treatment groups. This convention is required so that the treatment coefficient in the Lin specification remains the ATE/ITT.
The Lin (2013) specification is recommended as a robust regression-adjustment default in RCTs.

Functions

assumptions_table – Return fitted regression assumptions table (GREEN/YELLOW/RED) when available.
estimate – Return the adjusted ATE/ITT estimate and inference.
fit – Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.
summary_dict – Convenience JSON/logging output.

`adjustment`

`alpha`

`assumptions_table`

Return fitted regression assumptions table (GREEN/YELLOW/RED) when available.

`center_covariates`

`centering_scope`

`check_action`

`condition_number_warn_threshold`

`corr_near_one_tol`

`cov_type`

`covariate_variance_min`

`estimate`

Return the adjusted ATE/ITT estimate and inference.

Parameters

alpha (float) – Override the instance significance level for confidence intervals.
diagnostic_data (bool) – Whether to include diagnostic data in the result.

Returns

CausalEstimate – A results object containing effect estimates and inference.

`fit`

Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.

Parameters

data (CausalData) – Validated dataset with columns: outcome (post), treatment, and confounders (pre covariates).
covariates (Sequence[str], required) – Explicit subset of data_contracts.confounders_names to use as CUPED covariates. Pass [] for an unadjusted (naive) fit.
run_checks (bool | None) – Override whether regression checks are computed in this fit call. If None, uses self.run_regression_checks.

Returns

CUPEDModel – Fitted estimator.

Raises

ValueError – If covariates is omitted, is not a sequence of strings, contains columns missing from the DataFrame, or contains columns outside data_contracts.confounders_names; if treatment is not binary when strict_binary_treatment=True; or if the design matrix is rank deficient.

`raise_on_yellow`

`relative_ci_bootstrap_draws`

`relative_ci_bootstrap_seed`

`relative_ci_method`

`run_regression_checks`

`strict_binary_treatment`

`summary_dict`

Convenience JSON/logging output.

Parameters

alpha (float) – Override the instance significance level for confidence intervals.

Returns

dict – Dictionary with estimates, inference, and diagnostics.

`tiny_one_minus_h_tol`

`use_t`

`use_t_auto_n_threshold`

`vif_warn_threshold`

`winsor_q`

`cuped_forest_plot`

Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.

Parameters

estimate_with_cuped (CausalEstimate) – Effect estimated with CUPED adjustment.
estimate_without_cuped (CausalEstimate) – Effect estimated without CUPED adjustment. If omitted, the function builds a normal-approximation CI from estimate_with_cuped.diagnostic_data.ate_naive and estimate_with_cuped.diagnostic_data.se_naive.
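The fallback interval mentioned for the non-CUPED estimate can be sketched as a plain normal-approximation CI. The helper name `normal_ci` and the input values are assumptions for illustration; in the library the inputs would come from ate_naive and se_naive.

```python
from statistics import NormalDist

def normal_ci(estimate, se, alpha=0.05):
    """Two-sided normal-approximation CI: estimate +/- z * se.

    Hypothetical helper for illustration only.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * se, estimate + z * se

# Illustrative numbers, not output from any real experiment.
ci_lo, ci_hi = normal_ci(estimate=1.2, se=0.4)
```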

`dgp`

Functions

generate_cuped_tweedie_26 – Gold standard Tweedie-like DGP with mixed marginals and structured HTE.
make_cuped_binary_26 – Binary CUPED benchmark with richer confounders and structured HTE.

`generate_cuped_tweedie_26`

Gold-standard Tweedie-like DGP with mixed marginals and structured heterogeneous treatment effects (HTE). The outcome has many zeros and a heavy right tail. By default the data include two pre-period covariates, 'y_pre' and 'y_pre_2'. Wrapper for make_tweedie().

Parameters

n (int) – Number of samples to generate.
seed (int) – Random seed.
add_pre (bool) – Whether to add pre-period covariates.
pre_name (str) – Name of the first pre-period covariate column.
pre_name_2 (str) – Name of the second pre-period covariate column. Defaults to f"{pre_name}_2".
pre_target_corr (float) – Target correlation between the first pre covariate and the post-outcome y in the control group.
pre_target_corr_2 (float) – Target correlation for the second pre covariate. Defaults to a moderate value based on pre_target_corr to reduce collinearity.
pre_spec (PreCorrSpec) – Detailed specification for pre-period calibration (transform, method, etc.).
include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
return_causal_data (bool) – Whether to return a CausalData object.
theta_log (float) – The log-uplift theta parameter for the treatment effect.

Returns

DataFrame or CausalData – A CausalData object if return_causal_data=True, otherwise a DataFrame.

`make_cuped_binary_26`

Binary CUPED benchmark with richer confounders and structured HTE. Includes a calibrated pre-period covariate 'y_pre' by default. Wrapper for generate_cuped_binary().

Parameters

n (int) – Number of samples to generate.
seed (int) – Random seed.
add_pre (bool) – Whether to add a pre-period covariate 'y_pre'.
pre_name (str) – Name of the pre-period covariate column.
pre_target_corr (float) – Target correlation between y_pre and post-outcome y in the control group.
pre_spec (PreCorrSpec) – Detailed specification for pre-period calibration (transform, method, etc.).
include_oracle (bool) – Whether to include oracle columns like 'cate', 'g0', and 'g1'.
return_causal_data (bool) – Whether to return a CausalData object.
theta_logit (float) – Baseline log-odds uplift scale for heterogeneous treatment effects.

Returns

DataFrame or CausalData – A CausalData object if return_causal_data=True, otherwise a DataFrame.

`diagnostics`

Modules

forest_plot
regression_checks

Functions

assumption_ate_gap – Check adjusted-vs-naive ATE gap relative to naive SE.
assumption_condition_number – Check global collinearity via condition number.
assumption_cooks – Check Cook's distance influence diagnostics.
assumption_design_rank – Check that the design matrix is full rank.
assumption_hc23_stability – Check HC2/HC3 stability when leverage terms approach one.
assumption_leverage – Check leverage concentration.
assumption_near_duplicates – Check near-duplicate centered covariate pairs.
assumption_residual_tails – Check residual extremes using max standardized residual only.
assumption_vif – Check VIF from centered main-effect covariates.
assumption_winsor_sensitivity – Check sensitivity of adjusted ATE to winsorized-outcome refit.
cuped_forest_plot – Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.
design_matrix_checks – Return rank/conditioning diagnostics for a numeric design matrix.
overall_assumption_flag – Return overall GREEN/YELLOW/RED status from an assumptions table.
regression_assumption_rows_from_checks – Run all CUPED regression assumption tests and return row payloads.
regression_assumptions_table_from_checks – Return a table of GREEN/YELLOW/RED assumption flags from checks payload.
regression_assumptions_table_from_data – Fit CUPED on CausalData and return the assumptions flag table.
regression_assumptions_table_from_diagnostic_data – Build assumption table from CUPEDDiagnosticData payload.
regression_assumptions_table_from_estimate – Build assumptions table from a CUPED estimate.
run_regression_checks – Build a compact payload with design, residual, and influence diagnostics.
style_regression_assumptions_table – Return pandas Styler with colored flag cells for notebook display.

`FLAG_GREEN`

`FLAG_RED`

`FLAG_YELLOW`

`assumption_ate_gap`

Check adjusted-vs-naive ATE gap relative to naive SE.
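One way to read this check is as the absolute gap between the adjusted and naive estimates, expressed in units of the naive standard error. The sketch below is an assumption about the general idea only: the helper names and the yellow/red cut-offs are hypothetical and are passed in explicitly because the library's thresholds are not documented here.

```python
def ate_gap_ratio(ate_adjusted, ate_naive, se_naive):
    """Absolute adjusted-vs-naive gap in units of the naive standard error."""
    if se_naive <= 0:
        raise ValueError("se_naive must be positive")
    return abs(ate_adjusted - ate_naive) / se_naive

def flag_for(value, yellow_at, red_at):
    """Map a diagnostic value to a flag using caller-supplied cut-offs."""
    if value >= red_at:
        return "RED"
    if value >= yellow_at:
        return "YELLOW"
    return "GREEN"

# Illustrative numbers and cut-offs only.
ratio = ate_gap_ratio(ate_adjusted=1.10, ate_naive=1.50, se_naive=0.25)
flag = flag_for(ratio, yellow_at=1.0, red_at=2.0)
```

In a randomized experiment both estimators target the same quantity, so a large gap relative to the naive SE is a signal to inspect the covariates and the design rather than a result in itself.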

`assumption_condition_number`

Check global collinearity via condition number.

`assumption_cooks`

Check Cook's distance influence diagnostics.

`assumption_design_rank`

Check that the design matrix is full rank.

`assumption_hc23_stability`

Check HC2/HC3 stability when leverage terms approach one.

`assumption_leverage`

Check leverage concentration.

`assumption_near_duplicates`

Check near-duplicate centered covariate pairs.

`assumption_residual_tails`

Check residual extremes using max standardized residual only.

`assumption_vif`

Check VIF from centered main-effect covariates.

`assumption_winsor_sensitivity`

Check sensitivity of adjusted ATE to winsorized-outcome refit.

`cuped_forest_plot`

Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.

Parameters

estimate_with_cuped (CausalEstimate) – Effect estimated with CUPED adjustment.
estimate_without_cuped (CausalEstimate) – Effect estimated without CUPED adjustment. If omitted, the function builds a normal-approximation CI from estimate_with_cuped.diagnostic_data.ate_naive and estimate_with_cuped.diagnostic_data.se_naive.

`design_matrix_checks`

Return rank/conditioning diagnostics for a numeric design matrix.

`forest_plot`

Functions

cuped_forest_plot – Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.

`cuped_forest_plot`

Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.

Parameters

estimate_with_cuped (CausalEstimate) – Effect estimated with CUPED adjustment.
estimate_without_cuped (CausalEstimate) – Effect estimated without CUPED adjustment. If omitted, the function builds a normal-approximation CI from estimate_with_cuped.diagnostic_data.ate_naive and estimate_with_cuped.diagnostic_data.se_naive.

`overall_assumption_flag`

Return overall GREEN/YELLOW/RED status from an assumptions table.
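A natural aggregation rule is "worst flag wins". The sketch below assumes that rule and a simple severity ordering; both are assumptions for illustration, and the names `SEVERITY` and `worst_flag` are hypothetical rather than the library's own constants or functions.

```python
# Hypothetical severity ordering; the library's own constants may differ.
SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}

def worst_flag(flags):
    """Return the most severe flag, or GREEN when there are no rows."""
    return max(flags, key=SEVERITY.__getitem__, default="GREEN")

status = worst_flag(["GREEN", "YELLOW", "GREEN"])
```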

`regression_assumption_rows_from_checks`

Run all CUPED regression assumption tests and return row payloads.

`regression_assumptions_table_from_checks`

Return a table of GREEN/YELLOW/RED assumption flags from checks payload.

`regression_assumptions_table_from_data`

Fit CUPED on CausalData and return the assumptions flag table.

`regression_assumptions_table_from_diagnostic_data`

Build assumption table from CUPEDDiagnosticData payload.

`regression_assumptions_table_from_estimate`

Build assumptions table from a CUPED estimate.

Supports both call styles:

regression_assumptions_table_from_estimate(estimate, ...)
regression_assumptions_table_from_estimate(data, estimate, ...)

`regression_checks`

Functions

assumption_ate_gap – Check adjusted-vs-naive ATE gap relative to naive SE.
assumption_condition_number – Check global collinearity via condition number.
assumption_cooks – Check Cook's distance influence diagnostics.
assumption_design_rank – Check that the design matrix is full rank.
assumption_hc23_stability – Check HC2/HC3 stability when leverage terms approach one.
assumption_leverage – Check leverage concentration.
assumption_near_duplicates – Check near-duplicate centered covariate pairs.
assumption_residual_tails – Check residual extremes using max standardized residual only.
assumption_vif – Check VIF from centered main-effect covariates.
assumption_winsor_sensitivity – Check sensitivity of adjusted ATE to winsorized-outcome refit.
design_matrix_checks – Return rank/conditioning diagnostics for a numeric design matrix.
leverage_and_cooks – Compute leverage, Cook's distance, and internally studentized residuals.
near_duplicate_corr_pairs – Find pairs with absolute correlation very close to one.
overall_assumption_flag – Return overall GREEN/YELLOW/RED status from an assumptions table.
regression_assumption_rows_from_checks – Run all CUPED regression assumption tests and return row payloads.
regression_assumptions_table_from_checks – Return a table of GREEN/YELLOW/RED assumption flags from checks payload.
regression_assumptions_table_from_data – Fit CUPED on CausalData and return the assumptions flag table.
regression_assumptions_table_from_diagnostic_data – Build assumption table from CUPEDDiagnosticData payload.
regression_assumptions_table_from_estimate – Build assumptions table from a CUPED estimate.
run_regression_checks – Build a compact payload with design, residual, and influence diagnostics.
style_regression_assumptions_table – Return pandas Styler with colored flag cells for notebook display.
vif_from_corr – Approximate VIF from inverse correlation matrix of standardized covariates.
winsor_fit_tau – Refit OLS on winsorized outcome and return treatment coefficient.

`FLAG_COLOR`

`FLAG_GREEN`

`FLAG_LEVEL`

`FLAG_RED`

`FLAG_YELLOW`

`assumption_ate_gap`

Check adjusted-vs-naive ATE gap relative to naive SE.

`assumption_condition_number`

Check global collinearity via condition number.

`assumption_cooks`

Check Cook's distance influence diagnostics.

`assumption_design_rank`

Check that the design matrix is full rank.

`assumption_hc23_stability`

Check HC2/HC3 stability when leverage terms approach one.

`assumption_leverage`

Check leverage concentration.

`assumption_near_duplicates`

Check near-duplicate centered covariate pairs.

`assumption_residual_tails`

Check residual extremes using max standardized residual only.

`assumption_vif`

Check VIF from centered main-effect covariates.

`assumption_winsor_sensitivity`

Check sensitivity of adjusted ATE to winsorized-outcome refit.

`design_matrix_checks`

Return rank/conditioning diagnostics for a numeric design matrix.
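Rank and conditioning diagnostics of this kind can be computed directly with NumPy. The sketch below is an illustration under assumptions: the helper name `design_checks` and the keys of the returned dictionary are hypothetical and need not match the library's payload.

```python
import numpy as np

def design_checks(X):
    """Rank and 2-norm condition number of a numeric design matrix."""
    X = np.asarray(X, dtype=float)
    rank = int(np.linalg.matrix_rank(X))
    return {
        "n_cols": X.shape[1],
        "rank": rank,
        "full_rank": rank == X.shape[1],
        "condition_number": float(np.linalg.cond(X)),
    }

rng = np.random.default_rng(1)
n = 200
d = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=n)

good = design_checks(np.column_stack([np.ones(n), d, x]))
# Adding an exact multiple of x makes the design rank deficient.
bad = design_checks(np.column_stack([np.ones(n), d, x, 2.0 * x]))
```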

`leverage_and_cooks`

Compute leverage, Cook's distance, and internally studentized residuals.
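The three quantities have standard closed forms for OLS, which the following NumPy sketch computes from a thin QR decomposition. It assumes a full-rank design; the helper name `leverage_cooks` and the simulated data are illustrative, not the library's implementation.

```python
import numpy as np

def leverage_cooks(X, y):
    """Leverage, Cook's distance, and internally studentized residuals for OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Q, R = np.linalg.qr(X)              # thin QR; X must have full column rank
    h = np.sum(Q**2, axis=1)            # diagonal of the hat matrix
    beta = np.linalg.solve(R, Q.T @ y)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)        # residual variance estimate
    r = resid / np.sqrt(s2 * (1.0 - h)) # internally studentized residuals
    cooks = r**2 * h / (p * (1.0 - h))
    return h, cooks, r

rng = np.random.default_rng(4)
x = np.append(np.linspace(0.0, 1.0, 49), 5.0)  # last point has high leverage
y = 2.0 * x + rng.normal(scale=0.1, size=50)
y[-1] += 10.0                                  # and is also an outlier
X = np.column_stack([np.ones(50), x])

h, cooks, r = leverage_cooks(X, y)
```

When a leverage value approaches one, the factor 1 - h in the denominators approaches zero, which is the same degeneracy that makes HC2/HC3 covariance terms unstable.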

`near_duplicate_corr_pairs`

Find pairs with absolute correlation very close to one.
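A minimal sketch of such a search, assuming the rule is |corr| >= 1 - tol on the pairwise Pearson correlation matrix. The helper name `near_duplicate_pairs` and the default tolerance are hypothetical choices for the example.

```python
import numpy as np

def near_duplicate_pairs(X, names, tol=1e-6):
    """List column pairs whose absolute correlation is within tol of one."""
    corr = np.atleast_2d(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    k = corr.shape[0]
    pairs = []
    for i in range(k):
        for j in range(i + 1, k):
            if abs(corr[i, j]) >= 1.0 - tol:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

rng = np.random.default_rng(2)
a = rng.normal(size=500)
c = rng.normal(size=500)
# Column "b" is an exact affine transform of "a", so their correlation is -1.
X = np.column_stack([a, -2.0 * a + 1.0, c])

pairs = near_duplicate_pairs(X, ["a", "b", "c"])
```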

`overall_assumption_flag`

Return overall GREEN/YELLOW/RED status from an assumptions table.

`regression_assumption_rows_from_checks`

Run all CUPED regression assumption tests and return row payloads.

`regression_assumptions_table_from_checks`

Return a table of GREEN/YELLOW/RED assumption flags from checks payload.

`regression_assumptions_table_from_data`

Fit CUPED on CausalData and return the assumptions flag table.

`regression_assumptions_table_from_diagnostic_data`

Build assumption table from CUPEDDiagnosticData payload.

`regression_assumptions_table_from_estimate`

Build assumptions table from a CUPED estimate.

Supports both call styles:

regression_assumptions_table_from_estimate(estimate, ...)
regression_assumptions_table_from_estimate(data, estimate, ...)

`run_regression_checks`

Build a compact payload with design, residual, and influence diagnostics.

`style_regression_assumptions_table`

Return pandas Styler with colored flag cells for notebook display.

`vif_from_corr`

Approximate VIF from inverse correlation matrix of standardized covariates.
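The diagonal of the inverse correlation matrix gives the variance inflation factors directly, which the sketch below illustrates on simulated data. The helper name `vif_from_correlation` is hypothetical, and the code assumes the correlation matrix is invertible (no exactly duplicated or constant columns).

```python
import numpy as np

def vif_from_correlation(X):
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.atleast_2d(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    return np.diag(np.linalg.inv(corr))

rng = np.random.default_rng(3)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)             # independent of the others

vif = vif_from_correlation(np.column_stack([x1, x2, x3]))
```

Here the first two entries of `vif` are large because x1 and x2 almost coincide, while the entry for x3 stays close to one.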

`winsor_fit_tau`

Refit OLS on winsorized outcome and return treatment coefficient.
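A winsorized refit can be sketched as: clip the outcome at a lower and an upper quantile, rerun OLS on the same design, and read off the treatment coefficient. The sketch assumes two-sided clipping at q and 1 - q and a treatment column at index 1; both are assumptions, and the helper name `winsorized_tau` is hypothetical.

```python
import numpy as np

def winsorized_tau(y, design, q=0.01, treatment_col=1):
    """Treatment coefficient from OLS on an outcome clipped at quantiles q and 1 - q."""
    y = np.asarray(y, dtype=float)
    lo, hi = np.quantile(y, [q, 1.0 - q])
    y_w = np.clip(y, lo, hi)
    beta, *_ = np.linalg.lstsq(design, y_w, rcond=None)
    return beta[treatment_col]

rng = np.random.default_rng(5)
n = 2000
d = rng.integers(0, 2, size=n).astype(float)
d[0] = 1.0
y = 1.0 + 2.0 * d + rng.normal(size=n)  # true effect is 2
y[0] += 1000.0                          # one extreme treated outcome
design = np.column_stack([np.ones(n), d])

tau_raw = winsorized_tau(y, design, q=0.0)   # q=0 leaves the outcome unchanged
tau_w = winsorized_tau(y, design, q=0.01)
```

A large difference between `tau_raw` and `tau_w` indicates that a few extreme outcomes drive the estimate, which is what a sensitivity check of this kind is meant to surface.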

`run_regression_checks`

Build a compact payload with design, residual, and influence diagnostics.

`style_regression_assumptions_table`

Return pandas Styler with colored flag cells for notebook display.

`model`

Classes

CUPEDModel – CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.

`CUPEDModel`

CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.

Fits an outcome regression with pre-treatment covariates (always centered over the full sample, never within treatment groups) implemented as Lin (2013) fully interacted OLS:

    Y ~ 1 + D + X^c + D * X^c,   where X^c = X - mean(X)

Parameters

cov_type (str) – Covariance estimator passed to statsmodels (e.g., "nonrobust", "HC0", "HC1", "HC2", "HC3"). Note: for cluster-randomized designs, use cluster-robust SEs (not implemented here).
alpha (float) – Significance level for confidence intervals.
strict_binary_treatment (bool) – If True, require treatment to be binary {0,1}.
use_t (bool | None) – If bool, passed to statsmodels .fit(..., use_t=use_t) directly. If None, automatic policy is used: for robust HC* covariances, use_t=True when n < use_t_auto_n_threshold, else False. For non-robust covariance, use_t=True.
use_t_auto_n_threshold (int) – Sample-size threshold for automatic use_t selection when use_t=None and covariance is HC* robust.
relative_ci_method ('delta_nocov' | 'bootstrap') – Method for the relative CI of 100 * tau / mu_c.
    "delta_nocov": delta method using robust Var(tau) and Var(mu_c) with Cov(tau, mu_c) set to 0 (a safe fallback that avoids relying on an unsupported hybrid influence-function covariance).
    "bootstrap": percentile bootstrap CI on the relative effect.
relative_ci_bootstrap_draws (int) – Number of bootstrap resamples used when relative_ci_method="bootstrap".
relative_ci_bootstrap_seed (int | None) – RNG seed used for bootstrap relative CI.
covariate_variance_min (float) – Minimum variance threshold for retaining a CUPED covariate. Covariates with variance less than or equal to this threshold are dropped before fitting.
condition_number_warn_threshold (float) – Trigger diagnostics signal when the design matrix condition number exceeds this threshold.
run_regression_checks (bool) – Whether to compute regression diagnostics payload during fit().
check_action ('ignore' | 'raise') – Action used when a diagnostics threshold is violated.
raise_on_yellow (bool) – When check_action="raise", also raise on YELLOW assumption flags.
corr_near_one_tol (float) – Correlation tolerance used to mark near-duplicate centered covariates.
vif_warn_threshold (float) – VIF threshold that triggers a diagnostics signal.
winsor_q (float | None) – Quantile used for winsor sensitivity refit. Set None to disable.
tiny_one_minus_h_tol (float) – Threshold for flagging near-degenerate 1 - leverage terms in HC2/HC3.
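The automatic use_t policy in the parameter list above can be written as a small decision function. This is a sketch of the documented rule, not the library's code: the helper name `resolve_use_t` is hypothetical, the threshold value in the usage lines is illustrative, and treating every covariance type that does not start with "HC" as non-robust is an assumption.

```python
def resolve_use_t(use_t, cov_type, n, use_t_auto_n_threshold):
    """Resolve the effective use_t flag from the documented policy."""
    if use_t is not None:
        return bool(use_t)  # an explicit choice is passed through unchanged
    if cov_type.upper().startswith("HC"):
        # Robust HC* covariance: t reference only for small samples.
        return n < use_t_auto_n_threshold
    return True  # non-robust covariance

# The threshold below is an illustrative value, not the library default.
small_sample = resolve_use_t(None, "HC2", n=300, use_t_auto_n_threshold=5000)
large_sample = resolve_use_t(None, "HC2", n=50_000, use_t_auto_n_threshold=5000)
classical = resolve_use_t(None, "nonrobust", n=50_000, use_t_auto_n_threshold=5000)
```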

Notes

Validity requires that every covariate be measured before treatment. Post-treatment covariates can bias the estimate.
Covariates are centered over the full sample, never within treatment groups. This convention is required so that the treatment coefficient in the Lin specification remains the ATE/ITT.
The Lin (2013) specification is recommended as a robust regression-adjustment default in RCTs.

Functions

assumptions_table – Return fitted regression assumptions table (GREEN/YELLOW/RED) when available.
estimate – Return the adjusted ATE/ITT estimate and inference.
fit – Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.
summary_dict – Convenience JSON/logging output.

`adjustment`

`alpha`

`assumptions_table`

Return fitted regression assumptions table (GREEN/YELLOW/RED) when available.

`center_covariates`

`centering_scope`

`check_action`

`condition_number_warn_threshold`

`corr_near_one_tol`

`cov_type`

`covariate_variance_min`

`estimate`

Return the adjusted ATE/ITT estimate and inference.

Parameters

alpha (float) – Override the instance significance level for confidence intervals.
diagnostic_data (bool) – Whether to include diagnostic data in the result.

Returns

CausalEstimate – A results object containing effect estimates and inference.

`fit`

Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.

Parameters

data (CausalData) – Validated dataset with columns: outcome (post), treatment, and confounders (pre covariates).
covariates (Sequence[str], required) – Explicit subset of data_contracts.confounders_names to use as CUPED covariates. Pass [] for an unadjusted (naive) fit.
run_checks (bool | None) – Override whether regression checks are computed in this fit call. If None, uses self.run_regression_checks.

Returns

CUPEDModel – Fitted estimator.

Raises

ValueError – If covariates is omitted, is not a sequence of strings, contains columns missing from the DataFrame, or contains columns outside data_contracts.confounders_names; if treatment is not binary when strict_binary_treatment=True; or if the design matrix is rank deficient.

`raise_on_yellow`

`relative_ci_bootstrap_draws`

`relative_ci_bootstrap_seed`

`relative_ci_method`

`run_regression_checks`

`strict_binary_treatment`

`summary_dict`

Convenience JSON/logging output.

Parameters

alpha (float) – Override the instance significance level for confidence intervals.

Returns

dict – Dictionary with estimates, inference, and diagnostics.

`tiny_one_minus_h_tol`

`use_t`

`use_t_auto_n_threshold`

`vif_warn_threshold`

`winsor_q`

`regression_assumptions_table_from_data`

Fit CUPED on CausalData and return the assumptions flag table.

`regression_assumptions_table_from_estimate`

Build assumptions table from a CUPED estimate.

Supports both call styles:

regression_assumptions_table_from_estimate(estimate, ...)
regression_assumptions_table_from_estimate(data, estimate, ...)

`style_regression_assumptions_table`

Return pandas Styler with colored flag cells for notebook display.