cuped

Reference details for cuped in causalis.scenarios.

Modules
Classes
  • CUPEDModel – CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.
Functions
CUPEDModel

CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.

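Fits an outcome regression on pre-treatment covariates, implemented as Lin (2013) fully interacted OLS. Covariates are always centered over the full sample, never within treatment groups:

  Y = alpha + tau*D + beta'(X - mean(X)) + gamma'[D * (X - mean(X))] + error

where D is the binary treatment indicator and mean(X) is the full-sample covariate mean.
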
The reported effect is the coefficient on D (tau), with standard errors from the covariance estimator selected by cov_type. This specification ensures that tau estimates the ATE/ITT even when the treatment effect is heterogeneous in the covariates, which makes it broader than canonical single-theta CUPED, Y - theta*(X - mean(X)).
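A minimal sketch of the Lin (2013) specification described above, using plain NumPy least squares instead of statsmodels (so it produces no robust standard errors). `lin_interacted_ate` is a hypothetical helper, not part of causalis. The last lines check numerically why full-sample centering matters: the coefficient on D equals the average of the model's predicted treated-minus-control outcomes.

```python
import numpy as np

def lin_interacted_ate(y, d, X):
    """Lin (2013) fully interacted OLS via least squares (sketch, no robust SEs)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    # Center covariates over the FULL sample, not within treatment arms.
    Xc = X - X.mean(axis=0)
    # Columns: intercept, D, centered covariates, D x centered covariates.
    design = np.column_stack([np.ones_like(d), d, Xc, d[:, None] * Xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, Xc

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n)
# Heterogeneous effect 1 + 0.5 * x, so the true ATE is close to 1.
y = 2.0 + 3.0 * x + d * (1.0 + 0.5 * x) + rng.normal(size=n)

coef, Xc = lin_interacted_ate(y, d, x)
tau = coef[1]
k = Xc.shape[1]
gamma = coef[2 + k:]
# Average predicted effect: mean over units of (fit with D=1) - (fit with D=0).
avg_pred_diff = np.mean(tau + Xc @ gamma)
```

Because the centered covariates average to zero over the full sample, the interaction terms average out and tau alone is the average predicted effect. Centering within treatment arms would break this identity.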

Parameters
  • cov_type (str) – Covariance estimator passed to statsmodels (e.g., "nonrobust", "HC0", "HC1", "HC2", "HC3"). Note: cluster-randomized designs need cluster-robust SEs, which are not implemented here.
  • alpha (float) – Significance level for confidence intervals.
  • strict_binary_treatment (bool) – If True, require treatment to be binary {0,1}.
  • use_t (bool | None) – If a bool, it is passed directly to statsmodels .fit(..., use_t=use_t). If None, an automatic policy is used: for robust HC* covariances, use_t=True when n < use_t_auto_n_threshold and False otherwise; for non-robust covariance, use_t=True.
  • use_t_auto_n_threshold (int) – Sample-size threshold for automatic use_t selection when use_t=None and covariance is HC* robust.
  • relative_ci_method (str: 'delta_nocov' or 'bootstrap') – Method for the CI of the relative effect 100 * tau / mu_c, where mu_c is the control-group mean outcome.
    ◦ "delta_nocov": delta method using the robust Var(tau) and Var(mu_c) with Cov(tau, mu_c) set to 0. This is a fallback, because estimating that covariance through a hybrid influence function (IF) is not supported.
    ◦ "bootstrap": percentile bootstrap CI on the relative effect.
  • relative_ci_bootstrap_draws (int) – Number of bootstrap resamples used when relative_ci_method="bootstrap".
  • relative_ci_bootstrap_seed (int | None) – RNG seed used for bootstrap relative CI.
  • covariate_variance_min (float) – Minimum variance threshold for retaining a CUPED covariate. Covariates with variance less than or equal to this threshold are dropped before fitting.
  • condition_number_warn_threshold (float) – Triggers a diagnostics signal when the design-matrix condition number exceeds this threshold.
  • run_regression_checks (bool) – Whether to compute the regression diagnostics payload during fit().
  • check_action (str: 'ignore' or 'raise') – Action taken when a diagnostics threshold is violated.
  • raise_on_yellow (bool) – When check_action="raise", also raise on YELLOW assumption flags.
  • corr_near_one_tol (float) – Correlation tolerance used to mark near-duplicate centered covariates.
  • vif_warn_threshold (float) – VIF threshold that triggers a diagnostics signal.
  • winsor_q (float | None) – Quantile used for the winsorized-outcome sensitivity refit. Set to None to disable.
  • tiny_one_minus_h_tol (float) – Threshold for flagging near-degenerate 1 - leverage terms in HC2/HC3.
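The automatic use_t policy and the "delta_nocov" relative CI described above can be sketched as follows. Both functions are hypothetical illustrations, not causalis internals; in particular, using a normal critical value for the relative CI is an assumption.

```python
import math
from statistics import NormalDist

def resolve_use_t(use_t, cov_type, n, n_threshold):
    """Automatic use_t policy for use_t=None, as documented (sketch)."""
    if use_t is not None:
        return bool(use_t)            # an explicit bool is passed through
    if cov_type.upper().startswith("HC"):
        return n < n_threshold        # robust HC*: t only in small samples
    return True                       # non-robust covariance: always t

def relative_ci_delta_nocov(tau, var_tau, mu_c, var_mu_c, alpha=0.05):
    """Delta-method CI for 100 * tau / mu_c with Cov(tau, mu_c) set to 0."""
    rel = 100.0 * tau / mu_c
    # Partial derivatives of 100 * tau / mu_c with respect to tau and mu_c.
    g_tau = 100.0 / mu_c
    g_mu = -100.0 * tau / mu_c ** 2
    # The covariance term 2 * g_tau * g_mu * Cov(tau, mu_c) is dropped.
    se = math.sqrt(g_tau ** 2 * var_tau + g_mu ** 2 * var_mu_c)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # normal critical value (assumption)
    return rel, rel - z * se, rel + z * se
```

For example, tau = 2, Var(tau) = 0.25, mu_c = 10 and Var(mu_c) = 0 give a relative effect of 20% with a delta-method SE of 5 percentage points.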
Notes
  • Validity requires that covariates be measured before treatment. Post-treatment covariates can bias the estimates.
  • Covariates are globally centered over the full sample only. This centering convention is required so the treatment coefficient in the Lin specification remains the ATE/ITT.
  • The Lin (2013) specification is recommended as a robust regression-adjustment default in RCTs.
Functions
  • assumptions_table – Return fitted regression assumptions table (GREEN/YELLOW/RED) when available.
  • estimate – Return the adjusted ATE/ITT estimate and inference.
  • fit – Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.
  • summary_dict – Convenience JSON/logging output.
adjustment
alpha
assumptions_table

Return fitted regression assumptions table (GREEN/YELLOW/RED) when available.

center_covariates
centering_scope
check_action
condition_number_warn_threshold
corr_near_one_tol
cov_type
covariate_variance_min
estimate

Return the adjusted ATE/ITT estimate and inference.

Parameters
  • alpha (float) – Override the instance significance level for confidence intervals.
  • diagnostic_data (bool) – Whether to include diagnostic data in the result.
Returns
  • CausalEstimate – A results object containing effect estimates and inference.
fit

Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.

Parameters
  • data (CausalData) – Validated dataset containing the post-period outcome, the treatment, and confounders (pre-treatment covariates).
  • covariates (Sequence[str], required) – Explicit subset of data.confounders_names to use as CUPED covariates. Pass [] for an unadjusted (naive) fit.
  • run_checks (bool | None) – Override whether regression checks are computed in this fit call. If None, uses self.run_regression_checks.
Returns
Raises
  • ValueError – If covariates is omitted or is not a sequence of strings; if it contains columns missing from the DataFrame or outside data.confounders_names; if treatment is not binary when strict_binary_treatment=True; or if the design matrix is rank deficient.
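A hypothetical sketch of the covariate checks listed above, using a plain dict of column lists in place of a DataFrame. The use of population variance for the covariate_variance_min filter is an assumption, and the rank-deficiency check is omitted.

```python
from collections.abc import Sequence
from statistics import pvariance

def select_cuped_covariates(covariates, confounders_names, columns, variance_min):
    """Hypothetical helper: validate covariates, then drop near-constant ones."""
    if covariates is None:
        raise ValueError("covariates is required; pass [] for a naive fit")
    if isinstance(covariates, str) or not isinstance(covariates, Sequence):
        raise ValueError("covariates must be a sequence of strings")
    if not all(isinstance(c, str) for c in covariates):
        raise ValueError("covariates must be a sequence of strings")
    missing = [c for c in covariates if c not in columns]
    if missing:
        raise ValueError(f"columns missing from the DataFrame: {missing}")
    outside = [c for c in covariates if c not in confounders_names]
    if outside:
        raise ValueError(f"columns outside confounders_names: {outside}")
    # covariate_variance_min rule: drop covariates with variance <= variance_min.
    return [c for c in covariates if pvariance(columns[c]) > variance_min]

columns = {"y_pre": [1.0, 2.0, 3.0], "const": [5.0, 5.0, 5.0], "age": [30.0, 40.0, 50.0]}
confounders = ["y_pre", "const"]
kept = select_cuped_covariates(["y_pre", "const"], confounders, columns, 1e-12)
```

Here the constant column is dropped by the variance filter, and requesting "age" would raise because it is not among the confounders.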
raise_on_yellow
relative_ci_bootstrap_draws
relative_ci_bootstrap_seed
relative_ci_method
run_regression_checks
strict_binary_treatment
summary_dict

Convenience JSON/logging output.

Parameters
  • alpha (float) – Override the instance significance level for confidence intervals.
Returns
  • dict – Dictionary with estimates, inference, and diagnostics.
tiny_one_minus_h_tol
use_t
use_t_auto_n_threshold
vif_warn_threshold
winsor_q
cuped_forest_plot

Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.

Parameters
  • estimate_with_cuped (CausalEstimate) – Effect estimated with CUPED adjustment.
  • estimate_without_cuped (CausalEstimate) – Effect estimated without CUPED adjustment. If omitted, the function uses estimate_with_cuped.diagnostic_data.ate_naive and estimate_with_cuped.diagnostic_data.se_naive to build a normal-approx CI.
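When estimate_without_cuped is omitted, the fallback interval is a normal-approximation CI built from ate_naive and se_naive. A sketch, assuming a symmetric two-sided interval at level alpha (the helper name and its default alpha are assumptions):

```python
from statistics import NormalDist

def naive_normal_ci(ate_naive, se_naive, alpha=0.05):
    """Normal-approximation CI: ate_naive +/- z_{1 - alpha/2} * se_naive."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return ate_naive - z * se_naive, ate_naive + z * se_naive
```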
dgp
Functions
generate_cuped_tweedie_26

Gold-standard Tweedie-like DGP with mixed marginals and structured HTE. The outcome has many zeros and a heavy right tail. Includes two pre-period covariates by default: 'y_pre' and 'y_pre_2'. Wrapper for make_tweedie().

Parameters
  • n (int) – Number of samples to generate.
  • seed (int) – Random seed.
  • add_pre (bool) – Whether to add pre-period covariates.
  • pre_name (str) – Name of the first pre-period covariate column.
  • pre_name_2 (str) – Name of the second pre-period covariate column. Defaults to f"{pre_name}_2".
  • pre_target_corr (float) – Target correlation between the first pre-period covariate and the post-period outcome y in the control group.
  • pre_target_corr_2 (float) – Target correlation for the second pre covariate. Defaults to a moderate value based on pre_target_corr to reduce collinearity.
  • pre_spec (PreCorrSpec) – Detailed specification for pre-period calibration (transform, method, etc.).
  • include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
  • return_causal_data (bool) – Whether to return a CausalData object.
  • theta_log (float) – The log-uplift theta parameter for the treatment effect.
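One simple way to build a pre-period covariate with a target correlation is to mix the standardized outcome with independent noise. This is a generic illustration, not the calibration implemented through PreCorrSpec; it is applied here to a synthetic zero-inflated outcome standing in for control-group y.

```python
import numpy as np

def add_correlated_pre(y, target_corr, rng):
    """Return x with corr(x, y) close to target_corr (generic mixing sketch)."""
    y = np.asarray(y, dtype=float)
    z = (y - y.mean()) / y.std()              # standardized outcome, variance 1
    noise = rng.standard_normal(len(y))       # independent noise, variance ~1
    # Var(x) ~ rho^2 + (1 - rho^2) = 1 and Cov(x, z) ~ rho, so corr(x, y) ~ rho.
    return target_corr * z + np.sqrt(1.0 - target_corr ** 2) * noise

rng = np.random.default_rng(1)
n = 200_000
# Zero-inflated, right-skewed outcome as a stand-in for control-group y.
y_control = rng.binomial(1, 0.3, size=n) * rng.lognormal(0.0, 1.0, size=n)
y_pre = add_correlated_pre(y_control, 0.6, rng)
achieved = np.corrcoef(y_pre, y_control)[0, 1]
```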
Returns
make_cuped_binary_26

Binary CUPED benchmark with richer confounders and structured HTE. Includes a calibrated pre-period covariate 'y_pre' by default. Wrapper for generate_cuped_binary().

Parameters
  • n (int) – Number of samples to generate.
  • seed (int) – Random seed.
  • add_pre (bool) – Whether to add a pre-period covariate 'y_pre'.
  • pre_name (str) – Name of the pre-period covariate column.
  • pre_target_corr (float) – Target correlation between y_pre and the post-period outcome y in the control group.
  • pre_spec (PreCorrSpec) – Detailed specification for pre-period calibration (transform, method, etc.).
  • include_oracle (bool) – Whether to include oracle columns like 'cate', 'g0', and 'g1'.
  • return_causal_data (bool) – Whether to return a CausalData object.
  • theta_logit (float) – Baseline log-odds uplift scale for heterogeneous treatment effects.
Returns
diagnostics
Modules
Functions
FLAG_GREEN
FLAG_RED
FLAG_YELLOW
assumption_ate_gap

Check adjusted-vs-naive ATE gap relative to naive SE.

assumption_condition_number

Check global collinearity via condition number.

assumption_cooks

Check Cook's distance influence diagnostics.

assumption_design_rank

Check that the design matrix is full rank.

assumption_hc23_stability

Check HC2/HC3 stability when leverage terms approach one.

assumption_leverage

Check leverage concentration.

assumption_near_duplicates

Check near-duplicate centered covariate pairs.

assumption_residual_tails

Check residual extremes using max standardized residual only.

assumption_vif

Check VIF from centered main-effect covariates.

assumption_winsor_sensitivity

Check sensitivity of adjusted ATE to winsorized-outcome refit.

cuped_forest_plot

Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.

Parameters
  • estimate_with_cuped (CausalEstimate) – Effect estimated with CUPED adjustment.
  • estimate_without_cuped (CausalEstimate) – Effect estimated without CUPED adjustment. If omitted, the function uses estimate_with_cuped.diagnostic_data.ate_naive and estimate_with_cuped.diagnostic_data.se_naive to build a normal-approx CI.
design_matrix_checks

Return rank/conditioning diagnostics for a numeric design matrix.
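A minimal sketch of rank and conditioning checks with NumPy. The returned keys are hypothetical and need not match the payload of design_matrix_checks.

```python
import numpy as np

def design_rank_and_condition(X):
    """Rank, full-rank flag and 2-norm condition number of a design matrix."""
    X = np.asarray(X, dtype=float)
    rank = int(np.linalg.matrix_rank(X))
    return {
        "n_cols": X.shape[1],
        "rank": rank,
        "full_rank": rank == X.shape[1],
        # Ratio of largest to smallest singular value; huge or inf when singular.
        "condition_number": float(np.linalg.cond(X)),
    }

X_ok = np.column_stack([np.ones(5), np.arange(5.0)])
X_bad = np.column_stack([X_ok, 2.0 * X_ok[:, 1]])   # exact collinearity
```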

forest_plot
Functions
  • cuped_forest_plot – Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.
cuped_forest_plot

Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.

Parameters
  • estimate_with_cuped (CausalEstimate) – Effect estimated with CUPED adjustment.
  • estimate_without_cuped (CausalEstimate) – Effect estimated without CUPED adjustment. If omitted, the function uses estimate_with_cuped.diagnostic_data.ate_naive and estimate_with_cuped.diagnostic_data.se_naive to build a normal-approx CI.
overall_assumption_flag

Return overall GREEN/YELLOW/RED status from an assumptions table.

regression_assumption_rows_from_checks

Run all CUPED regression assumption tests and return row payloads.

regression_assumptions_table_from_checks

Return a table of GREEN/YELLOW/RED assumption flags from checks payload.

regression_assumptions_table_from_data

Fit CUPED on CausalData and return the assumptions flag table.

regression_assumptions_table_from_diagnostic_data

Build assumption table from CUPEDDiagnosticData payload.

regression_assumptions_table_from_estimate

Build assumptions table from a CUPED estimate.

Supports both call styles:

  1. regression_assumptions_table_from_estimate(estimate, ...)
  2. regression_assumptions_table_from_estimate(data, estimate, ...)
regression_checks
Functions
FLAG_COLOR
FLAG_GREEN
FLAG_LEVEL
FLAG_RED
FLAG_YELLOW
assumption_ate_gap

Check adjusted-vs-naive ATE gap relative to naive SE.

assumption_condition_number

Check global collinearity via condition number.

assumption_cooks

Check Cook's distance influence diagnostics.

assumption_design_rank

Check that the design matrix is full rank.

assumption_hc23_stability

Check HC2/HC3 stability when leverage terms approach one.

assumption_leverage

Check leverage concentration.

assumption_near_duplicates

Check near-duplicate centered covariate pairs.

assumption_residual_tails

Check residual extremes using max standardized residual only.

assumption_vif

Check VIF from centered main-effect covariates.

assumption_winsor_sensitivity

Check sensitivity of adjusted ATE to winsorized-outcome refit.

design_matrix_checks

Return rank/conditioning diagnostics for a numeric design matrix.

leverage_and_cooks

Compute leverage, Cook's distance, and internally studentized residuals.
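The quantities named above follow standard OLS formulas: leverage h_i is the hat-matrix diagonal, the internally studentized residual is r_i = e_i / (s * sqrt(1 - h_i)), and Cook's distance is r_i^2 * h_i / (p * (1 - h_i)). A NumPy sketch, assuming X already contains the intercept column; the helper that counts tiny 1 - h_i values mirrors the tiny_one_minus_h_tol idea but is hypothetical.

```python
import numpy as np

def leverage_cooks_studentized(X, y):
    """Leverage, Cook's distance and internally studentized residuals for OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # Hat-matrix diagonal from a thin QR: h_i = squared norm of row i of Q.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q ** 2, axis=1)
    s2 = resid @ resid / (n - p)                  # residual variance estimate
    r = resid / np.sqrt(s2 * (1.0 - h))           # internally studentized residuals
    cooks = r ** 2 * h / (p * (1.0 - h))          # Cook's distance
    return h, cooks, r

def count_tiny_one_minus_h(h, tol):
    """Hypothetical helper: count observations with 1 - h_i < tol (HC2/HC3 risk)."""
    return int(np.sum(1.0 - np.asarray(h) < tol))

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=50)
h, cooks, r = leverage_cooks_studentized(X, y)
```

HC2 and HC3 divide squared residuals by (1 - h_i) and (1 - h_i)^2, so observations with 1 - h_i near zero make those covariance estimates unstable.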

near_duplicate_corr_pairs

Find pairs with absolute correlation very close to one.
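A sketch of near-duplicate detection, assuming a pair is flagged when |corr| >= 1 - tol (the exact comparison used in causalis may differ; the function name is hypothetical):

```python
import numpy as np
from itertools import combinations

def near_duplicate_pairs(X, names, tol):
    """Pairs of columns whose absolute correlation is at least 1 - tol (sketch)."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        if abs(corr[i, j]) >= 1.0 - tol:
            pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

rng = np.random.default_rng(3)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = -3.0 * a + 1e-6 * rng.normal(size=500)   # near-exact (negative) duplicate of a
pairs = near_duplicate_pairs(np.column_stack([a, b, c]), ["a", "b", "c"], tol=1e-4)
```

Using the absolute value means a covariate and a sign-flipped rescaling of it are also flagged.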

overall_assumption_flag

Return overall GREEN/YELLOW/RED status from an assumptions table.
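A sketch assuming a worst-flag-wins rule (RED over YELLOW over GREEN); the severity mapping below is hypothetical and stands in for FLAG_LEVEL.

```python
FLAG_SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}  # hypothetical stand-in for FLAG_LEVEL

def overall_flag(flags):
    """Return the most severe flag; GREEN when there are no rows (assumption)."""
    return max(flags, key=FLAG_SEVERITY.__getitem__, default="GREEN")
```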

regression_assumption_rows_from_checks

Run all CUPED regression assumption tests and return row payloads.

regression_assumptions_table_from_checks

Return a table of GREEN/YELLOW/RED assumption flags from checks payload.

regression_assumptions_table_from_data

Fit CUPED on CausalData and return the assumptions flag table.

regression_assumptions_table_from_diagnostic_data

Build assumption table from CUPEDDiagnosticData payload.

regression_assumptions_table_from_estimate

Build assumptions table from a CUPED estimate.

Supports both call styles:

  1. regression_assumptions_table_from_estimate(estimate, ...)
  2. regression_assumptions_table_from_estimate(data, estimate, ...)
run_regression_checks

Build a compact payload with design, residual, and influence diagnostics.

style_regression_assumptions_table

Return pandas Styler with colored flag cells for notebook display.

vif_from_corr

Approximate VIF from inverse correlation matrix of standardized covariates.
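For covariates with sample correlation matrix R, the VIF of covariate j is the j-th diagonal element of R^-1, which equals 1 / (1 - R_j^2), where R_j^2 comes from regressing covariate j on the others with an intercept. A NumPy sketch (the function name is hypothetical):

```python
import numpy as np

def vif_from_corr_sketch(X):
    """VIF_j = [R^-1]_jj, with R the correlation matrix of the covariates."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(4)
x1 = rng.normal(size=1000)
x2 = x1 + 0.5 * rng.normal(size=1000)   # correlated with x1
x3 = rng.normal(size=1000)              # roughly independent of the others
X = np.column_stack([x1, x2, x3])
vif = vif_from_corr_sketch(X)
```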

winsor_fit_tau

Refit OLS on winsorized outcome and return treatment coefficient.
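A sketch assuming two-sided winsorization at the q and 1 - q quantiles before refitting OLS; the signature is hypothetical and the causalis implementation may winsorize differently.

```python
import numpy as np

def winsor_fit_tau_sketch(y, design, d_col, q):
    """Clip y to its [q, 1 - q] quantiles, refit OLS, return the D coefficient."""
    y = np.asarray(y, dtype=float)
    lo, hi = np.quantile(y, [q, 1.0 - q])
    y_w = np.clip(y, lo, hi)
    coef, *_ = np.linalg.lstsq(np.asarray(design, dtype=float), y_w, rcond=None)
    return float(coef[d_col])

d = np.repeat([0.0, 1.0], 50)
y = d.copy()              # control outcomes 0, treated outcomes 1
y[-1] = 1000.0            # one extreme treated outcome
design = np.column_stack([np.ones_like(d), d])
tau_raw = winsor_fit_tau_sketch(y, design, d_col=1, q=0.0)   # q=0 leaves y unchanged
tau_w = winsor_fit_tau_sketch(y, design, d_col=1, q=0.05)
```

A large gap between the winsorized and unwinsorized treatment coefficients, as in this toy example, signals that the adjusted ATE is driven by a few extreme outcomes.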

run_regression_checks

Build a compact payload with design, residual, and influence diagnostics.

style_regression_assumptions_table

Return pandas Styler with colored flag cells for notebook display.

model
Classes
  • CUPEDModel – CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.
regression_assumptions_table_from_data

Fit CUPED on CausalData and return the assumptions flag table.

regression_assumptions_table_from_estimate

Build assumptions table from a CUPED estimate.

Supports both call styles:

  1. regression_assumptions_table_from_estimate(estimate, ...)
  2. regression_assumptions_table_from_estimate(data, estimate, ...)
style_regression_assumptions_table

Return pandas Styler with colored flag cells for notebook display.