cuped
Modules
- dgp
- diagnostics
- model
Classes
- CUPEDModel – CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.
Functions
- cuped_forest_plot – Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.
- regression_assumptions_table_from_data – Fit CUPED on CausalData and return the assumptions flag table.
- regression_assumptions_table_from_estimate – Build assumptions table from a CUPED estimate.
- style_regression_assumptions_table – Return pandas Styler with colored flag cells for notebook display.
CUPEDModel
CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.
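Fits an outcome regression with pre-treatment covariates (always centered over the full sample, never within treatment groups) implemented as Lin (2013) fully interacted OLS: the outcome is regressed on an intercept, the treatment indicator D, the centered covariates, and the interactions of D with the centered covariates.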
The reported effect is the coefficient on D, with robust covariance as requested.
This specification ensures the coefficient on D is the ATE/ITT even if the
treatment effect is heterogeneous with respect to covariates.
This is broader than canonical single-theta CUPED (Y - theta*(X - mean(X))).
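For reference, the fully interacted specification described above can be written as follows. The symbols are notation assumed here for illustration: \tau is the coefficient on D that is reported as the effect, and \bar{X} is the covariate mean over the full sample.

```latex
Y_i = \alpha + \tau D_i + \beta^\top (X_i - \bar{X}) + \gamma^\top D_i (X_i - \bar{X}) + \varepsilon_i,
\qquad \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
```

Dropping the interaction term (\gamma = 0) leaves a single adjustment coefficient, which corresponds to the canonical single-theta form mentioned above.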
Parameters
- cov_type (str) – Covariance estimator passed to statsmodels (e.g., "nonrobust", "HC0", "HC1", "HC2", "HC3"). Note: for cluster-randomized designs, use cluster-robust SEs (not implemented here).
- alpha (float) – Significance level for confidence intervals.
- strict_binary_treatment (bool) – If True, require treatment to be binary {0,1}.
- use_t (bool | None) – If bool, passed to statsmodels .fit(..., use_t=use_t) directly. If None, an automatic policy is used: for robust HC* covariances, use_t=True when n < use_t_auto_n_threshold, else False. For non-robust covariance, use_t=True.
- use_t_auto_n_threshold (int) – Sample-size threshold for automatic use_t selection when use_t=None and covariance is HC* robust.
- relative_ci_method ('delta_nocov', 'bootstrap') – Method for relative CI of 100 * tau / mu_c.
  - "delta_nocov": delta method using robust Var(tau) and Var(mu_c) while setting Cov(tau, mu_c)=0 (safe fallback without unsupported hybrid IF covariance).
  - "bootstrap": percentile bootstrap CI on the relative effect.
- relative_ci_bootstrap_draws (int) – Number of bootstrap resamples used when relative_ci_method="bootstrap".
- relative_ci_bootstrap_seed (int | None) – RNG seed used for bootstrap relative CI.
- covariate_variance_min (float) – Minimum variance threshold for retaining a CUPED covariate. Covariates with variance less than or equal to this threshold are dropped before fitting.
- condition_number_warn_threshold (float) – Trigger diagnostics signal when the design matrix condition number exceeds this threshold.
- run_regression_checks (bool) – Whether to compute regression diagnostics payload during fit().
- check_action ('ignore', 'raise') – Action used when a diagnostics threshold is violated.
- raise_on_yellow (bool) – When check_action="raise", also raise on YELLOW assumption flags.
- corr_near_one_tol (float) – Correlation tolerance used to mark near-duplicate centered covariates.
- vif_warn_threshold (float) – VIF threshold that triggers a diagnostics signal.
- winsor_q (float | None) – Quantile used for winsor sensitivity refit. Set None to disable.
- tiny_one_minus_h_tol (float) – Threshold for flagging near-degenerate 1 - leverage terms in HC2/HC3.
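The automatic use_t policy described in the parameter list can be sketched as a small decision function. The helper name resolve_use_t and the threshold value used in the example call are hypothetical; only the branching logic is taken from the description above.

```python
from typing import Optional

# HC* covariance types named in the cov_type description.
ROBUST_HC_TYPES = {"HC0", "HC1", "HC2", "HC3"}


def resolve_use_t(use_t: Optional[bool], cov_type: str, n: int, threshold: int) -> bool:
    """Hypothetical helper mirroring the documented use_t policy."""
    if use_t is not None:
        # An explicit bool is passed through unchanged.
        return bool(use_t)
    if cov_type.upper() in ROBUST_HC_TYPES:
        # Robust HC* covariance: t-based inference only for small samples.
        return n < threshold
    # Non-robust covariance: always t-based inference.
    return True


# Example call; the threshold 5000 is an illustrative value, not a documented default.
print(resolve_use_t(None, "HC2", n=800, threshold=5000))
```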
Notes
- Validity requires covariates be pre-treatment. Post-treatment covariates can bias estimates.
- Covariates are globally centered over the full sample only. This centering convention is required so the treatment coefficient in the Lin specification remains the ATE/ITT.
- The Lin (2013) specification is recommended as a robust regression-adjustment default in RCTs.
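The global-centering convention in the notes above can be illustrated by building the design rows by hand. This is a minimal sketch, not the class's internal code; lin_design_rows is a hypothetical helper, and the column order [1, D, Xc, D*Xc] is chosen here for illustration.

```python
from statistics import fmean


def lin_design_rows(d, X):
    """Return rows [1, D, Xc_1..Xc_k, D*Xc_1..D*Xc_k] with full-sample centering."""
    k = len(X[0])
    # Column means are taken over ALL units, regardless of treatment arm.
    means = [fmean(row[j] for row in X) for j in range(k)]
    rows = []
    for d_i, x_i in zip(d, X):
        xc = [x_i[j] - means[j] for j in range(k)]
        rows.append([1.0, float(d_i)] + xc + [d_i * v for v in xc])
    return rows


d = [0, 0, 1, 1]
X = [[1.0], [2.0], [3.0], [6.0]]
rows = lin_design_rows(d, X)
for row in rows:
    print(row)
```

Because the centered columns average to zero over the full sample, the interaction terms vanish at the sample-average covariate value, which is why the coefficient on D can be read as the average effect.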
Functions
- assumptions_table – Return fitted regression assumptions table (GREEN/YELLOW/RED) when available.
- estimate – Return the adjusted ATE/ITT estimate and inference.
- fit – Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.
- summary_dict – Convenience JSON/logging output.
adjustment
alpha
assumptions_table
Return fitted regression assumptions table (GREEN/YELLOW/RED) when available.
center_covariates
centering_scope
check_action
condition_number_warn_threshold
corr_near_one_tol
cov_type
covariate_variance_min
estimate
Return the adjusted ATE/ITT estimate and inference.
Parameters
- alpha (float) – Override the instance significance level for confidence intervals.
- diagnostic_data (bool) – Whether to include diagnostic data in the result.
Returns
CausalEstimate – A results object containing effect estimates and inference.
fit
Fit CUPED-style regression adjustment (Lin-interacted OLS) on a CausalData object.
Parameters
- data (CausalData) – Validated dataset with columns: outcome (post), treatment, and confounders (pre covariates).
- covariates (Sequence[str], required) – Explicit subset of data_contracts.confounders_names to use as CUPED covariates. Pass [] for an unadjusted (naive) fit.
- run_checks (bool | None) – Override whether regression checks are computed in this fit call. If None, uses self.run_regression_checks.
Returns
CUPEDModel – Fitted estimator.
Raises
ValueError – If covariates is omitted, is not a sequence of strings, contains columns missing from the DataFrame, or contains columns outside data_contracts.confounders_names; if treatment is not binary when strict_binary_treatment=True; or if the design matrix is rank deficient.
raise_on_yellow
relative_ci_bootstrap_draws
relative_ci_bootstrap_seed
relative_ci_method
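A minimal sketch of the "delta_nocov" idea for the relative effect 100 * tau / mu_c follows. The function name and the use of a normal quantile are assumptions made for illustration; the class may use a different critical value.

```python
from math import sqrt
from statistics import NormalDist


def relative_ci_delta_nocov(tau, var_tau, mu_c, var_mu_c, alpha=0.05):
    """Delta-method CI for 100 * tau / mu_c with Cov(tau, mu_c) set to 0."""
    if mu_c == 0:
        raise ValueError("mu_c must be non-zero for a relative effect")
    rel = 100.0 * tau / mu_c
    # First-order variance of the ratio, omitting the covariance term.
    var_rel = 100.0 ** 2 * (var_tau / mu_c ** 2 + tau ** 2 * var_mu_c / mu_c ** 4)
    se = sqrt(var_rel)
    # Assumption: normal critical value.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return rel, rel - z * se, rel + z * se


rel, lo, hi = relative_ci_delta_nocov(tau=1.0, var_tau=0.04, mu_c=10.0, var_mu_c=0.01)
print(rel, lo, hi)
```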
run_regression_checks
strict_binary_treatment
summary_dict
Convenience JSON/logging output.
Parameters
- alpha (float) – Override the instance significance level for confidence intervals.
Returns
dict – Dictionary with estimates, inference, and diagnostics.
tiny_one_minus_h_tol
use_t
use_t_auto_n_threshold
vif_warn_threshold
winsor_q
cuped_forest_plot
Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.
Parameters
- estimate_with_cuped (CausalEstimate) – Effect estimated with CUPED adjustment.
- estimate_without_cuped (CausalEstimate) – Effect estimated without CUPED adjustment. If omitted, the function uses estimate_with_cuped.diagnostic_data.ate_naive and estimate_with_cuped.diagnostic_data.se_naive to build a normal-approx CI.
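The normal-approximation interval mentioned for the omitted estimate_without_cuped case can be sketched as below. The helper name and the default alpha are assumptions; the inputs stand in for the ate_naive and se_naive values.

```python
from statistics import NormalDist


def normal_approx_ci(estimate, se, alpha=0.05):
    """Two-sided normal-approximation CI: estimate +/- z * se."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * se, estimate + z * se


# Illustrative numbers standing in for ate_naive and se_naive.
ci_low, ci_high = normal_approx_ci(2.0, 0.5)
print(ci_low, ci_high)
```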
dgp
Functions
- generate_cuped_tweedie_26 – Gold standard Tweedie-like DGP with mixed marginals and structured HTE.
- make_cuped_binary_26 – Binary CUPED benchmark with richer confounders and structured HTE.
generate_cuped_tweedie_26
Gold standard Tweedie-like DGP with mixed marginals and structured HTE. Features many zeros and a heavy right tail. Includes two pre-period covariates by default: 'y_pre' and 'y_pre_2'. Wrapper for make_tweedie().
Parameters
- n (int) – Number of samples to generate.
- seed (int) – Random seed.
- add_pre (bool) – Whether to add pre-period covariates.
- pre_name (str) – Name of the first pre-period covariate column.
- pre_name_2 (str) – Name of the second pre-period covariate column. Defaults to f"{pre_name}_2".
- pre_target_corr (float) – Target correlation between the first pre covariate and post-outcome y in control group.
- pre_target_corr_2 (float) – Target correlation for the second pre covariate. Defaults to a moderate value based on pre_target_corr to reduce collinearity.
- pre_spec (PreCorrSpec) – Detailed specification for pre-period calibration (transform, method, etc.).
- include_oracle (bool) – Whether to include oracle ground-truth columns like 'cate', 'propensity', etc.
- return_causal_data (bool) – Whether to return a CausalData object.
- theta_log (float) – The log-uplift theta parameter for the treatment effect.
Returns
DataFrame or CausalData – A CausalData object when return_causal_data is True, otherwise a DataFrame.
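To make the pre_target_corr idea concrete, here is one common way to build a covariate with a chosen correlation to an outcome: mix the standardized outcome with independent noise. This is only a sketch under that assumption; it is not the calibration performed by PreCorrSpec, and make_pre_covariate is a hypothetical helper.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def make_pre_covariate(y, rho, seed=0):
    """Return x whose correlation with y is approximately rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    rng = random.Random(seed)
    mu, sd = fmean(y), pstdev(y)
    noise_scale = sqrt(1.0 - rho ** 2)
    # rho * standardized outcome + scaled independent Gaussian noise.
    return [rho * (v - mu) / sd + noise_scale * rng.gauss(0.0, 1.0) for v in y]


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)


rng = random.Random(1)
y_control = [rng.expovariate(1.0) for _ in range(20000)]  # skewed stand-in outcome
y_pre = make_pre_covariate(y_control, rho=0.6, seed=2)
print(round(pearson(y_pre, y_control), 3))
```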
make_cuped_binary_26
Binary CUPED benchmark with richer confounders and structured HTE. Includes a calibrated pre-period covariate 'y_pre' by default. Wrapper for generate_cuped_binary().
Parameters
- n (int) – Number of samples to generate.
- seed (int) – Random seed.
- add_pre (bool) – Whether to add a pre-period covariate 'y_pre'.
- pre_name (str) – Name of the pre-period covariate column.
- pre_target_corr (float) – Target correlation between y_pre and post-outcome y in the control group.
- pre_spec (PreCorrSpec) – Detailed specification for pre-period calibration (transform, method, etc.).
- include_oracle (bool) – Whether to include oracle columns like 'cate', 'g0', and 'g1'.
- return_causal_data (bool) – Whether to return a CausalData object.
- theta_logit (float) – Baseline log-odds uplift scale for heterogeneous treatment effects.
Returns
DataFrame or CausalData – A CausalData object when return_causal_data is True, otherwise a DataFrame.
diagnostics
Modules
- forest_plot
- regression_checks
Functions
- assumption_ate_gap – Check adjusted-vs-naive ATE gap relative to naive SE.
- assumption_condition_number – Check global collinearity via condition number.
- assumption_cooks – Check Cook's distance influence diagnostics.
- assumption_design_rank – Check that the design matrix is full rank.
- assumption_hc23_stability – Check HC2/HC3 stability when leverage terms approach one.
- assumption_leverage – Check leverage concentration.
- assumption_near_duplicates – Check near-duplicate centered covariate pairs.
- assumption_residual_tails – Check residual extremes using max standardized residual only.
- assumption_vif – Check VIF from centered main-effect covariates.
- assumption_winsor_sensitivity – Check sensitivity of adjusted ATE to winsorized-outcome refit.
- cuped_forest_plot – Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.
- design_matrix_checks – Return rank/conditioning diagnostics for a numeric design matrix.
- overall_assumption_flag – Return overall GREEN/YELLOW/RED status from an assumptions table.
- regression_assumption_rows_from_checks – Run all CUPED regression assumption tests and return row payloads.
- regression_assumptions_table_from_checks – Return a table of GREEN/YELLOW/RED assumption flags from checks payload.
- regression_assumptions_table_from_data – Fit CUPED on CausalData and return the assumptions flag table.
- regression_assumptions_table_from_diagnostic_data – Build assumption table from CUPEDDiagnosticData payload.
- regression_assumptions_table_from_estimate – Build assumptions table from a CUPED estimate.
- run_regression_checks – Build a compact payload with design, residual, and influence diagnostics.
- style_regression_assumptions_table – Return pandas Styler with colored flag cells for notebook display.
FLAG_GREEN
FLAG_RED
FLAG_YELLOW
assumption_ate_gap
Check adjusted-vs-naive ATE gap relative to naive SE.
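As a rough illustration of this check, the gap between the adjusted and naive estimates can be expressed in units of the naive standard error and mapped to a flag. The function name and both cut-offs below are hypothetical; the library's actual thresholds are not stated here.

```python
def ate_gap_flag(ate_adj, ate_naive, se_naive, yellow_at=2.0, red_at=3.0):
    """Return (flag, ratio) where ratio = |adjusted - naive| / naive SE."""
    if se_naive <= 0:
        raise ValueError("se_naive must be positive")
    ratio = abs(ate_adj - ate_naive) / se_naive
    # Hypothetical cut-offs: larger gaps get more severe flags.
    if ratio >= red_at:
        return "RED", ratio
    if ratio >= yellow_at:
        return "YELLOW", ratio
    return "GREEN", ratio


print(ate_gap_flag(1.0, 1.1, 0.2))
```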
assumption_condition_number
Check global collinearity via condition number.
assumption_cooks
Check Cook's distance influence diagnostics.
assumption_design_rank
Check that the design matrix is full rank.
assumption_hc23_stability
Check HC2/HC3 stability when leverage terms approach one.
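The reason leverage matters here is that HC2 and HC3 rescale each squared residual by 1/(1 - h) and 1/(1 - h)^2 respectively, so the terms blow up as h approaches one. The sketch below shows that rescaling and a tolerance-based flag; hc_weights and the default tolerance are assumptions for illustration.

```python
def hc_weights(residuals, leverages, kind="HC3", tiny_tol=1e-8):
    """Return rescaled squared residuals and indices where 1 - h is near zero."""
    power = {"HC2": 1, "HC3": 2}[kind]
    weights, unstable = [], []
    for i, (e, h) in enumerate(zip(residuals, leverages)):
        one_minus_h = 1.0 - h
        if one_minus_h < tiny_tol:
            # Near-degenerate term: record it instead of dividing.
            unstable.append(i)
            weights.append(float("inf"))
            continue
        weights.append(e * e / one_minus_h ** power)
    return weights, unstable


w3, bad = hc_weights([1.0, 2.0], [0.5, 1.0], kind="HC3")
w2, _ = hc_weights([1.0, 2.0], [0.5, 1.0], kind="HC2")
print(w3, w2, bad)
```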
assumption_leverage
Check leverage concentration.
assumption_near_duplicates
Check near-duplicate centered covariate pairs.
assumption_residual_tails
Check residual extremes using max standardized residual only.
assumption_vif
Check VIF from centered main-effect covariates.
assumption_winsor_sensitivity
Check sensitivity of adjusted ATE to winsorized-outcome refit.
cuped_forest_plot
Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.
design_matrix_checks
Return rank/conditioning diagnostics for a numeric design matrix.
forest_plot
Functions
- cuped_forest_plot – Forest plot of absolute estimates and CIs for CUPED vs non-CUPED.
overall_assumption_flag
Return overall GREEN/YELLOW/RED status from an assumptions table.
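One natural way to aggregate per-check flags is to report the most severe one. The sketch below assumes that rule, and assumes an empty input maps to GREEN; neither is confirmed by the description above, and overall_flag is a hypothetical name.

```python
# Severity ordering assumed for illustration.
SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}


def overall_flag(flags):
    """Return the most severe flag among the inputs (worst-case aggregation)."""
    flags = list(flags)
    if not flags:
        return "GREEN"  # assumption: no checks means nothing to report
    return max(flags, key=SEVERITY.__getitem__)


print(overall_flag(["GREEN", "YELLOW", "GREEN"]))
```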
regression_assumption_rows_from_checks
Run all CUPED regression assumption tests and return row payloads.
regression_assumptions_table_from_checks
Return a table of GREEN/YELLOW/RED assumption flags from checks payload.
regression_assumptions_table_from_data
Fit CUPED on CausalData and return the assumptions flag table.
regression_assumptions_table_from_diagnostic_data
Build assumption table from CUPEDDiagnosticData payload.
regression_assumptions_table_from_estimate
Build assumptions table from a CUPED estimate.
Supports both call styles:
- regression_assumptions_table_from_estimate(estimate, ...)
- regression_assumptions_table_from_estimate(data, estimate, ...)
regression_checks
Functions
- assumption_ate_gap – Check adjusted-vs-naive ATE gap relative to naive SE.
- assumption_condition_number – Check global collinearity via condition number.
- assumption_cooks – Check Cook's distance influence diagnostics.
- assumption_design_rank – Check that the design matrix is full rank.
- assumption_hc23_stability – Check HC2/HC3 stability when leverage terms approach one.
- assumption_leverage – Check leverage concentration.
- assumption_near_duplicates – Check near-duplicate centered covariate pairs.
- assumption_residual_tails – Check residual extremes using max standardized residual only.
- assumption_vif – Check VIF from centered main-effect covariates.
- assumption_winsor_sensitivity – Check sensitivity of adjusted ATE to winsorized-outcome refit.
- design_matrix_checks – Return rank/conditioning diagnostics for a numeric design matrix.
- leverage_and_cooks – Compute leverage, Cook's distance, and internally studentized residuals.
- near_duplicate_corr_pairs – Find pairs with absolute correlation very close to one.
- overall_assumption_flag – Return overall GREEN/YELLOW/RED status from an assumptions table.
- regression_assumption_rows_from_checks – Run all CUPED regression assumption tests and return row payloads.
- regression_assumptions_table_from_checks – Return a table of GREEN/YELLOW/RED assumption flags from checks payload.
- regression_assumptions_table_from_data – Fit CUPED on CausalData and return the assumptions flag table.
- regression_assumptions_table_from_diagnostic_data – Build assumption table from CUPEDDiagnosticData payload.
- regression_assumptions_table_from_estimate – Build assumptions table from a CUPED estimate.
- run_regression_checks – Build a compact payload with design, residual, and influence diagnostics.
- style_regression_assumptions_table – Return pandas Styler with colored flag cells for notebook display.
- vif_from_corr – Approximate VIF from inverse correlation matrix of standardized covariates.
- winsor_fit_tau – Refit OLS on winsorized outcome and return treatment coefficient.
FLAG_COLOR
FLAG_GREEN
FLAG_LEVEL
FLAG_RED
FLAG_YELLOW
leverage_and_cooks
Compute leverage, Cook's distance, and internally studentized residuals.
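The three quantities named above have closed forms in the simplest case of an intercept plus one regressor. The sketch below uses that special case to stay self-contained; simple_ols_influence is a hypothetical helper and not the library function, which works on a general design matrix.

```python
from math import sqrt
from statistics import fmean


def simple_ols_influence(x, y):
    """Leverage, internally studentized residuals, and Cook's distance for y ~ 1 + x."""
    n, p = len(x), 2  # p counts the intercept and the slope
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((v - x_bar) ** 2 for v in x)
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    # Hat-matrix diagonal for simple regression.
    lev = [1.0 / n + (a - x_bar) ** 2 / sxx for a in x]
    # Internal studentization uses the full-sample s2.
    stud = [e / sqrt(s2 * (1.0 - h)) for e, h in zip(resid, lev)]
    cooks = [r * r * h / (p * (1.0 - h)) for r, h in zip(stud, lev)]
    return lev, stud, cooks


x = [0.0, 1.0, 2.0, 3.0, 10.0]
y = [0.1, 0.9, 2.2, 2.8, 10.5]
lev, stud, cooks = simple_ols_influence(x, y)
print(lev)
```

The leverages sum to the number of fitted coefficients, which gives a quick sanity check on any implementation.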
near_duplicate_corr_pairs
Find pairs with absolute correlation very close to one.
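A minimal sketch of such a scan follows, assuming columns are given as a name-to-values mapping and that a pair is flagged when |r| >= 1 - tol. The function name, input format, and tolerance are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean


def near_duplicate_pairs(columns, tol=1e-6):
    """Return (name_a, name_b, r) for column pairs with |r| >= 1 - tol."""
    centered = {}
    for name, values in columns.items():
        m = fmean(values)
        centered[name] = [v - m for v in values]
    out = []
    for (na, a), (nb, b) in combinations(centered.items(), 2):
        va = sum(u * u for u in a)
        vb = sum(v * v for v in b)
        if va == 0.0 or vb == 0.0:
            continue  # correlation is undefined for constant columns
        r = sum(u * v for u, v in zip(a, b)) / sqrt(va * vb)
        if abs(r) >= 1.0 - tol:
            out.append((na, nb, r))
    return out


cols = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [-2.0, -4.0, -6.0, -8.0],  # exact negative multiple of x1
    "x3": [1.0, 0.0, 2.0, 1.0],
}
pairs = near_duplicate_pairs(cols)
print([(a, b) for a, b, _ in pairs])
```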
run_regression_checks
Build a compact payload with design, residual, and influence diagnostics.
style_regression_assumptions_table
Return pandas Styler with colored flag cells for notebook display.
vif_from_corr
Approximate VIF from inverse correlation matrix of standardized covariates.
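The identity behind this approach is that the VIF of covariate j equals the j-th diagonal entry of the inverse correlation matrix. The sketch below demonstrates it with a small pure-Python inverse; vif_from_corr_sketch and invert are hypothetical helpers written for illustration, not the library's implementation.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_from_corr_sketch(corr):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


# Two covariates with correlation 0.6: VIF = 1 / (1 - 0.6**2) for both.
vifs = vif_from_corr_sketch([[1.0, 0.6], [0.6, 1.0]])
print(vifs)
```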
winsor_fit_tau
Refit OLS on winsorized outcome and return treatment coefficient.
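The sensitivity idea can be shown in the simplest setting: clip the outcome at its lower and upper q-quantiles, then re-estimate the effect. With no covariates, the OLS coefficient on a binary treatment equals the difference in group means, which is what this sketch uses. The helper names and the rank-based quantile convention are assumptions, not the library's implementation.

```python
from statistics import fmean


def winsorize(values, q):
    """Clip values to the range between the q and 1 - q empirical quantiles."""
    if not 0.0 <= q < 0.5:
        raise ValueError("q must be in [0, 0.5)")
    s = sorted(values)
    n = len(s)
    k = int(q * n)  # assumption: simple rank-based quantile index
    lo, hi = s[k], s[n - 1 - k]
    return [min(max(v, lo), hi) for v in values]


def diff_in_means(y, d):
    """Treatment coefficient of y ~ 1 + D for binary D."""
    treated = [v for v, t in zip(y, d) if t == 1]
    control = [v for v, t in zip(y, d) if t == 0]
    return fmean(treated) - fmean(control)


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]  # one extreme outcome
d = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
tau_raw = diff_in_means(y, d)
tau_winsor = diff_in_means(winsorize(y, 0.1), d)
print(tau_raw, tau_winsor)
```

A large gap between the two estimates, as in this toy data, indicates that a few extreme outcomes drive the result.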
model
Classes
- CUPEDModel – CUPED-style regression adjustment estimator for ATE/ITT in randomized experiments.