Research · 7 min read

ASCM Model Diagnostics

This notebook presents the SCM diagnostics research workflow and its key analysis steps.


1. Setup

First, we generate synthetic data and fit an Augmented SCM model, following the baseline example.
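The package's own setup call is not reproduced here; as a stand-in, a synthetic donor panel driven by a common latent factor can be generated as follows. All names, sizes, and the +4 treatment effect are illustrative assumptions, not the library's defaults:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_donors, n_pre, n_post = 15, 30, 6
T = n_pre + n_post

# Common latent factor driving all units (a random walk).
factor = np.cumsum(rng.normal(0.0, 0.5, T))

# Donor units: factor loadings plus idiosyncratic noise.
loadings = rng.uniform(0.5, 1.5, n_donors)
donors = loadings[:, None] * factor + rng.normal(0.0, 1.0, (n_donors, T))

# Treated unit follows the factor, then gains a +4 effect post-treatment.
treated = factor + rng.normal(0.0, 1.0, T)
treated[n_pre:] += 4.0

panel = pd.DataFrame(donors.T, columns=[f"donor_{j+1}" for j in range(n_donors)])
panel["treated"] = treated
```

A panel like this gives the diagnostics below something concrete to chew on: a known pre-period relationship and a known post-period effect.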

2. Mathematical Foundations: The pre_scale Metric

Before running specific tests, run_scm_diagnostics calculates a baseline variability measure called pre_scale for the treated unit during the pre-treatment period.

Most diagnostic thresholds are defined as a percentage of this scale. This ensures that the "goodness of fit" is judged relative to the inherent volatility of the data.

Robust Scale Estimators

To avoid being misled by outliers or trends, the function calculates four different scale metrics and takes the maximum:

  1. Standard Deviation: $\sigma = \sqrt{\frac{1}{n-1} \sum_{t=1}^{n} (x_t - \bar{x})^2}$

  2. Median Absolute Deviation (MAD): Scaled to be consistent with $\sigma$ for a normal distribution: $MAD_{\text{scaled}} = 1.4826 \cdot \text{median}(|x_t - \text{median}(x)|)$

  3. Interquartile Range (IQR): Also scaled for normal consistency: $IQR_{\text{scaled}} = \frac{Q_{75} - Q_{25}}{1.349}$

  4. First-Difference Scale: Calculated from the standard deviation of first differences, which is useful when the series has a trend or high autocorrelation. Since $\text{Var}(X_t - X_{t-1}) = 2\sigma^2$ for i.i.d. variables, we scale by $\sqrt{2}$: $\text{diff\_scale} = \frac{\text{std}(\Delta x)}{\sqrt{2}}$

$\text{pre\_scale} = \max(\sigma, MAD_{\text{scaled}}, IQR_{\text{scaled}}, \text{diff\_scale})$
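The definitions above translate directly into NumPy. This is a minimal sketch; the helper name `pre_scale` is ours and not necessarily the library's internal function:

```python
import numpy as np

def pre_scale(x):
    """Robust scale of the pre-treatment series: the max of the sample SD,
    scaled MAD, scaled IQR, and first-difference scale."""
    x = np.asarray(x, dtype=float)
    sd = np.std(x, ddof=1)
    mad = 1.4826 * np.median(np.abs(x - np.median(x)))
    q75, q25 = np.percentile(x, [75, 25])
    iqr = (q75 - q25) / 1.349
    diff = np.std(np.diff(x), ddof=1) / np.sqrt(2)
    return max(sd, mad, iqr, diff)
```

Note how the first-difference scale is zero for a pure linear trend, while the SD is inflated by it; taking the maximum guards against whichever estimator is misled by the data at hand.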

3. Diagnostic Tests

The diagnostics are divided into fit-based checks and model-based (weight) checks.

A. Fit-Based Diagnostics

  1. Pre-treatment RMSE: Checks whether the overall fit in the pre-treatment period is tight: $RMSE_{pre} = \sqrt{\frac{1}{T_{pre}} \sum_{t=1}^{T_{pre}} (y_t - \hat{y}_t)^2} \le 0.20 \cdot \text{pre\_scale}$

  2. Max Absolute Pre-gap: Checks for extreme outliers in the pre-treatment period: $\max_{t \in \text{pre}} |y_t - \hat{y}_t| \le 0.50 \cdot \text{pre\_scale}$

  3. Mean Gap Last K Pre-periods: Checks for pre-trend drift right before the treatment (typically $k=3$). If the average gap is far from zero, the parallel trends assumption might be failing just as treatment begins: $\left|\frac{1}{k} \sum_{t=T_{pre}-k+1}^{T_{pre}} (y_t - \hat{y}_t)\right| \le 0.25 \cdot \text{pre\_scale}$
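The three fit checks can be sketched as follows. The threshold fractions are the defaults quoted above; the function name and return shape are illustrative, not the package's API:

```python
import numpy as np

def fit_diagnostics(y, y_hat, scale, k=3):
    """Fit-based checks on the pre-treatment period, each judged as a
    fraction of the robust pre_scale. Returns (value, passes) pairs."""
    gap = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    rmse = np.sqrt(np.mean(gap ** 2))
    max_gap = np.max(np.abs(gap))
    last_k = np.abs(np.mean(gap[-k:]))
    return {
        "pre_rmse": (rmse, rmse <= 0.20 * scale),
        "max_abs_pre_gap": (max_gap, max_gap <= 0.50 * scale),
        "mean_gap_last_k": (last_k, last_k <= 0.25 * scale),
    }
```

A perfect pre-fit passes all three checks trivially; a constant offset of one scale unit fails all three.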

B. Weight-Based Diagnostics

Augmented SCM (ASCM) allows for weights that can be negative or greater than one to achieve a better pre-treatment fit (extrapolation). However, extreme weights indicate that the counterfactual relies on unstable linear combinations of donors.

  1. Max Absolute Weight: Detects extreme individual weights: $\max_j |w_j| \le 2.0$ (default)

  2. L1 Norm of Weights: Detects when the total sum of absolute weights is too high: $\sum_j |w_j| \le 5.0$ (default)

  3. Negative Weight Share: Measures how much the estimate relies on negative weights (pure extrapolation vs. interpolation): $\frac{\sum_{w_j < 0} |w_j|}{\sum_j |w_j|} \le 0.30$
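A compact sketch of the three weight checks, with the default thresholds as keyword arguments (names are ours):

```python
import numpy as np

def weight_diagnostics(w, max_abs=2.0, l1_max=5.0, neg_share_max=0.30):
    """Weight-based checks for augmented (possibly negative) donor weights."""
    w = np.asarray(w, dtype=float)
    biggest = float(np.max(np.abs(w)))
    l1 = float(np.sum(np.abs(w)))
    neg_share = float(np.sum(np.abs(w[w < 0])) / l1) if l1 > 0 else 0.0
    return {
        "max_abs_weight": (biggest, biggest <= max_abs),
        "l1_norm": (l1, l1 <= l1_max),
        "negative_weight_share": (neg_share, neg_share <= neg_share_max),
    }
```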

4. Running the Diagnostics

We can now execute run_scm_diagnostics and inspect the results table.

Result

| test | flag | value | threshold | message |
|---|---|---|---|---|
| pre_rmse_augmented | GREEN | 0.366248 | <= 0.5254 | Pre-treatment RMSE is within tolerance. |
| max_abs_pre_gap_augmented | GREEN | 0.988290 | <= 1.314 | Largest pre-treatment gap is within tolerance. |
| mean_gap_last_k_pre_augmented | GREEN | 0.207285 | <= 0.6568 | Average gap in the last 3 pre periods is cente... |
| max_abs_weight_augmented | GREEN | 0.201943 | <= 2 | No extreme augmented donor weight detected (mo... |
| l1_norm_weights_augmented | GREEN | 2.245170 | <= 5 | Total absolute augmented weight mass is contro... |
| negative_weight_share | GREEN | 0.277300 | <= 0.3 | Negative-weight share is moderate. |
| slsqp_fallback_count | GREEN | 0.000000 | == 0 | No optimizer fallback events recorded. |
| suppressed_fit_warning_count | GREEN | 0.000000 | == 0 | No suppressed fit warnings were captured. |

5. Interpreting Flags

  • GREEN: Metric is within optimal tolerance. The estimate is likely robust.
  • YELLOW: Metric exceeds the warning threshold. Consider investigating the pre-treatment fit or donor pool. The estimate might be sensitive to small changes.
  • RED: Indicates a severe problem, such as optimizer failure or extreme fit issues. The estimate should be treated with high skepticism.

6. Placebo Tests

Robustness tests check if the treatment effect is unique to the treated unit and time.

A. Placebo-in-Space (Permutation Test)

This test (Abadie et al. 2010) applies the SCM to every unit in the donor pool, treating each donor as the "treated" unit.

  1. For each donor unit $j \in \text{Donors}$, we fit a synthetic control model using only the remaining donors.
  2. We calculate the Root Mean Squared Prediction Error (RMSE) for both the pre- and post-treatment periods: $RMSE_{pre,j} = \sqrt{\frac{1}{T_{pre}} \sum_{t=1}^{T_{pre}} (y_{j,t} - \hat{y}_{j,t})^2}$ and $RMSE_{post,j} = \sqrt{\frac{1}{T_{post}} \sum_{t=T_{pre}+1}^{T} (y_{j,t} - \hat{y}_{j,t})^2}$
  3. We compute the RMSPE Ratio: $R_j = \frac{RMSE_{post,j}}{RMSE_{pre,j}}$
  4. Interpretation: If the actual treated unit's ratio $R_{treated}$ is much larger than the distribution of $R_j$ for the placebo units, the effect is unlikely to be due to chance. Counting the treated unit itself among the permutations, the p-value is effectively: $p = \frac{\text{count}(R_j \ge R_{treated}) + 1}{N_{donors} + 1}$
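Given the ratio for the treated unit and the ratios for the placebo runs, the permutation p-value is one line; the treated unit is counted among the permutations so the smallest attainable p-value is $1/(N_{donors}+1)$. The function name is ours:

```python
import numpy as np

def rmspe_ratio_pvalue(treated_ratio, placebo_ratios):
    """Permutation p-value for the placebo-in-space test: the share of
    units (placebos plus the treated unit itself) whose post/pre RMSPE
    ratio is at least as extreme as the treated unit's."""
    placebo_ratios = np.asarray(placebo_ratios, dtype=float)
    n_extreme = int(np.sum(placebo_ratios >= treated_ratio))
    return (n_extreme + 1) / (len(placebo_ratios) + 1)
```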

B. Placebo-in-Time (Falsification Test)

This test shifts the treatment start date $T_{actual}$ to an earlier date $T_{placebo}$ within the true pre-treatment period.

  1. We choose $T_{placebo} < T_{actual}$ and a pseudo-post-horizon $H$.
  2. We fit the SCM on $[1, T_{placebo}]$ and measure the "effect" on $[T_{placebo}+1, T_{placebo}+H]$.
  3. Since no treatment occurred during this period, the estimated Average Treatment Effect on the Treated (ATT) should be close to zero and statistically insignificant.
  4. Interpretation: If the null hypothesis (ATT=0) is rejected for many placebo dates, it suggests that the parallel trends assumption is violated (e.g., due to pre-treatment anticipation effects or unobserved trends).
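The loop over placebo dates can be sketched as follows. Here a plain least-squares fit stands in for the full ASCM estimator, and all names are illustrative; the point is the structure, not the fitting routine:

```python
import numpy as np

def placebo_in_time(y, donors, t_actual, placebo_starts, horizon):
    """For each placebo start T_p < t_actual, refit on [0, T_p) and
    measure the pseudo-ATT on the next `horizon` periods, all of which
    still pre-date the real treatment. A least-squares fit stands in
    for the actual ASCM estimator here."""
    results = {}
    for t_p in placebo_starts:
        w, *_ = np.linalg.lstsq(donors[:t_p], y[:t_p], rcond=None)
        post = slice(t_p, min(t_p + horizon, t_actual))
        gap = y[post] - donors[post] @ w
        results[t_p] = float(gap.mean())  # should be near zero: no treatment yet
    return results
```

When the treated unit really is a stable combination of donors pre-treatment, every pseudo-ATT comes out near zero; systematic deviations across placebo dates are the red flag described above.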
Result

| placebo_treatment_start | n_pre_before_placebo | n_post_after_placebo | average_att_placebo | ci_lower | ci_upper | p_value | rejects_zero | pre_fit_metric |
|---|---|---|---|---|---|---|---|---|
| 2000-03 | 2 | 6 | 1.882234 | -16.331756 | 20.096223 | 0.414358 | False | 3.892700e-09 |
| 2000-04 | 3 | 6 | 1.269647 | -2.922810 | 5.462105 | 0.322397 | False | 4.675707e-08 |
| 2000-05 | 4 | 6 | 1.558211 | -2.782112 | 5.898534 | 0.262430 | False | 7.740750e-09 |
| 2000-06 | 5 | 6 | 1.314722 | -2.945265 | 5.574709 | 0.315493 | False | 8.664180e-09 |
| 2000-07 | 6 | 6 | 1.415188 | -1.584680 | 4.415056 | 0.179511 | False | 8.107963e-09 |

7. Sensitivity Analysis

Sensitivity tests assess how dependent the final estimate is on specific assumptions or data subsets.

Leave-One-Donor-Out (LODO) Sensitivity

A common concern in SCM is that the result might be driven by a single, idiosyncratic donor unit. This test (Abadie et al. 2015) re-estimates the model $N_{donors}$ times, each time removing one unit from the donor pool.

  1. Let $J$ be the set of all available donor units.
  2. For each unit $j \in J$, we fit a new model using $J \setminus \{j\}$.
  3. We calculate the re-estimated treatment effect $ATT_j$ and the change relative to the full model: $\Delta_j = ATT_j - ATT_{full}$
  4. Interpretation:
    • Small $\Delta_j$ for all $j$ indicates a robust estimate.
    • A large $\Delta_j$ suggests that donor $j$ is highly influential. If that donor's pre-fit was poor or it has unusual characteristics, the main estimate should be treated with caution.
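The LODO loop is straightforward once the fitting routine is factored out. In this sketch `fit_fn` is a placeholder for the actual ASCM fit (a least-squares fit is used in the test), and all names are ours:

```python
import numpy as np

def leave_one_donor_out(y, donors, t_pre, fit_fn):
    """Refit the model once per donor, dropping that donor's column.
    `fit_fn(pre_donors, pre_y) -> weights` is a placeholder for the
    actual ASCM fitting routine."""
    def att(cols):
        w = fit_fn(donors[:t_pre, cols], y[:t_pre])
        return float(np.mean(y[t_pre:] - donors[t_pre:, cols] @ w))

    all_cols = list(range(donors.shape[1]))
    att_full = att(all_cols)
    deltas = {j: att([c for c in all_cols if c != j]) - att_full
              for j in all_cols}
    return att_full, deltas
```

Dropping a donor that carries no weight should leave the ATT essentially unchanged; a large delta flags the influential donors discussed above.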

Effective Number of Donors

To understand how concentrated the weights are, we calculate the Effective Number of Donors ($N_{eff}$), the inverse of the Herfindahl-Hirschman Index (HHI) of the weights:

$N_{eff} = \frac{1}{\sum_{j \in J} w_j^2}$

  • $N_{eff} = 1$: The counterfactual is built from a single donor.
  • $N_{eff} = N_{donors}$: All donors carry equal weight ($1/N_{donors}$ each).

A lower $N_{eff}$ indicates heavier reliance on a small subset of donors.
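The inverse-HHI is a one-liner (function name ours):

```python
import numpy as np

def effective_n_donors(weights):
    """Inverse Herfindahl-Hirschman Index of the donor weights."""
    w = np.asarray(weights, dtype=float)
    return 1.0 / float(np.sum(w ** 2))
```

For weights that sum to one, four equal weights of 0.25 give $N_{eff} = 4$, while a single weight of 1.0 gives $N_{eff} = 1$, matching the two boundary cases above.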

Interpreting the Results Table

  • dropped_donor: Donor unit removed in that refit.
  • average_att_reestimated: ATT calculated after removing that donor.
  • delta_vs_full_model: $\Delta_j = ATT_j - ATT_{full}$. Values near zero indicate the estimate is stable; large values suggest the donor is highly influential.
  • pre_rmse_reestimated: Pre-treatment fit error after refit. A significant increase suggests the dropped donor was essential for maintaining fit quality.
  • max_weight_after_refit: Largest donor weight in the refitted model.
  • effective_n_donors_after_refit: The effective number of donors ($N_{eff}$) for the refitted model.
Result

| dropped_donor | average_att_reestimated | delta_vs_full_model | pre_rmse_reestimated | max_weight_after_refit | effective_n_donors_after_refit |
|---|---|---|---|---|---|
| donor_1 | 4.207858 | 0.262223 | 0.171405 | 0.303030 | 0.946472 |
| donor_10 | 4.587795 | 0.642161 | 0.160802 | 0.349188 | 0.909245 |
| donor_11 | 4.189136 | 0.243502 | 0.160120 | 0.353247 | 0.909496 |
| donor_12 | 4.916255 | 0.970621 | 0.189167 | 0.336427 | 0.978294 |
| donor_13 | 4.468127 | 0.522492 | 0.162525 | 0.341735 | 0.927949 |

8. Diagnostic Visualizations

Visualization is a critical step in assessing SCM results. It allows for a qualitative check of the pre-treatment fit and the post-treatment divergence.

Observed vs. Synthetic Plot

This plot displays the time series of the actual treated unit alongside its synthetic counterfactual (both standard and augmented, if available).

  • Pre-treatment period: The closer the observed and synthetic lines are, the better the model's fit. A poor pre-fit (large gaps) suggests that the donor pool cannot adequately represent the treated unit.
  • Post-treatment period: Divergence between the lines indicates a treatment effect.

Gap Over Time Plot

The gap plot shows the difference between the observed and synthetic outcomes: $Gap_t = y_{1,t} - \hat{y}_{1,t}$

  • In the pre-treatment period, the gap should ideally fluctuate closely around zero.
  • In the post-treatment period, the gap represents the estimated treatment effect ($ATT_t$) at each point in time.

Interpreting the Gap:

  • If the gap is systematically positive (or negative) after treatment, it suggests a persistent effect.
  • If the gap fluctuates around zero even after treatment, it suggests no significant effect.
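The quantity being plotted is easy to compute directly; this helper (names illustrative) splits the gap into the pre-treatment residual and the per-period treatment effect:

```python
import numpy as np

def gap_series(y_obs, y_synth, t_pre):
    """Split the observed-minus-synthetic gap into the pre-treatment
    residual and the per-period post-treatment effect ATT_t."""
    gap = np.asarray(y_obs, dtype=float) - np.asarray(y_synth, dtype=float)
    return {"pre_gap": gap[:t_pre],
            "att_t": gap[t_pre:],
            "avg_att": float(gap[t_pre:].mean())}
```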
Result

(Figure: observed vs. synthetic outcome plot and gap-over-time plot.)