Research · 5 min read

generate_scm_poisson_26()

This notebook walks through the `generate_scm_poisson_26()` research workflow and its key analysis steps.

This scenario produces synthetic panel data with one treated unit and multiple donor units (20 in the run below). It uses a Poisson DGP to simulate count outcomes whose rates combine time-varying exposure with latent log-rate components.

1. DGP math

The data-generating process (DGP) for the Poisson SCM scenario is a hierarchical log-linear model for the mean $\mu$, with observations $y$ drawn from a Poisson distribution.

1.1 Donor units

For each donor unit $j$ at time $t$:

  1. Exposure $E_{tj}$:
     $$\log(E_{tj}) = \alpha_j + \gamma_j (t - \bar{t}) + \epsilon_t^{\text{common\_exp}} + \epsilon_{tj}^{\text{donor\_exp}}$$
     where $\epsilon_t^{\text{common\_exp}}$ and $\epsilon_{tj}^{\text{donor\_exp}}$ are AR(1) processes.
  2. Mean $\mu_{tj}$:
     $$\mu_{tj} = E_{tj} \cdot \exp(\eta_{tj}), \qquad \eta_{tj} = \beta_j + \delta_j (t - \bar{t}) + \lambda_j S_t + L_t + \sum_{k} \phi_{jk} F_{tk} + \epsilon_{tj}^{\text{donor\_noise}}$$
     where:
    • $S_t$: monthly seasonality signal
    • $L_t$: common factor (macro log index, AR(1))
    • $F_{tk}$: latent factors (AR(1))
    • $\epsilon_{tj}^{\text{donor\_noise}}$: donor-specific AR(1) noise
  3. Outcome $y_{tj}$:
     $$y_{tj} \sim \text{Poisson}(\mu_{tj})$$
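The donor recipe above can be sketched in NumPy as follows. This is a minimal illustration, not the notebook's generator: the helper names `ar1` and `simulate_donors`, every parameter value, the sine-wave seasonality, and taking $\bar{t}$ as the mean time index are all assumptions made for the example.

```python
import numpy as np


def ar1(rng, n_steps, rho, sigma, width=None):
    """Stationary Gaussian AR(1): e_t = rho * e_{t-1} + sigma * z_t."""
    shape = (n_steps,) if width is None else (n_steps, width)
    z = rng.standard_normal(shape)
    e = np.empty(shape)
    e[0] = sigma * z[0] / np.sqrt(1.0 - rho**2)  # start in the stationary distribution
    for t in range(1, n_steps):
        e[t] = rho * e[t - 1] + sigma * z[t]
    return e


def simulate_donors(n_periods=185, n_donors=20, n_factors=2, seed=0):
    """Hypothetical donor DGP: returns exposure E, mean mu, counts y, each shaped (T, J)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_periods)
    t_c = (t - t.mean())[:, None]  # centred time (t - t_bar), shape (T, 1)

    # 1. Exposure: log E_tj = alpha_j + gamma_j (t - t_bar) + common AR(1) + donor AR(1)
    alpha = rng.normal(np.log(50.0), 0.3, n_donors)
    gamma = rng.normal(0.0, 0.002, n_donors)
    log_e = (alpha + gamma * t_c
             + ar1(rng, n_periods, 0.7, 0.05)[:, None]
             + ar1(rng, n_periods, 0.5, 0.05, n_donors))
    exposure = np.exp(log_e)

    # 2. eta_tj = beta_j + delta_j (t - t_bar) + lambda_j S_t + L_t + sum_k phi_jk F_tk + noise
    beta = rng.normal(np.log(0.2), 0.3, n_donors)
    delta = rng.normal(0.0, 0.002, n_donors)
    lam = rng.normal(1.0, 0.2, n_donors)
    season = 0.2 * np.sin(2.0 * np.pi * (t % 12) / 12.0)[:, None]  # S_t (assumed form)
    macro = ar1(rng, n_periods, 0.9, 0.03)[:, None]                # L_t
    factors = ar1(rng, n_periods, 0.8, 0.1, n_factors)            # F_tk, shape (T, K)
    phi = rng.normal(0.0, 0.5, (n_donors, n_factors))              # loadings phi_jk
    eta = (beta + delta * t_c + lam * season + macro + factors @ phi.T
           + ar1(rng, n_periods, 0.3, 0.05, n_donors))

    mu = exposure * np.exp(eta)  # mu_tj = E_tj * exp(eta_tj)

    # 3. Outcome: y_tj ~ Poisson(mu_tj)
    y = rng.poisson(mu)
    return exposure, mu, y
```

Working on the log scale keeps every mean strictly positive, which is what lets the Poisson draw in step 3 be applied directly. The default 185 periods and 20 donors match the panel summarized in the EDA section.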

1.2 Treated unit

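The counterfactual mean $\mu_{t, cf}$ is a weighted combination of the donor means, optionally scaled by a multiplicative mismatch term that keeps the treated unit from being an exact combination of donors (and hence from a perfect pre-treatment fit):
$$\mu_{t, cf} = \left( \sum_j w_j \mu_{tj} \right) \cdot \exp(\epsilon_t^{\text{mismatch}})$$
where $w \sim \text{Dirichlet}(\alpha)$, so the weights are non-negative and sum to one. (Here $\alpha$ is the Dirichlet concentration parameter, not the exposure intercepts $\alpha_j$.)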
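A minimal sketch of the counterfactual construction, assuming a symmetric Dirichlet concentration and an i.i.d. Gaussian log-mismatch (the notebook does not say how $\epsilon_t^{\text{mismatch}}$ is generated); the function name `counterfactual_mean` is hypothetical.

```python
import numpy as np


def counterfactual_mean(mu_donors, concentration, mismatch_sd, rng):
    """Return Dirichlet weights w and mu_cf_t = (sum_j w_j mu_tj) * exp(eps_t)."""
    n_periods, n_donors = mu_donors.shape
    w = rng.dirichlet(np.full(n_donors, concentration))  # w_j >= 0, sum_j w_j = 1
    eps = mismatch_sd * rng.standard_normal(n_periods)     # assumed i.i.d. log-mismatch
    return w, (mu_donors @ w) * np.exp(eps)
```

With `mismatch_sd=0` the counterfactual is an exact convex combination of the donor means, the ideal case for SCM; any positive value makes a perfect pre-period fit unattainable.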

The treated mean $\mu_{t, treated}$ is
$$\mu_{t, treated} = \mu_{t, cf} \cdot (1 + \tau_{t}^{\text{rate}})$$
where $\tau_{t}^{\text{rate}}$ follows a post-treatment ramp-in path. For the $k$-th post-treatment period $t_k$,
$$\tau_{t_k}^{\text{rate}} = \left(\texttt{treatment\_effect\_rate} + \texttt{treatment\_effect\_slope}\cdot k\right)\left(1 - e^{-(k+1)/2.5}\right), \quad k = 0,\dots,T_{post}-1,$$
and $\tau_{t}^{\text{rate}} = 0$ for all pre-treatment and intervention-anchor periods. The factor $1 - e^{-(k+1)/2.5}$ phases the effect in: it delivers about 33% of the linear target in the first post-period and about 80% by the fourth.
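The ramp-in formula translates directly into NumPy. This sketch covers only the post-treatment periods (pre-treatment and anchor periods simply get $\tau = 0$); the argument names mirror the parameters in the formula, but the function names are hypothetical.

```python
import numpy as np


def tau_rate_path(n_post, treatment_effect_rate, treatment_effect_slope):
    """tau_{t_k} = (rate + slope * k) * (1 - exp(-(k + 1) / 2.5)) for k = 0..n_post-1."""
    k = np.arange(n_post)
    linear_target = treatment_effect_rate + treatment_effect_slope * k
    ramp = 1.0 - np.exp(-(k + 1) / 2.5)  # phase-in factor, rises towards 1
    return linear_target * ramp


def treated_mean(mu_cf_post, tau_post):
    """mu_treated = mu_cf * (1 + tau) over the post-treatment periods."""
    return np.asarray(mu_cf_post) * (1.0 + tau_post)
```

For example, `tau_rate_path(4, 0.10, 0.0)` gives roughly 0.033, 0.055, 0.070 and 0.080: a constant 10% target reached gradually.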

The outcomes $y_{t, cf}$ and $y_{t, treated}$ are coupled through Poisson superposition and thinning, so that each keeps an exact Poisson marginal while the realized effect tracks the rate multiplier in expectation:

  • $y_{t, cf} \sim \text{Poisson}(\mu_{t, cf})$
  • If $\mu_{t, treated} \ge \mu_{t, cf}$ (superposition): $y_{t, treated} = y_{t, cf} + \Delta_t$, with $\Delta_t \sim \text{Poisson}(\mu_{t, treated} - \mu_{t, cf})$ drawn independently of $y_{t, cf}$
  • If $\mu_{t, treated} < \mu_{t, cf}$ (thinning): $y_{t, treated} \mid y_{t, cf} \sim \text{Binomial}\left(y_{t, cf}, \frac{\mu_{t, treated}}{\mu_{t, cf}}\right)$

This ensures that $E[y_{t, treated} - y_{t, cf} \mid \mu] = \mu_{t, treated} - \mu_{t, cf}$ and that both marginals remain Poisson, with means $\mu_{t, cf}$ and $\mu_{t, treated}$: a sum of independent Poisson variables is Poisson, and binomially thinning a Poisson($\mu$) count with retention probability $p$ yields a Poisson($p\mu$) count.
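The coupling can be sketched with NumPy's `Generator.poisson` and `Generator.binomial`. This illustrates the superposition/thinning construction described above and is not the notebook's own code; the function name `couple_outcomes` is hypothetical.

```python
import numpy as np


def couple_outcomes(mu_cf, mu_treated, rng):
    """Draw (y_cf, y_treated) with Poisson marginals coupled by superposition/thinning."""
    mu_cf = np.asarray(mu_cf, dtype=float)
    mu_treated = np.asarray(mu_treated, dtype=float)
    y_cf = rng.poisson(mu_cf)

    up = mu_treated >= mu_cf
    # Superposition: add an independent Poisson(mu_treated - mu_cf) increment.
    delta = rng.poisson(np.where(up, mu_treated - mu_cf, 0.0))
    # Thinning: keep each counterfactual event with probability mu_treated / mu_cf.
    keep_prob = np.divide(mu_treated, mu_cf, out=np.ones_like(mu_cf), where=~up)
    thinned = rng.binomial(y_cf, keep_prob)

    y_treated = np.where(up, y_cf + delta, thinned)
    return y_cf, y_treated
```

Drawing many pairs at fixed means is a quick check: the sample mean and variance of `y_treated` should both sit near `mu_treated`, and every pair should satisfy `y_treated >= y_cf` (positive effect) or `y_treated <= y_cf` (negative effect).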

2. Oracle Treatment Effects (ATT)

The Average Treatment Effect on the Treated (ATT) is the average impact of the intervention on the treated unit across all post-treatment periods. Because this scenario is synthetic, the counterfactual is known and the ATT can be computed exactly in two ways:

  1. Realized ATT: based on the observed treated outcomes versus their simulated counterfactuals,
     $$\text{ATT}_{realized} = \frac{1}{T_{post}} \sum_{t \in \text{Post}} \left(Y_t - Y_t^{(0)}\right),$$
     where $Y_t = y_{t, treated}$ and $Y_t^{(0)} = y_{t, cf}$. In the data, this is the mean of `tau_realized_true` for the treated unit in post-periods.

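  2. Mean ATT: based on the underlying Poisson means (the "signal"),
     $$\text{ATT}_{mean} = \frac{1}{T_{post}} \sum_{t \in \text{Post}} \left(\mu_t^{(1)} - \mu_t^{(0)}\right),$$
     where $\mu_t^{(1)} = \mu_{t, treated}$ and $\mu_t^{(0)} = \mu_{t, cf}$. In the data, this is the mean of `tau_mean_true` for the treated unit in post-periods. Because the coupling gives $E[y_{t, treated} - y_{t, cf} \mid \mu] = \mu_{t, treated} - \mu_{t, cf}$, the realized ATT equals the mean ATT in expectation; the two differ only by Poisson noise.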
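Given a long-format frame with the columns shown in the sample output below (`unit_id`, `calendar_time`, `tau_realized_true`, `tau_mean_true`), both oracle ATTs reduce to a filtered mean. This sketch assumes the treated unit is labelled `treated` and that post-periods are those on or after the treatment start (2015-02 in this run, per the `PanelDataSCM` summary); the function name `oracle_att` is hypothetical.

```python
import pandas as pd


def oracle_att(df, treated_unit="treated", treatment_start="2015-02"):
    """Mean of tau_realized_true and tau_mean_true for the treated unit in post-periods."""
    start = pd.Period(treatment_start, freq="M")
    # Normalise calendar_time (strings or Periods) to monthly Periods.
    periods = pd.to_datetime(df["calendar_time"].astype(str)).dt.to_period("M")
    is_post_treated = (df["unit_id"] == treated_unit) & (periods >= start)
    post = df.loc[is_post_treated]
    return post["tau_realized_true"].mean(), post["tau_mean_true"].mean()
```

The function returns `(att_realized, att_mean)`; donor rows and pre-treatment rows are excluded by the mask, so their zero effects do not dilute the average.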

Result

Ground-truth ATTE is 2.750000

Result
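|   | unit_id | calendar_time | treated_time | y | y_cf | tau_realized_true | mu_cf | mu_treated | tau_mean_true |
|---|---|---|---|---|---|---|---|---|---|
| 0 | donor_1 | 2000-01 | 0 | 4.0 | 4.0 | 0.0 | 6.150671 | 6.150671 | 0.0 |
| 1 | donor_1 | 2000-02 | 0 | 9.0 | 9.0 | 0.0 | 5.599403 | 5.599403 | 0.0 |
| 2 | donor_1 | 2000-03 | 0 | 4.0 | 4.0 | 0.0 | 4.978119 | 4.978119 | 0.0 |
| 3 | donor_1 | 2000-04 | 0 | 6.0 | 6.0 | 0.0 | 5.777247 | 5.777247 | 0.0 |
| 4 | donor_1 | 2000-05 | 0 | 8.0 | 8.0 | 0.0 | 5.685711 | 5.685711 | 0.0 |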

EDA

Result

PanelDataSCM(df=(3885, 4), y='y', unit_col='unit_id', time_col='calendar_time', treated_time='treated_time', time_freq='M', treated_unit='treated', treatment_start=Period('2015-02', 'M'), last_post_period=Period('2015-05', 'M'), n_pre_periods=181, n_post_periods=4, donor_units=['donor_1', 'donor_10', 'donor_11', 'donor_12', 'donor_13', 'donor_14', 'donor_15', 'donor_16', 'donor_17', 'donor_18', 'donor_19', 'donor_2', 'donor_20', 'donor_3', 'donor_4', 'donor_5', 'donor_6', 'donor_7', 'donor_8', 'donor_9'])
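The `PanelDataSCM` summary is internally consistent, which is easy to confirm with pandas periods. Assuming every unit spans 2000-01 through 2015-05 (the sample rows start at 2000-01 and `last_post_period` is 2015-05), the 21 units (one treated plus 20 donors) yield exactly the reported 3,885 rows:

```python
import pandas as pd

# Monthly calendar from the first sampled period to last_post_period (inclusive).
periods = pd.period_range("2000-01", "2015-05", freq="M")
treatment_start = pd.Period("2015-02", freq="M")

n_pre = int((periods < treatment_start).sum())  # periods strictly before treatment
n_post = len(periods) - n_pre                   # treatment_start .. last_post_period
n_units = 1 + 20                                # treated unit + donor_units

print(len(periods), n_pre, n_post, n_units * len(periods))  # prints: 185 181 4 3885
```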

Result
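|   | donor | pre_mean | pre_std | pre_slope | corr_with_treated_pre | rmse_to_treated_pre | rmse_to_treated_pre_standardized | mean_diff_pre | slope_diff_pre | max_abs_gap_pre | is_never_treated | n_missing_pre | corr_rank | std_rmse_rank | slope_rank | composite_similarity_score | rank_by_similarity | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | donor_4 | 9.762431 | 7.179154 | 0.115553 | 0.775096 | 6.670625 | 0.842433 | -4.287293 | -0.007929 | 20.0 | True | 0 | 4 | 5 | 1 | 0.883333 | 1 | ok |
| 1 | donor_10 | 13.563536 | 7.602892 | 0.114674 | 0.755004 | 5.462074 | 0.689805 | -0.486188 | -0.008807 | 17.0 | True | 0 | 8 | 1 | 2 | 0.866667 | 2 | ok |
| 2 | donor_18 | 16.220994 | 9.298357 | 0.140931 | 0.774548 | 6.310125 | 0.796906 | 2.171271 | 0.017449 | 20.0 | True | 0 | 5 | 3 | 3 | 0.866667 | 3 | ok |
| 3 | donor_9 | 13.701657 | 6.733231 | 0.100152 | 0.731460 | 5.491832 | 0.693564 | -0.348066 | -0.023330 | 19.0 | True | 0 | 10 | 2 | 4 | 0.783333 | 4 | ok |
| 4 | donor_1 | 10.646409 | 5.490006 | 0.079764 | 0.740206 | 6.329794 | 0.799390 | -3.403315 | -0.043717 | 20.0 | True | 0 | 9 | 4 | 6 | 0.733333 | 5 | ok |
| 5 | donor_17 | 17.458564 | 10.886897 | 0.178779 | 0.770667 | 7.744183 | 0.978013 | 3.408840 | 0.055297 | 27.0 | True | 0 | 6 | 9 | 9 | 0.650000 | 6 | ok |
| 6 | donor_19 | 10.453039 | 5.153367 | 0.073788 | 0.691311 | 6.765610 | 0.854429 | -3.596685 | -0.049693 | 23.0 | True | 0 | 12 | 6 | 7 | 0.633333 | 7 | ok |
| 7 | donor_13 | 8.718232 | 5.794182 | 0.080950 | 0.681039 | 7.886978 | 0.996047 | -5.331492 | -0.042531 | 30.0 | True | 0 | 13 | 10 | 5 | 0.583333 | 8 | ok |
| 8 | donor_8 | 9.198895 | 5.152288 | 0.070184 | 0.669963 | 7.623024 | 0.962712 | -4.850829 | -0.053298 | 23.0 | True | 0 | 14 | 8 | 8 | 0.550000 | 9 | ok |
| 9 | donor_5 | 10.950276 | 5.267723 | 0.065347 | 0.609616 | 7.014192 | 0.885823 | -3.099448 | -0.058134 | 22.0 | True | 0 | 16 | 7 | 10 | 0.500000 | 10 | ok |
| 10 | donor_7 | 7.204420 | 4.610483 | 0.064872 | 0.696173 | 8.943345 | 1.129455 | -6.845304 | -0.058610 | 22.0 | True | 0 | 11 | 12 | 11 | 0.483333 | 11 | high_std_rmse |
| 11 | donor_11 | 21.038674 | 12.944576 | 0.224528 | 0.828828 | 10.449828 | 1.319709 | 6.988950 | 0.101046 | 35.0 | True | 0 | 2 | 16 | 17 | 0.466667 | 12 | high_std_rmse |
| 12 | donor_14 | 15.685083 | 15.073494 | 0.250056 | 0.832498 | 9.687949 | 1.223491 | 1.635359 | 0.126574 | 34.0 | True | 0 | 1 | 15 | 19 | 0.466667 | 13 | high_std_rmse |
| 13 | donor_6 | 8.690608 | 4.149602 | 0.049649 | 0.564090 | 8.459850 | 1.068395 | -5.359116 | -0.073833 | 27.0 | True | 0 | 17 | 11 | 13 | 0.366667 | 14 | high_std_rmse |
| 14 | donor_12 | 7.569061 | 3.764025 | 0.046441 | 0.618555 | 9.055080 | 1.143566 | -6.480663 | -0.077040 | 31.0 | True | 0 | 15 | 13 | 14 | 0.350000 | 15 | high_std_rmse |
| 15 | donor_15 | 27.751381 | 13.661215 | 0.212821 | 0.764745 | 16.480694 | 2.081347 | 13.701657 | 0.089339 | 53.0 | True | 0 | 7 | 19 | 16 | 0.350000 | 16 | high_std_rmse |
| 16 | donor_2 | 31.861878 | 29.367320 | 0.482832 | 0.825508 | 29.299935 | 3.700290 | 17.812155 | 0.359351 | 135.0 | True | 0 | 3 | 20 | 20 | 0.333333 | 17 | high_std_rmse |
| 17 | donor_20 | 19.972376 | 6.508330 | 0.056204 | 0.480955 | 9.516488 | 1.201837 | 5.922652 | -0.067278 | 23.0 | True | 0 | 19 | 14 | 12 | 0.300000 | 18 | high_std_rmse |
| 18 | donor_16 | 5.662983 | 3.652322 | 0.040390 | 0.561087 | 10.673455 | 1.347951 | -8.386740 | -0.083091 | 31.0 | True | 0 | 18 | 17 | 15 | 0.216667 | 19 | high_std_rmse |
| 19 | donor_3 | 19.900552 | 5.562408 | -0.000138 | 0.105856 | 10.887927 | 1.375037 | 5.850829 | -0.123619 | 33.0 | True | 0 | 20 | 18 | 18 | 0.116667 | 20 | high_std_rmse |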
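The ranking columns can be reproduced from three of the raw diagnostics. Every row in the table is consistent with `corr_rank` ranking `corr_with_treated_pre` in descending order, `std_rmse_rank` ranking `rmse_to_treated_pre_standardized` ascending, `slope_rank` ranking `abs(slope_diff_pre)` ascending, and `composite_similarity_score` being the mean of $(N + 1 - \text{rank})/N$ over the three ranks with $N = 20$ donors (for `donor_4`: $(17 + 16 + 20)/60 \approx 0.883$). The `high_std_rmse` note appears exactly where the standardized RMSE exceeds 1, and tied scores are ordered alphabetically by donor. The sketch below is a reconstruction under those inferences, not the notebook's own code; `score_donors` and `rmse_flag` are hypothetical names.

```python
import numpy as np
import pandas as pd


def score_donors(stats: pd.DataFrame, rmse_flag: float = 1.0) -> pd.DataFrame:
    """Rank donors by pre-period similarity to the treated unit (reconstruction)."""
    out = stats.copy()
    n = len(out)
    out["corr_rank"] = out["corr_with_treated_pre"].rank(ascending=False, method="min").astype(int)
    out["std_rmse_rank"] = out["rmse_to_treated_pre_standardized"].rank(method="min").astype(int)
    out["slope_rank"] = out["slope_diff_pre"].abs().rank(method="min").astype(int)

    # Sum the integer ranks first, so tied donors get bit-identical scores.
    rank_sum = out[["corr_rank", "std_rmse_rank", "slope_rank"]].sum(axis=1)
    out["composite_similarity_score"] = (3 * (n + 1) - rank_sum) / (3 * n)
    # Lower rank sum means more similar; ties keep input (here alphabetical) order.
    out["rank_by_similarity"] = rank_sum.rank(method="first").astype(int)
    out["notes"] = np.where(
        out["rmse_to_treated_pre_standardized"] > rmse_flag, "high_std_rmse", "ok"
    )
    return out.sort_values("rank_by_similarity")
```

Ranking on the integer rank sum rather than the floating-point score is a deliberate choice: it avoids ties being split by rounding noise.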