generate_multitreatment_gamma_26()

Automated conversion of generate_multitreatment_gamma_26.ipynb

generate_multitreatment_gamma_26() builds a 3-arm multi-treatment observational DGP with correlated confounders and a Gamma outcome.

Treatment columns are one-hot: $D=(D_0,D_1,D_2)$, with exactly one active class per row (d_0, d_1, d_2).

The scenario is configured to target marginal treatment shares close to $(0.50, 0.25, 0.25)$.

1. Confounders and Dependence

The confounder vector is $X=(X_1,\dots,X_8)$ with marginals:

  • $X_1$ (tenure_months) $\sim \mathcal{N}(24, 12^2)$, clipped to $[0,120]$
  • $X_2$ (avg_sessions_week) $\sim \mathcal{N}(5, 2^2)$, clipped to $[0,40]$
  • $X_3$ (spend_last_month) $\sim \text{LogNormal}(\log 60, 0.9)$, clipped at 500
  • $X_4$ (premium_user) $\sim \text{Bernoulli}(0.25)$
  • $X_5$ (urban_resident) $\sim \text{Bernoulli}(0.60)$
  • $X_6$ (support_tickets_q) $\sim \text{Poisson}(1.5)$, clipped at 15
  • $X_7$ (discount_eligible) $\sim \text{Bernoulli}(0.35)$
  • $X_8$ (credit_utilization) $\sim \text{Beta}(\alpha,\beta)$ with mean $0.45$ and concentration $\kappa=20$
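
The mean and concentration given for credit_utilization can be turned into Beta shape parameters under the common mean-concentration convention, in which $\kappa=\alpha+\beta$. That convention is an assumption here, not something the text confirms about the generator. A minimal sketch:

```python
import random

# Assumption: concentration kappa = alpha + beta (mean-concentration form).
mean, kappa = 0.45, 20.0
alpha = mean * kappa           # about 9
beta = (1.0 - mean) * kappa    # about 11

# Sanity check: the empirical mean of Beta(alpha, beta) draws sits near 0.45.
rng = random.Random(0)
draws = [rng.betavariate(alpha, beta) for _ in range(50_000)]
sample_mean = sum(draws) / len(draws)
print(alpha, beta, round(sample_mean, 3))
```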

Dependence is induced with a Gaussian copula whose correlation matrix is Toeplitz:

$$\Sigma_{ij} = \rho^{|i-j|}, \qquad \rho=0.30.$$
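
The copula step can be sketched with the standard library alone: build the Toeplitz matrix, factor it, draw correlated normals, and map them to uniforms through the normal CDF. The helper names below are hypothetical and the generator's own implementation may differ; only the Bernoulli marginal is shown, since its inverse CDF is a simple threshold.

```python
import math
import random

def toeplitz_corr(dim, rho):
    """Correlation matrix with entries rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(dim)] for i in range(dim)]

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def copula_uniforms(low, rng):
    """One draw of dependent uniforms: z = L e, then u = Phi(z)."""
    n = len(low)
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
    return [0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z]

sigma = toeplitz_corr(8, 0.30)
low = cholesky(sigma)
rng = random.Random(0)
u = copula_uniforms(low, rng)

# Each uniform is pushed through its marginal inverse CDF. For a
# Bernoulli(p) column such as premium_user this is a threshold on u.
premium_user = 1 if u[3] > 1.0 - 0.25 else 0
```

Continuous marginals would use their own inverse CDFs in place of the threshold, followed by the clipping listed above.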

2. Treatment Assignment (Multinomial Logit)

For each class $k \in \{0,1,2\}$, define the score

$$s_k(X) = \alpha_k + X^\top\beta_{d,k}.$$

The propensity is then the softmax of the scores:

$$m_k(X)=P(D_k=1\mid X)=\frac{\exp(s_k(X))}{\sum_{j=0}^2\exp(s_j(X))}.$$

Coefficients used in this scenario:

  • $\beta_{d,0}=\mathbf{0}$
  • $\beta_{d,1}=[0.01, 0.10, 0.0015, 0.50, 0.20, 0.05, 0.35, 0.40]$
  • $\beta_{d,2}=[-0.005, 0.07, 0.0010, 0.35, 0.10, 0.08, 0.20, 0.25]$
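
As an illustration of the multinomial-logit step, the sketch below evaluates the softmax propensities for one row using the coefficients listed above and zero intercepts. The function names and the example row are hypothetical; the covariate order is assumed to match the confounder list.

```python
import math

BETA_D = [
    [0.0] * 8,                                             # class 0 (reference)
    [0.01, 0.10, 0.0015, 0.50, 0.20, 0.05, 0.35, 0.40],    # class 1
    [-0.005, 0.07, 0.0010, 0.35, 0.10, 0.08, 0.20, 0.25],  # class 2
]

def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def propensities(x, alphas, betas=BETA_D):
    """m_k(x) for every class k, with s_k(x) = alpha_k + x . beta_k."""
    scores = [a + sum(xi * bi for xi, bi in zip(x, b))
              for a, b in zip(alphas, betas)]
    return softmax(scores)

# Hypothetical row: tenure, sessions, spend, premium, urban,
# tickets, discount, credit (values chosen for illustration only).
x = [24.0, 5.0, 60.0, 0, 1, 1, 0, 0.45]
m = propensities(x, alphas=[0.0, 0.0, 0.0])
print([round(p, 3) for p in m])
```

With all intercepts at zero, this row's largest propensity is for d_1 and its smallest is for d_0, which is far from a 0.50 control share.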

Treatment-score intercepts start at $(0.0, 0.0, 0.0)$ and are then calibrated iteratively so that the mean class rates are close to the target $(0.50, 0.25, 0.25)$.
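
The text does not say which iterative scheme is used. One simple possibility, shown here purely as an assumption, is a fixed-point update that shifts each intercept by the log ratio of target share to current mean share. The synthetic scores and all function names are hypothetical.

```python
import math
import random

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_shares(base_scores, alphas):
    """Average softmax probability per class over all rows."""
    k = len(alphas)
    sums = [0.0] * k
    for row in base_scores:
        probs = softmax([a + s for a, s in zip(alphas, row)])
        for j in range(k):
            sums[j] += probs[j]
    return [s / len(base_scores) for s in sums]

def calibrate_intercepts(base_scores, targets, n_iter=200, tol=1e-6):
    """Fixed-point update: alpha_k += log(target_k / current_share_k)."""
    alphas = [0.0] * len(targets)
    for _ in range(n_iter):
        shares = mean_shares(base_scores, alphas)
        if max(abs(s - t) for s, t in zip(shares, targets)) < tol:
            break
        alphas = [a + math.log(t / s) for a, s, t in zip(alphas, shares, targets)]
        # Only differences between intercepts matter, so pin class 0 at zero.
        alphas = [a - alphas[0] for a in alphas]
    return alphas

# Synthetic stand-ins for x . beta_k; class 0 is the reference with score 0.
rng = random.Random(0)
base_scores = [[0.0, rng.gauss(1.2, 0.5), rng.gauss(0.6, 0.4)] for _ in range(2000)]
targets = [0.50, 0.25, 0.25]
alphas = calibrate_intercepts(base_scores, targets)
shares = mean_shares(base_scores, alphas)
print([round(a, 3) for a in alphas], [round(s, 4) for s in shares])
```

Adding the same constant to every intercept leaves the softmax unchanged, which is why class 0 can be pinned at zero without loss of generality.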

The structural link shift for class $k$ is

$$\tau^{link}_k(X)=\theta_k+\tau_k(X), \qquad \theta=(0.0,-0.05,0.10).$$

Control has no heterogeneous residual: $\tau_0(X)=0$.

For d_1 (forced harmful vs control):

$$\tau_1(X)=\min\Big(-0.22 -0.001\,\text{tenure}-0.006\,\text{sessions}-0.05\,\text{premium}-0.04\,\text{discount}-0.10(\text{credit}-0.45),\,-0.02\Big).$$

For d_2 (forced beneficial vs control):

$$\tau_2(X)=\max\Big(0.16 +0.014\,\text{sessions}+0.030\log(1+\text{spend})+0.06\,\text{urban}-0.006\,\text{tickets}+0.12(\text{credit}-0.45),\,0.02\Big).$$
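
The two heterogeneous shifts translate directly into code. The sketch below assumes that the log is the natural logarithm and that the short names map to the confounder columns (tenure to tenure_months, and so on); the function names are hypothetical.

```python
import math

THETA = (0.0, -0.05, 0.10)  # class-level link shifts theta_k

def tau_1(tenure, sessions, premium, discount, credit):
    """Heterogeneous shift for d_1, capped so it never exceeds -0.02."""
    raw = (-0.22 - 0.001 * tenure - 0.006 * sessions
           - 0.05 * premium - 0.04 * discount - 0.10 * (credit - 0.45))
    return min(raw, -0.02)

def tau_2(sessions, spend, urban, tickets, credit):
    """Heterogeneous shift for d_2, floored so it never drops below 0.02."""
    raw = (0.16 + 0.014 * sessions + 0.030 * math.log1p(spend)  # natural log assumed
           + 0.06 * urban - 0.006 * tickets + 0.12 * (credit - 0.45))
    return max(raw, 0.02)

# Link-scale shifts for one illustrative row.
tau_link_1 = THETA[1] + tau_1(tenure=24, sessions=5, premium=0, discount=0, credit=0.45)
tau_link_2 = THETA[2] + tau_2(sessions=5, spend=0, urban=0, tickets=0, credit=0.45)
print(round(tau_link_1, 3), round(tau_link_2, 3))
```

The min and max act as caps that force the sign of each residual: d_1 is always harmful and d_2 always beneficial on the link scale, whatever the covariates.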

4. Outcome Model (Gamma)

Baseline linear predictor:

$$\eta_0(X)=\alpha_y + X^\top\beta_y, \qquad \alpha_y=0,$$

with $\beta_y=[0.01,0.08,0.0015,0.35,0.12,0.06,0.20,0.50]$.

Observed link for the assigned treatment:

$$\eta(X,D)=\eta_0(X)+\sum_{k=0}^2 D_k\,\tau^{link}_k(X).$$

The Gamma mean uses a log link:

$$\mu(X,D)=\exp(\eta(X,D)).$$

Given shape $a=2.0$, the outcome is sampled as

$$Y\mid X,D \sim \text{Gamma}(\text{shape}=a,\,\text{scale}=\mu/a),$$

so $\mathbb{E}[Y\mid X,D]=\mu$ and $\mathrm{Var}(Y\mid X,D)=\mu^2/a$.
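
The shape/scale parameterization can be checked by simulation. This sketch uses the standard library's `random.gammavariate`, whose two arguments are shape and scale; the link value is an arbitrary illustrative choice and the function name is hypothetical.

```python
import math
import random

def sample_gamma_outcome(eta, shape, rng):
    """Draw Y ~ Gamma(shape, scale=mu/shape) with mu = exp(eta)."""
    mu = math.exp(eta)
    return rng.gammavariate(shape, mu / shape)  # (shape, scale)

rng = random.Random(0)
eta, shape = math.log(3.0), 2.0   # illustrative link value, so mu = 3
ys = [sample_gamma_outcome(eta, shape, rng) for _ in range(20_000)]

mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
# Theory: mean = mu = 3 and variance = mu**2 / shape = 4.5.
print(round(mean, 2), round(var, 2))
```

Because the variance scales with $\mu^2$, arms with a higher mean also show a wider spread.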

5. Oracle Quantities

When include_oracle=True, the generator exposes:

  • m_d_0, m_d_1, m_d_2: calibrated propensities $m_k(X)$
  • tau_link_d_k: link-scale treatment shifts $\tau^{link}_k(X)$
  • g_d_k: potential outcome means on the natural scale under class $k$
  • cate_d_1 $= g_{d_1}-g_{d_0}$, cate_d_2 $= g_{d_2}-g_{d_0}$
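
Since $\tau^{link}_0(X)=0$, the oracle columns are tied together by $g_{d_k}=g_{d_0}\exp(\tau^{link}_k)$. The sketch below checks this on the first row of the sample output that follows, using the six-decimal values as displayed there.

```python
import math

# Values copied from the first row of the sample output.
g_d_0 = 3.279384
tau_link_d_1 = -0.352005
tau_link_d_2 = 0.494166

# With tau_link_0 = 0, g_d_0 = exp(eta_0) and g_d_k = g_d_0 * exp(tau_link_k).
g_d_1 = g_d_0 * math.exp(tau_link_d_1)
g_d_2 = g_d_0 * math.exp(tau_link_d_2)

# Natural-scale conditional effects relative to control.
cate_d_1 = g_d_1 - g_d_0
cate_d_2 = g_d_2 - g_d_0
print(round(g_d_1, 4), round(g_d_2, 4), round(cate_d_1, 4), round(cate_d_2, 4))
```

The results agree with the reported g_d_1, g_d_2, cate_d_1 and cate_d_2 for that row up to display rounding.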
Result
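
| | y | d_0 | d_1 | d_2 | tenure_months | avg_sessions_week | spend_last_month | premium_user | urban_resident | support_tickets_q | ... | m_obs_d_1 | tau_link_d_1 | m_d_2 | m_obs_d_2 | tau_link_d_2 | g_d_0 | g_d_1 | g_d_2 | cate_d_1 | cate_d_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.422769 | 1.0 | 0.0 | 0.0 | 27.656605 | 3.198667 | 89.609464 | 0.0 | 1.0 | 0.0 | ... | 0.246687 | -0.352005 | 0.220781 | 0.220781 | 0.494166 | 3.279384 | 2.306314 | 5.375338 | -0.973070 | 2.095954 |
| 1 | 7.566231 | 1.0 | 0.0 | 0.0 | 23.798386 | 3.362415 | 102.337236 | 0.0 | 0.0 | 3.0 | ... | 0.179393 | -0.307360 | 0.236958 | 0.236958 | 0.420278 | 2.807850 | 2.064853 | 4.274630 | -0.742997 | 1.466780 |
| 2 | 1.702662 | 0.0 | 0.0 | 1.0 | 28.425009 | 3.391819 | 102.660712 | 0.0 | 1.0 | 1.0 | ... | 0.210566 | -0.320189 | 0.218245 | 0.218245 | 0.502415 | 3.069919 | 2.228798 | 5.073677 | -0.841121 | 2.003758 |
| 3 | 1.827530 | 1.0 | 0.0 | 0.0 | 18.860066 | 4.071175 | 83.593417 | 0.0 | 0.0 | 2.0 | ... | 0.176729 | -0.316241 | 0.237639 | 0.237639 | 0.441677 | 2.716805 | 1.980234 | 4.225485 | -0.736571 | 1.508680 |
| 4 | 1.429843 | 0.0 | 1.0 | 0.0 | 17.853087 | 3.140075 | 79.209870 | 0.0 | 1.0 | 1.0 | ... | 0.232492 | -0.350130 | 0.247027 | 0.247027 | 0.493624 | 3.224354 | 2.271869 | 5.282273 | -0.952485 | 2.057919 |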

5 rows × 26 columns

Result

Ground truth ATE for d_1 vs d_0 is -1.1950325692907122
Ground truth ATE for d_2 vs d_0 is 2.530398527003894

Result

MultiCausalData(df=(100000, 12), treatment_names=['d_0', 'd_1', 'd_2'], control_treatment='d_0')outcome='y', confounders=['tenure_months', 'avg_sessions_week', 'spend_last_month', 'premium_user', 'urban_resident', 'support_tickets_q', 'discount_eligible', 'credit_utilization'], user_id=None,

Result
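
| | treatment | count | mean | std | min | p10 | p25 | median | p75 | p90 | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | d_0 | 50115 | 3.758417 | 3.106725 | 0.015427 | 0.887906 | 1.626326 | 2.937863 | 4.957415 | 7.577785 | 50.239323 |
| 1 | d_2 | 25008 | 6.541717 | 5.539708 | 0.043125 | 1.512610 | 2.775637 | 5.102611 | 8.584913 | 13.348761 | 79.125235 |
| 2 | d_1 | 24877 | 2.980817 | 2.412763 | 0.009022 | 0.711997 | 1.306774 | 2.352234 | 3.946463 | 5.985070 | 25.169272 |
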
Result
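
| | treatment | n | outlier_count | outlier_rate | lower_bound | upper_bound | has_outliers | method | tail |
|---|---|---|---|---|---|---|---|---|---|
| 0 | d_0 | 50115 | 2288 | 0.045655 | -3.370308 | 9.954048 | True | iqr | both |
| 1 | d_2 | 25008 | 1173 | 0.046905 | -5.938277 | 17.298826 | True | iqr | both |
| 2 | d_1 | 24877 | 1067 | 0.042891 | -2.652760 | 7.905997 | True | iqr | both |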
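
The reported bounds are consistent with the usual IQR fences, $p_{25}-1.5\,\text{IQR}$ and $p_{75}+1.5\,\text{IQR}$. The 1.5 multiplier is an inference from the numbers rather than something stated in the output. A quick check against the quartiles in the per-treatment summary:

```python
def iqr_bounds(p25, p75, k=1.5):
    """Lower and upper fences at k * IQR beyond the quartiles."""
    iqr = p75 - p25
    return p25 - k * iqr, p75 + k * iqr

# (p25, p75) per treatment, copied from the outcome summary table.
quartiles = {
    "d_0": (1.626326, 4.957415),
    "d_2": (2.775637, 8.584913),
    "d_1": (1.306774, 3.946463),
}
bounds = {name: iqr_bounds(*q) for name, q in quartiles.items()}
for name, (lo, hi) in bounds.items():
    print(name, round(lo, 5), round(hi, 5))
```

Every lower fence is negative while a Gamma outcome is strictly positive, so with tail set to both, all flagged points still lie in the upper tail.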
Result

| | confounders | mean_d_0 | mean_d_1 | abs_diff | smd | ks_pvalue |
|---|---|---|---|---|---|---|
| 0 | premium_user | 0.217440 | 0.274072 | 0.056632 | 0.131823 | 0.00000 |
| 1 | avg_sessions_week | 4.827957 | 5.059494 | 0.231536 | 0.116747 | 0.00000 |
| 2 | spend_last_month | 82.894719 | 89.334021 | 6.439302 | 0.076205 | 0.00000 |
| 3 | support_tickets_q | 1.478140 | 1.569378 | 0.091238 | 0.073883 | 0.00000 |
| 4 | discount_eligible | 0.325870 | 0.356006 | 0.030136 | 0.063605 | 0.00000 |
| 5 | urban_resident | 0.585912 | 0.604687 | 0.018774 | 0.038256 | 0.00002 |
| 6 | tenure_months | 23.672462 | 23.391337 | 0.281125 | -0.024131 | 0.00373 |
| 7 | credit_utilization | 0.449627 | 0.451855 | 0.002228 | 0.020493 | 0.02836 |

Result

| | confounders | mean_d_0 | mean_d_1 | abs_diff | smd | ks_pvalue |
|---|---|---|---|---|---|---|
| 0 | avg_sessions_week | 4.827957 | 5.330050 | 0.502092 | 0.253187 | 0.00000 |
| 1 | premium_user | 0.217440 | 0.296861 | 0.079421 | 0.182466 | 0.00000 |
| 2 | tenure_months | 23.672462 | 25.752063 | 2.079601 | 0.176703 | 0.00000 |
| 3 | spend_last_month | 82.894719 | 96.062898 | 13.168180 | 0.149709 | 0.00000 |
| 4 | discount_eligible | 0.325870 | 0.395626 | 0.069756 | 0.145642 | 0.00000 |
| 5 | urban_resident | 0.585912 | 0.638421 | 0.052509 | 0.107919 | 0.00000 |
| 6 | support_tickets_q | 1.478140 | 1.492302 | 0.014162 | 0.011558 | 0.47358 |
| 7 | credit_utilization | 0.449627 | 0.448996 | 0.000632 | -0.005811 | 0.86692 |
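
For the binary confounders, the smd column can be reproduced from the group means alone if the standardized mean difference uses the pooled standard deviation $\sqrt{(s_0^2+s_1^2)/2}$ with $s^2=p(1-p)$. That definition is an assumption, but it matches the reported premium_user values in both balance tables to within rounding:

```python
import math

def smd_binary(p_control, p_treated):
    """Standardized mean difference for a 0/1 covariate.

    Assumes the pooled SD is sqrt((var_control + var_treated) / 2),
    with each variance equal to p * (1 - p).
    """
    pooled_sd = math.sqrt(
        (p_control * (1.0 - p_control) + p_treated * (1.0 - p_treated)) / 2.0
    )
    return (p_treated - p_control) / pooled_sd

# premium_user means copied from the two balance tables.
smd_first = smd_binary(0.217440, 0.274072)
smd_second = smd_binary(0.217440, 0.296861)
print(round(smd_first, 4), round(smd_second, 4))
```

A common rule of thumb treats an absolute SMD above 0.1 as meaningful imbalance, a threshold that several confounders in both tables exceed.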