generate_multitreatment_binary_26()

Automated conversion of generate_multitreatment_binary_26.ipynb

generate_multitreatment_binary_26() defines a three-arm, multi-treatment observational data-generating process (DGP) with correlated confounders and a binary outcome.

Treatments are one-hot columns (d_0, d_1, d_2) sampled from a multinomial-logit propensity model calibrated toward target class shares $(0.50, 0.25, 0.25)$.

1. Confounders and Copula Correlation

The confounder vector $X=(X_1,\dots,X_8)$ has the following marginals:

  • $X_1$ (tenure_months) $\sim \mathcal{N}(24, 12^2)$, clipped to $[0, 120]$
  • $X_2$ (weekly_active_days) $\sim \mathcal{N}(4.0, 1.5^2)$, clipped to $[0, 7]$
  • $X_3$ (annual_income_k) $\sim \text{Gamma}(\text{shape}=4, \text{scale}=18)$, clipped at 300
  • $X_4$ (premium_user) $\sim \text{Bernoulli}(0.22)$
  • $X_5$ (family_plan) $\sim \text{Bernoulli}(0.38)$
  • $X_6$ (recent_complaints) $\sim \text{Poisson}(0.8)$, clipped at 10
  • $X_7$ (discount_eligible) $\sim \text{Bernoulli}(0.30)$
  • $X_8$ (engagement_score) $\sim \text{Beta}(\alpha, \beta)$ with mean $0.60$ and concentration $\kappa=16$
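
The Beta shape parameters are not listed explicitly. Assuming the common mean-concentration parameterization, in which $\kappa=\alpha+\beta$ (an assumption, since the notebook text does not spell it out), they would work out to:

$$\alpha = \mu\kappa = 0.60 \times 16 = 9.6, \qquad \beta = (1-\mu)\kappa = 0.40 \times 16 = 6.4.$$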

Dependencies are induced with a Gaussian copula whose correlation matrix is $\Sigma_{ij}=0.30^{|i-j|}$.
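
A minimal sketch of how a Gaussian copula with this correlation structure could be sampled, using only the Python standard library. It is not the generator's actual implementation: the helper names (ar1_normals, sample_confounders, and so on) are hypothetical, and it relies on the fact that an AR(1) recursion over standard normals yields exactly $\text{Corr}(z_i, z_j)=0.30^{|i-j|}$. Only six of the eight marginals are shown, because the Gamma and Beta inverse CDFs are not in the standard library (scipy.stats.gamma.ppf and scipy.stats.beta.ppf would be one option).

```python
import math
import random
from statistics import NormalDist

RHO = 0.30  # latent correlation: Sigma_ij = RHO ** |i - j|
STD_NORMAL = NormalDist()


def ar1_normals(n_dims, rho, rng):
    """Draw standard normals z with Corr(z_i, z_j) = rho ** |i - j|."""
    z = [rng.gauss(0.0, 1.0)]
    innovation_scale = math.sqrt(1.0 - rho * rho)
    for _ in range(n_dims - 1):
        z.append(rho * z[-1] + innovation_scale * rng.gauss(0.0, 1.0))
    return z


def clip(value, low, high):
    return max(low, min(high, value))


def bernoulli_from_uniform(u, p):
    """Inverse CDF of Bernoulli(p): returns 1 with probability p."""
    return 1 if u > 1.0 - p else 0


def poisson_from_uniform(u, lam, cap):
    """Inverse CDF of Poisson(lam), clipped at cap."""
    k = 0
    pmf = math.exp(-lam)
    cdf = pmf
    while cdf < u and k < cap:
        k += 1
        pmf *= lam / k
        cdf += pmf
    return k


def sample_confounders(rng):
    """One row with six of the eight confounders (Gamma and Beta omitted)."""
    z = ar1_normals(8, RHO, rng)            # correlated latent normals
    u = [STD_NORMAL.cdf(v) for v in z]      # correlated uniforms (the copula)
    return {
        # For normal marginals, mean + sd * z equals the inverse CDF at u.
        "tenure_months": clip(24.0 + 12.0 * z[0], 0.0, 120.0),
        "weekly_active_days": clip(4.0 + 1.5 * z[1], 0.0, 7.0),
        "premium_user": bernoulli_from_uniform(u[3], 0.22),
        "family_plan": bernoulli_from_uniform(u[4], 0.38),
        "recent_complaints": poisson_from_uniform(u[5], 0.8, 10),
        "discount_eligible": bernoulli_from_uniform(u[6], 0.30),
    }


rng = random.Random(0)
rows = [sample_confounders(rng) for _ in range(5)]
for row in rows:
    print(row)
```

Because each marginal is obtained by pushing a correlated uniform through its own inverse CDF, the marginal distributions stay as specified while neighbouring confounders remain dependent.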

2. Treatment Assignment (Softmax)

Class scores are

$$s_k(X)=\alpha_k + X^\top\beta_{d,k}, \quad k\in\{0,1,2\},$$

with propensities

$$m_k(X)=\frac{e^{s_k(X)}}{\sum_{j=0}^{2}e^{s_j(X)}}.$$

Scenario coefficients:

  • $\beta_{d,0}=\mathbf{0}$
  • $\beta_{d,1}=[0.01, 0.09, 0.0018, 0.45, 0.20, 0.08, 0.30, 0.28]$
  • $\beta_{d,2}=[-0.004, 0.07, 0.0012, 0.30, 0.12, 0.10, 0.18, 0.22]$
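
The softmax propensities can be sketched in a few lines. This example assumes the coefficient vectors follow the confounder order $X_1,\dots,X_8$ listed earlier and uses uncalibrated zero intercepts; the function name propensities and the example user are hypothetical.

```python
import math

# Coefficients copied from the scenario; order assumed to match X_1..X_8:
# tenure_months, weekly_active_days, annual_income_k, premium_user,
# family_plan, recent_complaints, discount_eligible, engagement_score.
BETA_D = {
    "d_0": [0.0] * 8,
    "d_1": [0.01, 0.09, 0.0018, 0.45, 0.20, 0.08, 0.30, 0.28],
    "d_2": [-0.004, 0.07, 0.0012, 0.30, 0.12, 0.10, 0.18, 0.22],
}


def propensities(x, intercepts):
    """Softmax over class scores s_k(x) = alpha_k + x . beta_{d,k}."""
    scores = {
        arm: intercepts[arm] + sum(b * v for b, v in zip(beta, x))
        for arm, beta in BETA_D.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {arm: math.exp(s - top) for arm, s in scores.items()}
    total = sum(exps.values())
    return {arm: e / total for arm, e in exps.items()}


zero_intercepts = {"d_0": 0.0, "d_1": 0.0, "d_2": 0.0}
example_user = [24.0, 4.0, 70.0, 1, 0, 0, 0, 0.60]  # hypothetical premium user
print(propensities(example_user, zero_intercepts))
```

With zero intercepts the control arm is under-represented for a typical user, which is why the intercepts need to be calibrated before sampling.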

Treatment-score intercepts start at $(0.0, 0.0, 0.0)$ and are calibrated to match the target marginal rates.
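
The calibration procedure itself is not described here. One simple option, shown purely as an assumption, is a fixed-point update that nudges each intercept by the log-ratio of target share to current mean propensity until the two agree. The names calibrate_intercepts and mean_propensities are hypothetical, and the toy scores stand in for the real $X^\top\beta_{d,k}$ values.

```python
import math
import random

TARGET_SHARES = [0.50, 0.25, 0.25]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_propensities(linear_scores, intercepts):
    """Average softmax probability of each arm over all units."""
    totals = [0.0] * len(intercepts)
    for row in linear_scores:
        probs = softmax([a + s for a, s in zip(intercepts, row)])
        for k, p in enumerate(probs):
            totals[k] += p
    return [t / len(linear_scores) for t in totals]


def calibrate_intercepts(linear_scores, targets, max_iter=500, tol=1e-10):
    """Assumed scheme: alpha_k += log(target_k / mean propensity_k)."""
    intercepts = [0.0] * len(targets)
    for _ in range(max_iter):
        means = mean_propensities(linear_scores, intercepts)
        steps = [math.log(t / m) for t, m in zip(targets, means)]
        if max(abs(s) for s in steps) < tol:
            break
        intercepts = [a + s for a, s in zip(intercepts, steps)]
    # Softmax is invariant to a common shift, so pin the control intercept at 0.
    return [a - intercepts[0] for a in intercepts]


# Toy per-unit linear scores X . beta_{d,k}; the control score is always 0.
rng = random.Random(0)
toy_scores = [[0.0, rng.gauss(1.0, 0.5), rng.gauss(0.8, 0.5)] for _ in range(2000)]

alphas = calibrate_intercepts(toy_scores, TARGET_SHARES)
calibrated = mean_propensities(toy_scores, alphas)
print(alphas)
print(calibrated)
```

After calibration the mean propensities match the targets on the sample used for calibration; realized arm counts still vary around those shares because assignment is random.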

3. Heterogeneous Effects on Logit Scale

Treatment shifts are additive on the link scale:

$$\tau^{link}_k(X)=\theta_k + \tau_k(X), \qquad \theta=(0.0,-0.18,0.26).$$

For d_1, the shift is capped at $-0.02$ so that it is always harmful relative to control:

$$\tau_1(X)=\min\Big(-0.16 -0.0008\,\text{tenure}-0.020\,\text{activeDays}-0.08\,\text{premium}-0.03\,\text{complaints}-0.10(\text{engagement}-0.60),\,-0.02\Big).$$

For d_2, the shift is floored at $0.02$ so that it is always beneficial relative to control:

$$\tau_2(X)=\max\Big(0.14 +0.020\,\text{activeDays}+0.028\log(1+\text{income})+0.05\,\text{familyPlan}-0.010\,\text{complaints}+0.12(\text{engagement}-0.60),\,0.02\Big).$$
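
The two effect functions translate directly into code. This is a sketch under two assumptions: $\log$ is the natural logarithm, and the control arm has no heterogeneous component, so its link-scale shift is $\theta_0 = 0$. The function names are hypothetical.

```python
import math

THETA = {"d_0": 0.0, "d_1": -0.18, "d_2": 0.26}


def tau_1(tenure, active_days, premium, complaints, engagement):
    """Heterogeneous part for d_1, capped at -0.02 (always harmful)."""
    raw = (
        -0.16
        - 0.0008 * tenure
        - 0.020 * active_days
        - 0.08 * premium
        - 0.03 * complaints
        - 0.10 * (engagement - 0.60)
    )
    return min(raw, -0.02)


def tau_2(active_days, income, family_plan, complaints, engagement):
    """Heterogeneous part for d_2, floored at 0.02 (always beneficial)."""
    raw = (
        0.14
        + 0.020 * active_days
        + 0.028 * math.log1p(income)  # natural log of (1 + income), assumed
        + 0.05 * family_plan
        - 0.010 * complaints
        + 0.12 * (engagement - 0.60)
    )
    return max(raw, 0.02)


def tau_link(tenure, active_days, income, premium, family_plan, complaints, engagement):
    """Link-scale shift per arm: theta_k + tau_k(X); control assumed to be 0."""
    return {
        "d_0": THETA["d_0"],
        "d_1": THETA["d_1"] + tau_1(tenure, active_days, premium, complaints, engagement),
        "d_2": THETA["d_2"] + tau_2(active_days, income, family_plan, complaints, engagement),
    }


# Hypothetical user: 24 months tenure, 4 active days, 70k income, no flags set.
print(tau_link(24.0, 4.0, 70.0, 0, 0, 0, 0.60))
```

The min and max guards only bind for extreme covariate combinations; for typical users the linear expressions are already well inside the harmful and beneficial regions.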

4. Outcome Model (Binary Logistic)

Baseline logit:

$$\eta_0(X)=\alpha_y + X^\top\beta_y, \qquad \alpha_y=-1.1,$$

where $\beta_y=[0.003, 0.11, 0.004, 0.40, -0.25, -0.12, 0.20, 0.90]$.

Observed logit under the assigned treatment:

$$\eta(X,D)=\eta_0(X)+\sum_{k=0}^{2}D_k\,\tau^{link}_k(X).$$

Then

$$P(Y=1\mid X,D)=\sigma(\eta)=\frac{1}{1+e^{-\eta}}, \qquad Y\sim\text{Bernoulli}(\sigma(\eta)).$$
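
A short sketch of the outcome step, assuming $\beta_y$ follows the confounder order $X_1,\dots,X_8$ and that the link-scale shifts have already been computed for the unit. The function names and the example values are hypothetical.

```python
import math
import random

ALPHA_Y = -1.1
# Order assumed to match X_1..X_8 (tenure_months ... engagement_score).
BETA_Y = [0.003, 0.11, 0.004, 0.40, -0.25, -0.12, 0.20, 0.90]


def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def baseline_logit(x):
    """eta_0(X) = alpha_y + X . beta_y."""
    return ALPHA_Y + sum(b * v for b, v in zip(BETA_Y, x))


def outcome_probability(x, d, tau_link):
    """P(Y=1 | X, D) for a one-hot assignment d and per-arm link shifts."""
    eta = baseline_logit(x) + sum(d_k * tau_k for d_k, tau_k in zip(d, tau_link))
    return sigmoid(eta)


def draw_outcome(p, rng):
    """Bernoulli(p) draw."""
    return 1 if rng.random() < p else 0


rng = random.Random(0)
x = [24.0, 4.0, 70.0, 0, 0, 0, 0, 0.60]  # hypothetical user
shifts = [0.0, -0.40, 0.60]              # illustrative tau_link values
for arm, d in enumerate([[1, 0, 0], [0, 1, 0], [0, 0, 1]]):
    p = outcome_probability(x, d, shifts)
    print(f"d_{arm}: p={p:.4f}, y={draw_outcome(p, rng)}")
```

Because the shifts are added on the logit scale, the same link-scale effect produces different probability-scale effects for users with different baseline logits.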

5. Oracle Outputs

With include_oracle=True, the generated frame includes:

  • m_d_k: calibrated propensities
  • tau_link_d_k: link-scale treatment shifts
  • g_d_k: potential outcome probabilities under each arm
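  • cate_d_1, cate_d_2: contrasts vs control on probability scale

Result

|   | y | d_0 | d_1 | d_2 | tenure_months | weekly_active_days | annual_income_k | premium_user | family_plan | recent_complaints | ... | m_obs_d_1 | tau_link_d_1 | m_d_2 | m_obs_d_2 | tau_link_d_2 | g_d_0 | g_d_1 | g_d_2 | cate_d_1 | cate_d_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 0.0 | 0.0 | 27.656605 | 2.649000 | 82.557046 | 0.0 | 1.0 | 0.0 | ... | 0.222533 | -0.410269 | 0.221226 | 0.221226 | 0.621091 | 0.462833 | 0.363730 | 0.615892 | -0.099103 | 0.153059 |
| 1 | 1.0 | 1.0 | 0.0 | 0.0 | 23.798386 | 2.771811 | 88.551369 | 0.0 | 0.0 | 2.0 | ... | 0.206692 | -0.467589 | 0.243357 | 0.243357 | 0.553028 | 0.467179 | 0.354558 | 0.603855 | -0.112621 | 0.136676 |
| 2 | 1.0 | 0.0 | 0.0 | 1.0 | 28.425009 | 2.793864 | 88.697176 | 0.0 | 0.0 | 0.0 | ... | 0.201852 | -0.420776 | 0.212748 | 0.212748 | 0.584368 | 0.551521 | 0.446713 | 0.688086 | -0.104807 | 0.136565 |
| 3 | 1.0 | 1.0 | 0.0 | 0.0 | 18.860066 | 3.303381 | 79.529890 | 0.0 | 0.0 | 1.0 | ... | 0.197248 | -0.454976 | 0.241111 | 0.241111 | 0.583534 | 0.523098 | 0.410350 | 0.662844 | -0.112748 | 0.139746 |
| 4 | 0.0 | 0.0 | 1.0 | 0.0 | 17.853087 | 2.605056 | 77.234837 | 0.0 | 0.0 | 0.0 | ... | 0.219363 | -0.410719 | 0.240004 | 0.240004 | 0.579375 | 0.581290 | 0.479350 | 0.712477 | -0.101940 | 0.131187 |

5 rows × 26 columns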
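
The oracle columns can be cross-checked against each other. The sketch below takes g_d_0 and the two tau_link values from the first preview row (index 0), recovers the baseline logit by inverting the sigmoid, and recomputes g_d_1, g_d_2, and the CATEs. It assumes the control arm's link-scale shift is zero.

```python
import math


def sigmoid(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def logit(p):
    return math.log(p / (1.0 - p))


# Values copied from the first preview row (index 0).
g_d_0 = 0.462833
tau_link = {"d_1": -0.410269, "d_2": 0.621091}

# With a zero control shift (assumed), g_d_0 = sigmoid(eta_0).
eta_0 = logit(g_d_0)

g = {"d_0": g_d_0}
for arm, shift in tau_link.items():
    g[arm] = sigmoid(eta_0 + shift)

cate = {arm: g[arm] - g["d_0"] for arm in tau_link}

print({arm: round(p, 4) for arm, p in g.items()})
print({arm: round(c, 4) for arm, c in cate.items()})
```

The recomputed values agree with the g_d_1, g_d_2, cate_d_1, and cate_d_2 entries of that row up to the rounding of the displayed inputs.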

Result

Ground truth ATE for d_1 vs d_0 is -0.11659127048512116
Ground truth ATE for d_2 vs d_0 is 0.1380607452020492

Result

MultiCausalData(df=(100000, 12), treatment_names=['d_0', 'd_1', 'd_2'], control_treatment='d_0')outcome='y', confounders=['tenure_months', 'weekly_active_days', 'annual_income_k', 'premium_user', 'family_plan', 'recent_complaints', 'discount_eligible', 'engagement_score'], user_id=None,

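Result

|   | treatment | count | mean | std | min | p10 | p25 | median | p75 | p90 | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | d_0 | 50162 | 0.542881 | 0.498163 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | d_2 | 25054 | 0.690389 | 0.462343 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | d_1 | 24784 | 0.439517 | 0.496338 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |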

Result

|   | confounders | mean_d_0 | mean_d_1 | abs_diff | smd | ks_pvalue |
|---|---|---|---|---|---|---|
| 0 | premium_user | 0.189945 | 0.240880 | 0.050935 | 0.124134 | 0.00000 |
| 1 | weekly_active_days | 3.903170 | 4.013747 | 0.110577 | 0.075343 | 0.00000 |
| 2 | recent_complaints | 0.779395 | 0.845135 | 0.065740 | 0.072784 | 0.00000 |
| 3 | discount_eligible | 0.279116 | 0.305580 | 0.026464 | 0.058207 | 0.00000 |
| 4 | family_plan | 0.362924 | 0.390197 | 0.027273 | 0.056310 | 0.00000 |
| 5 | annual_income_k | 70.990667 | 72.151221 | 1.160554 | 0.032499 | 0.00069 |
| 6 | tenure_months | 23.747821 | 23.448586 | 0.299235 | -0.025648 | 0.00111 |
| 7 | engagement_score | 0.599728 | 0.602376 | 0.002649 | 0.022270 | 0.01979 |

Result

|   | confounders | mean_d_0 | mean_d_1 | abs_diff | smd | ks_pvalue |
|---|---|---|---|---|---|---|
| 0 | premium_user | 0.189945 | 0.263557 | 0.073613 | 0.176480 | 0.00000 |
| 1 | weekly_active_days | 3.903170 | 4.157551 | 0.254381 | 0.174018 | 0.00000 |
| 2 | tenure_months | 23.747821 | 25.549990 | 1.802169 | 0.153120 | 0.00000 |
| 3 | discount_eligible | 0.279116 | 0.335378 | 0.056262 | 0.122176 | 0.00000 |
| 4 | family_plan | 0.362924 | 0.412565 | 0.049640 | 0.102013 | 0.00000 |
| 5 | annual_income_k | 70.990667 | 73.516883 | 2.526216 | 0.070122 | 0.00000 |
| 6 | recent_complaints | 0.779395 | 0.805237 | 0.025842 | 0.028893 | 0.01295 |
| 7 | engagement_score | 0.599728 | 0.598192 | 0.001535 | -0.012894 | 0.29411 |
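
The smd column is not defined in the output. For a binary confounder it can be approximately reproduced from the group means alone, assuming the common pooled-standard-deviation definition and the Bernoulli variance $p(1-p)$. The library's exact formula is not shown here, so treat this as a plausibility check rather than its implementation.

```python
import math


def smd_binary(p_control, p_treated):
    """Standardized mean difference for a 0/1 variable with pooled SD."""
    var_control = p_control * (1.0 - p_control)
    var_treated = p_treated * (1.0 - p_treated)
    pooled_sd = math.sqrt((var_control + var_treated) / 2.0)
    return (p_treated - p_control) / pooled_sd


# premium_user means copied from the two balance tables above.
first_table = smd_binary(0.189945, 0.240880)
second_table = smd_binary(0.189945, 0.263557)

print(round(first_table, 4))   # approximately 0.1241
print(round(second_table, 4))  # approximately 0.1765
```

Both results are close to the reported smd values of 0.124134 and 0.176480, which suggests the column uses this definition or something very similar.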