generate_obs_hte_26()
Mathematical Specification of generate_obs_hte_26()
The generate_obs_hte_26() function generates an observational dataset with nonlinear outcome and treatment-assignment mechanisms and heterogeneous treatment effects. The data-generating process (DGP) is defined as follows:
1. Confounders
The dataset contains five confounders:
- tenure_months
- avg_sessions_week
- spend_last_month
- premium_user
- urban_resident
Base Feature Sampling: The base features are sampled with a Gaussian copula, which introduces correlations between them while preserving the specified marginal distributions. The correlation matrix for the underlying Gaussian variables is:
The marginal distributions are:
- , clipped at .
- , clipped at .
- , clipped at .
- .
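The Gaussian copula step described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the implementation inside generate_obs_hte_26(): the correlation matrix `CORR`, the marginal families, their parameters, and the clipping bounds are all hypothetical placeholders, and only three features are shown. The helper names `cholesky` and `sample_copula_rows` are likewise invented for this example.

```python
import math
import random
from statistics import NormalDist

# Hypothetical correlation matrix for three latent Gaussian variables
# (illustrative values only, not the matrix used by generate_obs_hte_26()).
CORR = [
    [1.0, 0.4, 0.3],
    [0.4, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]


def cholesky(matrix):
    """Return the lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_copula_rows(n_rows, seed=0):
    """Draw rows whose features are correlated through a Gaussian copula."""
    rng = random.Random(seed)
    lower = cholesky(CORR)
    std_normal = NormalDist()
    dim = len(CORR)
    rows = []
    for _ in range(n_rows):
        # Step 1: independent standard normals -> correlated normals.
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        z = [sum(lower[i][k] * eps[k] for k in range(i + 1)) for i in range(dim)]
        # Step 2: correlated normals -> correlated uniforms on (0, 1).
        u = [min(max(std_normal.cdf(v), 1e-12), 1.0 - 1e-12) for v in z]
        # Step 3: uniforms -> target marginals via inverse CDFs.
        # All marginals below are assumptions made for this sketch.
        tenure = min(-20.0 * math.log(1.0 - u[0]), 120.0)  # Exponential, mean 20, clipped at 120
        sessions = min(max(NormalDist(5.0, 2.0).inv_cdf(u[1]), 0.0), 20.0)  # Normal, clipped to [0, 20]
        urban = 1 if u[2] > 1.0 - 0.6 else 0  # Bernoulli with success probability 0.6
        rows.append(
            {"tenure_months": tenure, "avg_sessions_week": sessions, "urban_resident": urban}
        )
    return rows


rows = sample_copula_rows(2000, seed=1)
```

The correlations live entirely in the latent Gaussian draws, so each marginal can be swapped out independently by changing only its inverse-CDF line.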
Derived Feature:
The premium_user feature is generated from a logistic model of the other features:
2. Treatment Assignment
The treatment is assigned using a propensity score : where:
- is the sigmoid function.
- limits the propensity score range to ensure positivity (here ).
- is a calibration constant chosen such that the overall treatment rate is approximately 35%.
- The score is defined as:
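The calibration of the constant described above (choosing it so that the overall treatment rate comes out near 35%) could be done by one-dimensional root finding. The sketch below uses bisection on an intercept and relies on several assumptions that are not taken from the source: the propensity is assumed to be a sigmoid of intercept plus score, clipped to `[eps, 1 - eps]`; `eps = 0.05` is a hypothetical value; and the scores are random placeholders rather than the score function of generate_obs_hte_26(). The function names are invented for this example.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def clipped_propensity(score, intercept, eps):
    """Assumed form: sigmoid(intercept + score), clipped to [eps, 1 - eps]."""
    return min(max(sigmoid(intercept + score), eps), 1.0 - eps)


def calibrate_intercept(scores, target_rate, eps, lo=-20.0, hi=20.0, iters=60):
    """Bisect on the intercept until the mean clipped propensity hits target_rate."""

    def mean_rate(c):
        return sum(clipped_propensity(s, c, eps) for s in scores) / len(scores)

    # mean_rate is non-decreasing in c, so bisection is valid as long as
    # eps < target_rate < 1 - eps.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_rate(mid) < target_rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)
scores = [rng.gauss(0.0, 1.5) for _ in range(5000)]  # placeholder scores (assumption)
eps = 0.05  # hypothetical clipping bound
target_rate = 0.35  # overall treatment rate stated in the text

intercept = calibrate_intercept(scores, target_rate, eps)
propensities = [clipped_propensity(s, intercept, eps) for s in scores]
# Draw the binary treatment from the calibrated propensities.
treatment = [1 if rng.random() < p else 0 for p in propensities]
```

Calibrating on the clipped propensity, rather than on the raw sigmoid, keeps the realized treatment rate at the target even when many units hit the clipping bounds.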
3. Heterogeneous Treatment Effect
The treatment effect (CATE) is nonlinear and depends on the confounders:
4. Outcome Model
The outcome is a continuous variable generated as:
The baseline outcome function includes linear and nonlinear terms:
where: