State-of-the-art, robust causal inference for experiments and observational data in Python

Estimate heterogeneous treatment effects with confidence intervals. Make million-dollar decisions.

USE CASES

Built for every modern causal challenge

AB tests

Design, monitor, and analyze randomized controlled trials

Quasi-Experiments and Observational Studies

Estimate effects when there is no random assignment, using causal inference techniques

Robustness and Accuracy

Check robustness and accuracy so your decisions are trustworthy

You don’t choose methods — you choose a scenario. Causalis selects a best-practice default.

Each scenario maps to a modern estimator with defensible inference, implemented through a consistent fit() → estimate() API.
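To illustrate the shape of that workflow, here is a minimal, self-contained sketch, not Causalis itself: class and return-value names are hypothetical, and the estimator is a plain difference in means with a Welch-style normal-approximation CI behind the same fit() → estimate() pattern.

```python
import math
import random
from statistics import NormalDist

class DiffInMeansATE:
    """Minimal scenario-style estimator: fit() stores groups, estimate() returns ATE + CI."""

    def fit(self, y, d):
        self.y1 = [yi for yi, di in zip(y, d) if di == 1]
        self.y0 = [yi for yi, di in zip(y, d) if di == 0]
        return self

    def estimate(self, alpha=0.05):
        n1, n0 = len(self.y1), len(self.y0)
        m1, m0 = sum(self.y1) / n1, sum(self.y0) / n0
        v1 = sum((v - m1) ** 2 for v in self.y1) / (n1 - 1)
        v0 = sum((v - m0) ** 2 for v in self.y0) / (n0 - 1)
        se = math.sqrt(v1 / n1 + v0 / n0)  # Welch-style unpooled standard error
        z = NormalDist().inv_cdf(1 - alpha / 2)
        ate = m1 - m0
        return {"ate": ate, "ci": (ate - z * se, ate + z * se), "se": se}

# Randomized treatment with a true effect of 1.0
rng = random.Random(0)
d = [int(rng.random() < 0.5) for _ in range(2000)]
y = [1.0 * di + rng.gauss(0.0, 1.0) for di in d]
result = DiffInMeansATE().fit(y, d).estimate()
```

The point of the pattern is that swapping scenarios changes the estimator behind fit()/estimate(), not the calling code.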

Backed by the literature

Causal ML
Chernozhukov et al. (2018) — DML/IRM
Bach et al. (2024) — DoubleML
Experiments
Deng et al. (2013) — CUPED
Lin (2013) — Interacted ANCOVA
Robust inference
Welch (1947) — Welch t-test
Efron (1979) — Bootstrap

Guardrails included: SRM checks (Fabijan et al., 2019) and sensitivity analysis (Chernozhukov et al., 2024).

Engineering

Production-ready by design

Causalis is built like a real analytics system: typed data contracts, test coverage, deterministic estimators, and performance defaults that scale.

Typed data contracts (Pydantic)

Validate schema, types, nullability, and column roles before any estimator runs.

CausalData enforces outcome/treatment/confounders
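A simplified sketch of what such a contract can look like with Pydantic v2 (the real CausalData will differ; field names and checks here are illustrative assumptions):

```python
from typing import List

from pydantic import BaseModel, model_validator

class CausalData(BaseModel):
    """Declares column roles up front and fails before any estimator runs."""

    columns: List[str]      # columns actually present in the dataset
    outcome: str
    treatment: str
    confounders: List[str]

    @model_validator(mode="after")
    def check_roles(self) -> "CausalData":
        roles = [self.outcome, self.treatment, *self.confounders]
        missing = [c for c in roles if c not in self.columns]
        if missing:
            raise ValueError(f"role columns missing from data: {missing}")
        overlap = {self.outcome, self.treatment} & set(self.confounders)
        if overlap:
            raise ValueError(f"column assigned to multiple roles: {overlap}")
        return self
```

Validation errors surface at construction time, so a typo in a confounder name fails loudly instead of silently dropping a covariate.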

Numpy-style docstrings

Consistent API docs with assumptions, inputs, outputs, and failure modes.

Docstring-first public API

Tested estimators

Unit + integration tests over synthetic DGPs to prevent silent regressions.

Deterministic seeds + known ground truth
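As an illustration of the approach (not the library's actual test suite), a test against a seeded DGP with known ground truth might look like:

```python
import random

def simulate(seed: int, n: int = 5000, ate: float = 0.9):
    """Seeded synthetic DGP: randomized treatment, constant effect, known ground truth."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d = int(rng.random() < 0.5)
        y = ate * d + rng.gauss(0.0, 1.0)
        rows.append((y, d))
    return rows

def test_diff_in_means_recovers_known_ate():
    rows = simulate(seed=42)
    y1 = [y for y, d in rows if d == 1]
    y0 = [y for y, d in rows if d == 0]
    estimate = sum(y1) / len(y1) - sum(y0) / len(y0)
    # Deterministic seed: the estimate must land within sampling noise of 0.9.
    assert abs(estimate - 0.9) < 0.15
```

Because the seed pins the data, a regression in the estimator shows up as a hard test failure rather than a quiet drift in results.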

Fast inference paths

Approximation options when bootstrap is too expensive at scale.

Analytic / IF-based SEs where appropriate

Battle-tested ML defaults

Strong out-of-the-box nuisance models for modern causal ML workflows.

CatBoost default for DML/IRM

Fail-fast diagnostics

Guardrails that catch common experiment and data issues early.

SRM, balance checks, sensitivity hooks
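The SRM guardrail, for instance, reduces to a standard two-proportion test of the observed traffic split against the planned one. A stdlib sketch of the idea (not the library's implementation):

```python
from statistics import NormalDist

def srm_pvalue(n_treated: int, n_control: int, expected_share: float = 0.5) -> float:
    """Two-sided z-test for sample ratio mismatch against the planned split."""
    n = n_treated + n_control
    se = (expected_share * (1 - expected_share) / n) ** 0.5
    z = (n_treated / n - expected_share) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

A tiny p-value here means the assignment mechanism is broken, so any downstream effect estimate should not be trusted.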

Typed contracts. Tested estimators. Fast uncertainty. Strong defaults.

Benchmarks

DGPs you can trust

Every scenario ships with synthetic generators that encode the causal effect—so you can validate estimators against ground truth, test uncertainty, and benchmark robustness.

Level 0
Sanity checks

Minimal DGP to verify pipelines and baseline estimators end-to-end.

No confounding • constant effect

Level 1
Experimental realism

Prognostic signal + optional y_pre to benchmark CUPED / Lin adjustment.

add_pre • pre_corr • prognostic_scale
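CUPED itself (Deng et al., 2013) is simple enough to sketch: subtract the pre-period signal scaled by theta = cov(y, y_pre) / var(y_pre), which shrinks variance without shifting the mean. A stdlib-only illustration (function names are mine, not the library's):

```python
import random

def cuped_adjust(y, y_pre):
    """CUPED: y_adj = y - theta * (y_pre - mean(y_pre)), theta = cov(y, y_pre) / var(y_pre)."""
    n = len(y)
    my, mp = sum(y) / n, sum(y_pre) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_pre)) / (n - 1)
    var_pre = sum((b - mp) ** 2 for b in y_pre) / (n - 1)
    theta = cov / var_pre
    return [a - theta * (b - mp) for a, b in zip(y, y_pre)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Outcome correlated with a pre-period metric plus fresh noise.
rng = random.Random(7)
y_pre = [rng.gauss(0.0, 1.0) for _ in range(4000)]
y = [0.8 * p + rng.gauss(0.0, 0.6) for p in y_pre]
y_adj = cuped_adjust(y, y_pre)
```

Since the adjustment term has exactly zero mean, the ATE is unchanged while the standard error shrinks in proportion to the pre-period correlation.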

Level 2
Observational

Confounding through X only: DML/IRM should recover the effect with valid inference.

beta_d ≠ 0 • beta_y ≠ 0 • u_strength_* = 0

Level 3
Modern complexity

Copula-correlated X, nonlinearities, heterogeneity, multiple outcome families.

use_copula • g_y/g_d • tau(X) • binary/poisson

Level 4
Stress & failure

Assumptions violated on purpose to show where identification breaks.

u_strength_d ≠ 0 AND u_strength_y ≠ 0 → guarded
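For intuition, the Level 2 setup can be mimicked with a hand-rolled generator. Parameter names beta_d and beta_y echo the knobs above, but the function itself is illustrative, not the shipped DGP. Because X drives both treatment and outcome, a naive difference in means is biased; an adjusting estimator such as DML/IRM is what recovers the true effect:

```python
import math
import random

def level2_dgp(seed: int, n: int, theta: float = 0.5, beta_d: float = 1.0, beta_y: float = 1.0):
    """Confounding through observed X only: X raises both treatment propensity and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-beta_d * x))  # treatment more likely when x is high
        d = int(rng.random() < p)
        y = theta * d + beta_y * x + rng.gauss(0.0, 1.0)
        rows.append((y, d, x))
    return rows

rows = level2_dgp(seed=1, n=20000)
y1 = [y for y, d, _ in rows if d == 1]
y0 = [y for y, d, _ in rows if d == 0]
naive = sum(y1) / len(y1) - sum(y0) / len(y0)  # upward-biased: ignores X entirely
```

The naive estimate lands well above the true theta = 0.5, which is exactly the gap a confounding-aware estimator has to close.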

Oracle Columns

m, g0, g1, cate

Outcome Families

continuous / binary / poisson

Positivity Control

propensity_sharpness

Run your first notebook

Result

Ground truth ATE is 0.9509353818962034

Result
           y    d  tenure_months  avg_sessions_week  spend_last_month  premium_user  urban_resident
0  -1.983895  1.0      28.814654                1.0         84.100761           1.0             0.0
1   7.527126  0.0       7.444181                0.0         30.890847           0.0             1.0
2   6.696842  1.0      23.759279                2.0         93.693180           0.0             0.0
3  10.337161  0.0      24.969929                9.0        127.974978           0.0             1.0
4   6.071955  0.0      29.943261                2.0         96.998539           0.0             1.0
Result
field            value
estimand         ATE
model            IRM
value            0.9788 (ci_abs: 0.8074, 1.1503)
value_relative   40.3375 (ci_rel: 33.2610, 47.4139)
alpha            0.0500
p_value          0.0000
is_significant   True
n_treated        3454
n_control        6546
treatment_mean   3.6747
control_mean     2.3230
time             2026-02-08

Scenarios