State-of-the-art, robust causal inference for experiments and observational data in Python

Estimate heterogeneous treatment effects with confidence intervals. Make million-dollar decisions.

USE CASES

Built for every modern causal challenge

AB tests

Design, monitor, and analyze randomized controlled trials

Quasi-Experiments and Observational Studies

Estimate effects when there is no random assignment, using causal inference techniques

Robustness and Accuracy

Check robustness and accuracy so your decisions are trustworthy

You don’t choose methods — you choose a scenario. Causalis selects a best-practice default.

Each scenario maps to a modern estimator with defensible inference, implemented through a consistent fit() → estimate() API.
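To illustrate the shape of that workflow, here is a minimal, self-contained sketch, not Causalis itself: class and return-value names are hypothetical, and the estimator is a plain difference in means with a Welch-style normal-approximation CI behind the same fit() → estimate() pattern.

```python
import math
import random
from statistics import NormalDist

class DiffInMeansATE:
    """Minimal scenario-style estimator: fit() stores groups, estimate() returns ATE + CI."""

    def fit(self, y, d):
        self.y1 = [yi for yi, di in zip(y, d) if di == 1]
        self.y0 = [yi for yi, di in zip(y, d) if di == 0]
        return self

    def estimate(self, alpha=0.05):
        n1, n0 = len(self.y1), len(self.y0)
        m1, m0 = sum(self.y1) / n1, sum(self.y0) / n0
        v1 = sum((v - m1) ** 2 for v in self.y1) / (n1 - 1)
        v0 = sum((v - m0) ** 2 for v in self.y0) / (n0 - 1)
        se = math.sqrt(v1 / n1 + v0 / n0)  # Welch-style unpooled standard error
        z = NormalDist().inv_cdf(1 - alpha / 2)
        ate = m1 - m0
        return {"ate": ate, "ci": (ate - z * se, ate + z * se), "se": se}

# Randomized treatment with a true effect of 1.0
rng = random.Random(0)
d = [int(rng.random() < 0.5) for _ in range(2000)]
y = [1.0 * di + rng.gauss(0.0, 1.0) for di in d]
result = DiffInMeansATE().fit(y, d).estimate()
```

The point of the pattern is that swapping scenarios changes the estimator behind fit()/estimate(), not the calling code.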

Backed by the literature

Causal ML
Chernozhukov et al. (2018) — DML/IRM
Bach et al. (2024) — DoubleML
Experiments
Deng et al. (2013) — CUPED
Lin (2013) — Interacted ANCOVA
Robust inference
Welch (1947) — Welch t-test
Efron (1979) — Bootstrap

Guardrails included: SRM checks (Fabijan et al., 2019) and sensitivity analysis (Chernozhukov et al., 2024).

Engineering

Production-ready by design

Causalis is built like a real analytics system: typed data contracts, test coverage, deterministic estimators, and performance defaults that scale.

Typed data contracts (Pydantic)

Validate schema, types, nullability, and column roles before any estimator runs.

CausalData enforces outcome/treatment/confounders
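A simplified sketch of what such a contract can look like with Pydantic v2 (the real CausalData will differ; field names and checks here are illustrative assumptions):

```python
from typing import List

from pydantic import BaseModel, model_validator

class CausalData(BaseModel):
    """Declares column roles up front and fails before any estimator runs."""

    columns: List[str]      # columns actually present in the dataset
    outcome: str
    treatment: str
    confounders: List[str]

    @model_validator(mode="after")
    def check_roles(self) -> "CausalData":
        roles = [self.outcome, self.treatment, *self.confounders]
        missing = [c for c in roles if c not in self.columns]
        if missing:
            raise ValueError(f"role columns missing from data: {missing}")
        overlap = {self.outcome, self.treatment} & set(self.confounders)
        if overlap:
            raise ValueError(f"column assigned to multiple roles: {overlap}")
        return self
```

Validation errors surface at construction time, so a typo in a confounder name fails loudly instead of silently dropping a covariate.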

Numpy-style docstrings

Consistent API docs with assumptions, inputs, outputs, and failure modes.

Docstring-first public API

Tested estimators

Unit + integration tests over synthetic DGPs to prevent silent regressions.

Deterministic seeds + known ground truth
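As an illustration of the approach (not the library's actual test suite), a test against a seeded DGP with known ground truth might look like:

```python
import random

def simulate(seed: int, n: int = 5000, ate: float = 0.9):
    """Seeded synthetic DGP: randomized treatment, constant effect, known ground truth."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d = int(rng.random() < 0.5)
        y = ate * d + rng.gauss(0.0, 1.0)
        rows.append((y, d))
    return rows

def test_diff_in_means_recovers_known_ate():
    rows = simulate(seed=42)
    y1 = [y for y, d in rows if d == 1]
    y0 = [y for y, d in rows if d == 0]
    estimate = sum(y1) / len(y1) - sum(y0) / len(y0)
    # Deterministic seed: the estimate must land within sampling noise of 0.9.
    assert abs(estimate - 0.9) < 0.15
```

Because the seed pins the data, a regression in the estimator shows up as a hard test failure rather than a quiet drift in results.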

Fast inference paths

Approximation options when bootstrap is too expensive at scale.

Analytic / IF-based SEs where appropriate

Battle-tested ML defaults

Strong out-of-the-box nuisance models for modern causal ML workflows.

CatBoost default for DML/IRM

Fail-fast diagnostics

Guardrails that catch common experiment and data issues early.

SRM, balance checks, sensitivity hooks
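The SRM guardrail, for instance, reduces to a standard two-proportion test of the observed traffic split against the planned one. A stdlib sketch of the idea (not the library's implementation):

```python
from statistics import NormalDist

def srm_pvalue(n_treated: int, n_control: int, expected_share: float = 0.5) -> float:
    """Two-sided z-test for sample ratio mismatch against the planned split."""
    n = n_treated + n_control
    se = (expected_share * (1 - expected_share) / n) ** 0.5
    z = (n_treated / n - expected_share) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))
```

A tiny p-value here means the assignment mechanism is broken, so any downstream effect estimate should not be trusted.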

Typed contracts. Tested estimators. Fast uncertainty. Strong defaults.

Benchmarks

DGPs you can trust

Every scenario ships with synthetic generators that encode the causal effect—so you can validate estimators against ground truth, test uncertainty, and benchmark robustness.

Level 0
Sanity checks

Minimal DGP to verify pipelines and baseline estimators end-to-end.

No confounding • constant effect

Level 1
Experimental realism

Prognostic signal + optional y_pre to benchmark CUPED / Lin adjustment.

add_pre • pre_corr • prognostic_scale
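CUPED itself (Deng et al., 2013) is simple enough to sketch: subtract the pre-period signal scaled by theta = cov(y, y_pre) / var(y_pre), which shrinks variance without shifting the mean. A stdlib-only illustration (function names are mine, not the library's):

```python
import random

def cuped_adjust(y, y_pre):
    """CUPED: y_adj = y - theta * (y_pre - mean(y_pre)), theta = cov(y, y_pre) / var(y_pre)."""
    n = len(y)
    my, mp = sum(y) / n, sum(y_pre) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, y_pre)) / (n - 1)
    var_pre = sum((b - mp) ** 2 for b in y_pre) / (n - 1)
    theta = cov / var_pre
    return [a - theta * (b - mp) for a, b in zip(y, y_pre)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Outcome correlated with a pre-period metric plus fresh noise.
rng = random.Random(7)
y_pre = [rng.gauss(0.0, 1.0) for _ in range(4000)]
y = [0.8 * p + rng.gauss(0.0, 0.6) for p in y_pre]
y_adj = cuped_adjust(y, y_pre)
```

Since the adjustment term has exactly zero mean, the ATE is unchanged while the standard error shrinks in proportion to the pre-period correlation.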

Level 2
Observational

Confounding through X only: DML/IRM should recover the effect with valid inference.

beta_d ≠ 0 • beta_y ≠ 0 • u_strength_* = 0

Level 3
Modern complexity

Copula-correlated X, nonlinearities, heterogeneity, multiple outcome families.

use_copula • g_y/g_d • tau(X) • binary/poisson

Level 4
Stress & failure

Assumptions violated on purpose to show where identification breaks.

u_strength_d ≠ 0 AND u_strength_y ≠ 0 → guarded
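For intuition, the Level 2 setup can be mimicked with a hand-rolled generator. Parameter names beta_d and beta_y echo the knobs above, but the function itself is illustrative, not the shipped DGP. Because X drives both treatment and outcome, a naive difference in means is biased; an adjusting estimator such as DML/IRM is what recovers the true effect:

```python
import math
import random

def level2_dgp(seed: int, n: int, theta: float = 0.5, beta_d: float = 1.0, beta_y: float = 1.0):
    """Confounding through observed X only: X raises both treatment propensity and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-beta_d * x))  # treatment more likely when x is high
        d = int(rng.random() < p)
        y = theta * d + beta_y * x + rng.gauss(0.0, 1.0)
        rows.append((y, d, x))
    return rows

rows = level2_dgp(seed=1, n=20000)
y1 = [y for y, d, _ in rows if d == 1]
y0 = [y for y, d, _ in rows if d == 0]
naive = sum(y1) / len(y1) - sum(y0) / len(y0)  # upward-biased: ignores X entirely
```

The naive estimate lands well above the true theta = 0.5, which is exactly the gap a confounding-aware estimator has to close.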

Oracle Columns

m, g0, g1, cate

Outcome Families

continuous / binary / poisson

Positivity Control

propensity_sharpness

Run your first notebook

Result

Ground truth ATE is 0.9509353818962034

Result
           y    d  tenure_months  avg_sessions_week  spend_last_month  premium_user  urban_resident
0  -1.983895  1.0      28.814654                1.0         84.100761           1.0             0.0
1   7.527126  0.0       7.444181                0.0         30.890847           0.0             1.0
2   6.696842  1.0      23.759279                2.0         93.693180           0.0             0.0
3  10.337161  0.0      24.969929                9.0        127.974978           0.0             1.0
4   6.071955  0.0      29.943261                2.0         96.998539           0.0             1.0
Result
field            value
estimand         ATE
model            IRM
value            0.9788 (ci_abs: 0.8074, 1.1503)
value_relative   40.3375 (ci_rel: 33.2610, 47.4139)
alpha            0.0500
p_value          0.0000
is_significant   True
n_treated        3454
n_control        6546
treatment_mean   3.6747
control_mean     2.3230
time             2026-02-08

Scenarios