CausalData at a glance

CausalData is the lightweight input container used across Causalis. It wraps a pandas DataFrame and records which columns hold the outcome, the treatment, and the confounders.

Quick start

Note: Internally, the stored DataFrame keeps only the columns that were assigned a role, [outcome, treatment, confounders]; every other column is dropped.
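
The trimming described in the note can be illustrated with plain pandas. This is a minimal sketch, not the Causalis implementation: the toy data and the column names (`user_id`, `age`, `tenure`, `treated`, `spend`) are made up for illustration, and the column order is assumed to follow the order given in the note.

```python
import pandas as pd

# Toy data; all column names here are illustrative assumptions.
df = pd.DataFrame({
    "user_id": [101, 102, 103, 104],   # has no role, so it would be dropped
    "age": [23, 35, 41, 29],
    "tenure": [1.0, 3.5, 2.0, 0.5],
    "treated": [0, 1, 1, 0],
    "spend": [10.0, 14.5, 13.0, 9.5],
})

outcome, treatment, confounders = "spend", "treated", ["age", "tenure"]

# Keep only the role columns, ordered as [outcome, treatment, confounders].
trimmed = df[[outcome, treatment, *confounders]].copy()

print(list(trimmed.columns))
```

Columns that matter downstream (an ID, a timestamp) should therefore be kept in the original DataFrame, since they would not survive inside the container.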

API essentials

  • Init parameters

    • df: pandas DataFrame (no NaNs)
    • treatment: name of the treatment column (numeric)
    • outcome: name of the outcome column (numeric)
    • confounders: one or more confounder column names (numeric)
  • Properties

    • outcome: pandas Series
    • treatment: pandas Series
    • confounders: list[str] of confounder column names
  • Method

    • get_df(columns=None, include_treatment=True, include_outcome=True, include_confounders=True) -> DataFrame: selects columns by name and/or by role and returns a copy.
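
To make the interface above concrete, here is a small stand-in class that mimics the documented parameters, properties, and `get_df` signature using only pandas. `CausalDataSketch` is a hypothetical name and this is not the Causalis source. In particular, it assumes that explicit `columns` and the role flags are combined as a union, which the text above does not spell out, and it skips validation entirely.

```python
import pandas as pd


class CausalDataSketch:
    """Hypothetical stand-in mirroring the documented interface; not the Causalis source."""

    def __init__(self, df, treatment, outcome, confounders):
        # Accept a single confounder name or a list of names.
        names = [confounders] if isinstance(confounders, str) else list(confounders)
        self._treatment, self._outcome, self._confounders = treatment, outcome, names
        # Store only the role columns (validation is omitted in this sketch).
        self.df = df[[outcome, treatment, *names]].copy()

    @property
    def outcome(self):
        return self.df[self._outcome]

    @property
    def treatment(self):
        return self.df[self._treatment]

    @property
    def confounders(self):
        return list(self._confounders)

    def get_df(self, columns=None, include_treatment=True,
               include_outcome=True, include_confounders=True):
        # Assumption: explicit names and role flags are combined as a union.
        selected = list(columns) if columns is not None else []
        if include_outcome:
            selected.append(self._outcome)
        if include_treatment:
            selected.append(self._treatment)
        if include_confounders:
            selected.extend(self._confounders)
        selected = list(dict.fromkeys(selected))  # drop repeats, keep order
        return self.df[selected].copy()


frame = pd.DataFrame({
    "spend": [10.0, 14.5, 13.0, 9.5],
    "treated": [0, 1, 1, 0],
    "age": [23, 35, 41, 29],
    "tenure": [1.0, 3.5, 2.0, 0.5],
})
data = CausalDataSketch(frame, treatment="treated", outcome="spend",
                        confounders=["age", "tenure"])

X = data.get_df(include_treatment=False, include_outcome=False)  # confounders only
y = data.outcome
```

Because `get_df` returns a copy, changes made to `X` do not leak back into the stored frame.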

Validation (on construction)

  1. No missing values anywhere in df.
  2. All referenced columns must exist.
  3. Outcome, treatment, and confounders must be numeric (int/float).
  4. None of these columns can be constant (zero variance).
  5. No two used columns may hold identical values (raises ValueError).
  6. Duplicate rows across the used columns trigger a warning (not an error).
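
The six checks above can be approximated with plain pandas, which is handy for pre-screening a DataFrame before constructing CausalData. The function `validate_roles` below is a hypothetical helper, not part of Causalis. The error messages are invented, and raising ValueError for checks 1 to 4 is an assumption (the list above only names the exception for check 5).

```python
import warnings

import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def validate_roles(df, treatment, outcome, confounders):
    """Hypothetical pre-check that mirrors the documented validation rules."""
    used = [outcome, treatment, *confounders]

    # 1. No missing values anywhere in df.
    if df.isna().any().any():
        raise ValueError("DataFrame contains missing values")

    # 2. All referenced columns must exist.
    missing = [c for c in used if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found: {missing}")

    for col in used:
        # 3. Must be numeric. Note: pandas also treats bool as numeric here;
        #    the library may be stricter.
        if not is_numeric_dtype(df[col]):
            raise ValueError(f"Column {col!r} is not numeric")
        # 4. Must not be constant.
        if df[col].nunique() <= 1:
            raise ValueError(f"Column {col!r} is constant")

    # 5. No two used columns may hold identical values.
    for i, a in enumerate(used):
        for b in used[i + 1:]:
            if np.array_equal(df[a].to_numpy(), df[b].to_numpy()):
                raise ValueError(f"Columns {a!r} and {b!r} are identical")

    # 6. Duplicate rows across the used columns only warn.
    if df[used].duplicated().any():
        warnings.warn("Duplicate rows found across the used columns", UserWarning)


good = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "t": [0, 1, 0, 1],
    "x": [5, 3, 8, 1],
})
validate_roles(good, treatment="t", outcome="y", confounders=["x"])  # passes silently
```

Running such a pre-check first gives a chance to report every problem in your own terms before handing the data to the container.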

Common snippets

Tips

  • For categorical confounders, encode them numerically (e.g., one-hot) before creating CausalData.
  • If you see the duplicate-rows warning, consider deduplicating if duplicates are unintended.
  • repr shows the stored shape and declared roles for quick inspection.
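
The first two tips can be sketched as a short pandas preparation step. This is one possible approach under stated assumptions rather than a recommended recipe: the data and the column names (`spend`, `treated`, `age`, `plan`) are invented, and using `drop_first=True` is a modeling choice, not something CausalData requires.

```python
import pandas as pd

# Toy data; the column names are illustrative assumptions.
raw = pd.DataFrame({
    "spend": [10.0, 14.5, 13.0, 9.5, 10.0],
    "treated": [0, 1, 1, 0, 0],
    "age": [23, 35, 41, 29, 23],
    "plan": ["basic", "pro", "team", "basic", "basic"],  # categorical confounder
})

# One-hot encode the categorical column; dtype=int keeps the dummies numeric.
encoded = pd.get_dummies(raw, columns=["plan"], drop_first=True, dtype=int)

# Count duplicate rows before deciding whether to remove them.
n_dupes = int(encoded.duplicated().sum())
prepared = encoded.drop_duplicates().reset_index(drop=True)

# Everything except the outcome and treatment serves as a confounder here.
confounders = [c for c in prepared.columns if c not in ("spend", "treated")]
print(n_dupes, confounders)
```

Deduplicate only when the repeats are unintended; identical rows can be legitimate observations, in which case the warning can be left as is.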