causaldata
Causalis data class for storing a cross-sectional DataFrame and its column metadata for causal inference.
Classes
- CausalData – Container for causal inference datasets.
CausalData
Bases: BaseModel
Container for causal inference datasets.
Wraps a pandas DataFrame and stores the names of the treatment, outcome, and optional confounder columns. The stored DataFrame is restricted to only those columns. Uses Pydantic for validation and as a data contract.
Attributes
- df (DataFrame) – The DataFrame containing the data, restricted to the outcome, treatment, and confounder columns. NaN values are not allowed in the used columns.
- treatment_name (str) – Column name representing the treatment variable.
- outcome_name (str) – Column name representing the outcome variable.
- confounders_names (List[str]) – Names of the confounder columns (may be empty).
- user_id_name (str, optional) – Column name representing the unique identifier for each observation/user.
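The contract implied by these attributes (keep only the used columns, reject NaN values in them) can be sketched with plain pandas. This illustrates the documented behaviour and is not the library's implementation: `restrict_and_validate`, the example column names, the resulting column order, and the choice to keep the user ID column are all assumptions.

```python
from typing import List, Optional

import pandas as pd


def restrict_and_validate(
    df: pd.DataFrame,
    treatment_name: str,
    outcome_name: str,
    confounders_names: List[str],
    user_id_name: Optional[str] = None,
) -> pd.DataFrame:
    """Hypothetical helper mirroring the documented contract."""
    # Collect the columns the container is documented to keep.
    # Keeping the user ID column is an assumption of this sketch.
    used = [outcome_name, treatment_name, *confounders_names]
    if user_id_name is not None:
        used.append(user_id_name)

    missing = [c for c in used if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found in DataFrame: {missing}")

    # Restrict to the used columns, then reject any NaN among them.
    restricted = df[used].copy()
    if restricted.isna().any().any():
        raise ValueError("NaN values are not allowed in the used columns.")
    return restricted


raw = pd.DataFrame(
    {
        "user": ["a", "b", "c"],
        "treated": [0, 1, 1],
        "revenue": [10.0, 12.5, 11.0],
        "age": [31, 45, 27],
        "unused": [None, None, None],  # all-missing, but never selected
    }
)
clean = restrict_and_validate(raw, "treated", "revenue", ["age"], user_id_name="user")
print(list(clean.columns))
```

Because the NaN check runs after the restriction, missing values in columns that are not part of the model (here `unused`) do not cause a failure in this sketch.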
Functions
X
Design matrix of confounders.
Returns
DataFrame – The DataFrame containing only confounder columns.
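In pandas terms, a design matrix of this kind corresponds to a list-based column selection, which stays a DataFrame even when there is only one confounder. A minimal sketch with hypothetical column names (it does not use the library itself):

```python
import pandas as pd

# Hypothetical data and column names, for illustration only.
df = pd.DataFrame({"treated": [0, 1], "revenue": [10.0, 12.5], "age": [31, 45]})
confounders_names = ["age"]

X = df[confounders_names]  # selecting with a list of names yields a DataFrame
y = df["revenue"]          # selecting with a single name yields a Series

print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)
```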
confounders
List of confounder column names.
Returns
confounders_names
df
from_df
Friendly constructor for CausalData.
Parameters
- df (DataFrame) – The DataFrame containing the data.
- treatment (str) – Column name representing the treatment variable.
- outcome (str) – Column name representing the outcome variable.
- confounders (Union[str, List[str]]) – Column name(s) representing the confounders/covariates.
- user_id (str) – Column name representing the unique identifier for each observation/user.
- **kwargs (Any) – Additional arguments passed to the Pydantic model constructor.
Returns
CausalData – A validated CausalData instance.
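Since `confounders` accepts either a single name or a list of names, a constructor of this kind needs to normalise the argument into a list at some point. A minimal sketch of one way to do that, assuming `None` is treated as "no confounders"; `normalize_confounders` is a hypothetical helper and not part of the documented API.

```python
from typing import List, Optional, Union


def normalize_confounders(
    confounders: Optional[Union[str, List[str]]],
) -> List[str]:
    """Hypothetical helper: coerce a name or list of names to a list."""
    if confounders is None:
        # Assumption: a missing argument means an empty confounder list.
        return []
    if isinstance(confounders, str):
        # A bare string is one column name, not an iterable of characters.
        return [confounders]
    # Copy so later changes to the caller's list do not leak in.
    return list(confounders)


print(normalize_confounders("age"))
print(normalize_confounders(["age", "tenure"]))
print(normalize_confounders(None))
```

The explicit `isinstance(confounders, str)` check matters because calling `list("age")` directly would split the name into single characters.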
get_df
Get a DataFrame with specified columns.
Parameters
- columns (List[str]) – Specific column names to include.
- include_treatment (bool) – Whether to include the treatment column.
- include_outcome (bool) – Whether to include the outcome column.
- include_confounders (bool) – Whether to include confounder columns.
- include_user_id (bool) – Whether to include the user_id column.
Returns
DataFrame – A copy of the internal DataFrame with selected columns.
Raises
ValueError – If any specified columns do not exist.
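The selection logic described above can be sketched in plain pandas. This is an illustration under stated assumptions, not the library's code: `select_columns` is a hypothetical stand-in, the default values of the `include_*` flags are guesses (the reference does not state them), and combining `columns` with the flagged columns as an ordered, de-duplicated union is also an assumption.

```python
from typing import List, Optional

import pandas as pd


def select_columns(
    df: pd.DataFrame,
    treatment_name: str,
    outcome_name: str,
    confounders_names: List[str],
    user_id_name: Optional[str] = None,
    columns: Optional[List[str]] = None,
    include_treatment: bool = True,     # assumed default
    include_outcome: bool = True,       # assumed default
    include_confounders: bool = True,   # assumed default
    include_user_id: bool = False,      # assumed default
) -> pd.DataFrame:
    """Hypothetical stand-in for the documented column selection."""
    selected: List[str] = list(columns) if columns else []
    if include_outcome:
        selected.append(outcome_name)
    if include_treatment:
        selected.append(treatment_name)
    if include_confounders:
        selected.extend(confounders_names)
    if include_user_id and user_id_name is not None:
        selected.append(user_id_name)

    # Drop duplicates while preserving the order of first appearance.
    selected = list(dict.fromkeys(selected))

    missing = [c for c in selected if c not in df.columns]
    if missing:
        raise ValueError(f"Columns not found: {missing}")

    # Return a copy so callers cannot mutate the stored data.
    return df[selected].copy()


data = pd.DataFrame({"treated": [0, 1], "revenue": [10.0, 12.5], "age": [31, 45]})
subset = select_columns(data, "treated", "revenue", ["age"], include_outcome=False)
print(list(subset.columns))
```

Returning a copy means that edits to the result leave the source DataFrame untouched, which matches the documented "copy of the internal DataFrame" return value.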
model_config
outcome
Outcome column as a Series.
Returns
Series – The outcome column.
outcome_name
treatment
Treatment column as a Series.
Returns
Series – The treatment column.
treatment_name
user_id
user_id column as a Series.
Returns
Series – The user_id column.