Existing Weights

This path skips the compute-weighting pipeline and instead attaches externally produced weights to the canonical survey tables.

Use this path when you already have weights from another system, a prior weighting run, or an external model workflow.

Core behavior:

load weight CSVs keyed to canonical IDs
join the weights to the corresponding tables
optionally derive missing downstream weights through the survey hierarchy
validate that the provided weight set is internally consistent

processing.weighting.existing_weights

Attach pre-computed weights to survey data tables.

This module provides the add_existing_weights pipeline step, which loads weight CSV files and joins them to the corresponding canonical tables. Optionally, it can derive missing weights by propagating values through the survey hierarchy.

Core algorithm

Phase 1 -- Load and Join Weights

For each provided weight config:
1. Validate the config key against allowed table types.
2. Load the weight CSV from weight_path.
3. Validate required ID and weight columns exist.
4. Handle ID column name mismatches (rename if needed).
5. Left-join weights to the table on the ID column.

Phase 2 -- Derive Missing Weights (when `derive_missing_weights=True`)

Hierarchical carry-forward for household → person → day → unlinked_trip: validate parent has weights, then join parent weight to child via FK.
Aggregated weights for linked_trip, joint_trip, tour: compute mean weight per group (excluding nulls and zeros), then left-join.

Gap detection: raises an error if a middle-tier weight is missing (e.g. household + trip weights provided but person/day weights are not).
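
One way to picture the gap check is shown below. This is a hedged sketch in plain Python: `HIERARCHY` and `find_weight_gaps` are hypothetical names, and the rule used (flag any tier without weights that sits between two tiers with weights) is an assumption based on the example above, not the module's confirmed logic.

```python
# Carry-forward tiers, ordered from parent to child.
HIERARCHY = ["household", "person", "day", "unlinked_trip"]


def find_weight_gaps(provided: set[str]) -> list[str]:
    """Return tiers lacking weights that lie between two tiers that have them."""
    have = [tier in provided for tier in HIERARCHY]
    if not any(have):
        return []
    first = have.index(True)
    last = len(have) - 1 - have[::-1].index(True)
    return [HIERARCHY[i] for i in range(first, last + 1) if not have[i]]


gaps = find_weight_gaps({"household", "unlinked_trip"})
if gaps:
    message = f"weights missing for middle tiers: {gaps}"
```

With only household weights provided there is no gap, because every lower tier can be filled by carry-forward.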

ExistingWeightConfig `pydantic-model`

Configuration for a single weight file.

Most fields have sensible defaults; typically you only need to provide weight_path. Override weight_id_col only if your weight file uses a different ID column name, and weight_col only if it uses a different weight column name.

Examples:

Basic usage with defaults:

>>> ExistingWeightConfig(config_key="household_weights", weight_path="hh_weights.csv")
# Uses: weight_id_col="hh_id", weight_col="hh_weight"

Override weight column name:

>>> ExistingWeightConfig(
...     config_key="person_weights",
...     weight_path="person_weights.csv",
...     weight_col="final_weight"  # Weight file uses "final_weight" not "person_weight"
... )

Override ID column name (external weight file with non-canonical IDs):

>>> ExistingWeightConfig(
...     config_key="unlinked_trip_weights",
...     weight_path="trip_weights.csv",
...     weight_id_col="trip_id",  # Weight file uses "trip_id" not "unlinked_trip_id"
...     weight_col="trip_weight"  # Weight file uses "trip_weight" not "unlinked_trip_weight"
... )

Attributes:

Name	Type	Description
`config_key`	`str`	Key identifying the weight type (e.g., 'household_weights')
`weight_path`	`Path`	Path to the CSV file containing weights
`weight_id_col`	`str \| None`	ID column name in the weight file. Defaults to canonical table ID.
`weight_col`	`str \| None`	Weight column name in the weight file. Defaults to canonical weight column.

Fields:

config_key (str)
weight_path (Path)
weight_id_col (str | None)
weight_col (str | None)
keep_name (bool)

Validators:

validate_config_key → config_key
validate_path_exists → weight_path
set_defaults_from_mapping

add_existing_weights

add_existing_weights(
    weights: dict[str, ExistingWeightConfig | dict],
    derive_missing_weights: bool = False,
    households: pl.DataFrame | None = None,
    persons: pl.DataFrame | None = None,
    days: pl.DataFrame | None = None,
    unlinked_trips: pl.DataFrame | None = None,
    linked_trips: pl.DataFrame | None = None,
    tours: pl.DataFrame | None = None,
    joint_trips: pl.DataFrame | None = None,
) -> dict[str, pl.DataFrame]

Attach existing weights to the data.

Loads weights from provided files and attaches them to the data.

For any table that exists but has no weights provided, missing weights can optionally be derived by carrying forward weights from the next logical upstream table.

For example, if only household weights exist, all subsequent tables (persons, days, trips) receive the household weight for each member record. If unlinked trip weights exist but linked trip or tour weights do not, the missing weights are derived from the unlinked trip weights as a mean per group.

If derive_missing_weights is True and a "middle" weight is missing, an error is raised. For example, providing household and trip weights without person or day weights raises an error, as this likely indicates a misconfiguration.

Weight hierarchy logic

hh_weight
  └─ person_weight        (carry forward via hh_id)
      └─ day_weight        (carry forward via person_id)
          └─ unlinked_trip_weight  (carry forward via day_id)
              ├─ linked_trip_weight   (mean agg via linked_trip_id)
              ├─ joint_trip_weight    (mean agg via joint_trip_id)
              └─ tour_weight          (mean agg via tour_id)

Note that if no "adjustments" are made to sub-table weights (e.g., person or trip), all weights should be exactly the same from household through tour.

If sub-table weights do vary, a checksum can validate integrity:

sum(person_weight) ≈ sum(hh_weight × num_persons)
sum(day_weight) ≈ sum(person_weight × num_complete_days)
sum(unlinked_trip_weight) ≈ sum(day_weight × num_trips)
sum(linked_trip_weight) ≈ sum(unlinked_trip_weight)
sum(tour_weight) ≈ sum(linked_trip_weight)

Parameters:

Name	Type	Description	Default
`weights`	`dict[str, ExistingWeightConfig \| dict]`	A dict mapping config keys to weight configs (an `ExistingWeightConfig` or an equivalent dict). Each entry specifies a weight CSV to load. Supported config keys: `household_weights`, `person_weights`, `day_weights`, `unlinked_trip_weights`, `linked_trip_weights`, `joint_trip_weights`, `tour_weights`. Config options per entry: `weight_path`: Path to CSV file containing weights (required). `weight_id_col`: ID column name in the weight file (optional, defaults to canonical table ID). `weight_col`: Weight column name in the weight file (optional, defaults to canonical weight column).	required
`derive_missing_weights`	`bool`	Whether to derive weights for tables without provided weight files.	`False`
`households`	`pl.DataFrame \| None`	Households DataFrame.	`None`
`persons`	`pl.DataFrame \| None`	Persons DataFrame.	`None`
`days`	`pl.DataFrame \| None`	Days DataFrame.	`None`
`unlinked_trips`	`pl.DataFrame \| None`	Unlinked trips DataFrame.	`None`
`linked_trips`	`pl.DataFrame \| None`	Linked trips DataFrame.	`None`
`tours`	`pl.DataFrame \| None`	Tours DataFrame.	`None`
`joint_trips`	`pl.DataFrame \| None`	Joint trips DataFrame.	`None`

Returns:

Type	Description
`dict[str, pl.DataFrame]`	Dict of tables with attached weights.

Example config

    - name: add_existing_weights
      params:
        derive_missing_weights: true
        weights:
          household_weights:
            weight_path: "weights/hh_weights.csv"
            # defaults: weight_id_col='hh_id', weight_col='hh_weight'
          person_weights:
            weight_path: "weights/person_weights.csv"
          unlinked_trip_weights:
            weight_path: "weights/trip_weights.csv"
            weight_id_col: "trip_id"
            weight_col: "trip_weight"