Existing Weights
This path skips the compute-weighting pipeline and instead attaches externally produced weights to the canonical survey tables.
Use this path when you already have weights from another system, a prior weighting run, or an external model workflow.
Core behavior:
- load weight CSVs keyed to canonical IDs
- join the weights to the corresponding tables
- optionally derive missing downstream weights through the survey hierarchy
- validate that the provided weight set is internally consistent
processing.weighting.existing_weights
Attach pre-computed weights to survey data tables.
This module provides the add_existing_weights pipeline step, which loads
weight CSV files and joins them to the corresponding canonical tables.
Optionally, it can derive missing weights by propagating values through the
survey hierarchy.
Core algorithm
Phase 1 -- Load and Join Weights
For each provided weight config:
- Validate the config key against allowed table types.
- Load the weight CSV from weight_path.
- Validate that the required ID and weight columns exist.
- Handle ID column name mismatches (rename if needed).
- Left-join the weights to the table on the ID column.
Phase 2 -- Derive Missing Weights (when derive_missing_weights=True)
- Hierarchical carry-forward for household → person → day → unlinked_trip: validate parent has weights, then join parent weight to child via FK.
- Aggregated weights for linked_trip, joint_trip, tour: compute mean weight per group (excluding nulls and zeros), then left-join.
Gap detection: raises an error if a middle-tier weight is missing (e.g. household + trip weights provided but person/day weights are not).
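One plausible way to express gap detection is to treat a tier as a gap when it lies between two tiers whose weights were provided. This sketch assumes the check covers only the four carry-forward tiers; find_gaps, check_no_gaps, and the tier names are hypothetical.

```python
from __future__ import annotations

# Assumed carry-forward chain, top to bottom (aggregated tables are left out).
CARRY_FORWARD_CHAIN = ["household", "person", "day", "unlinked_trip"]


def find_gaps(provided: set[str]) -> list[str]:
    """Return chain tiers missing between the highest and lowest provided tiers."""
    positions = [i for i, tier in enumerate(CARRY_FORWARD_CHAIN) if tier in provided]
    if not positions:
        return []
    span = CARRY_FORWARD_CHAIN[min(positions) : max(positions) + 1]
    return [tier for tier in span if tier not in provided]


def check_no_gaps(provided: set[str]) -> None:
    """Raise if any middle-tier weight is missing."""
    gaps = find_gaps(provided)
    if gaps:
        raise ValueError(f"middle-tier weights missing for {gaps}; check the weight config")


# The example from the text: household + trip weights, but no person/day weights.
print(find_gaps({"household", "unlinked_trip"}))
```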
ExistingWeightConfig (pydantic model)
Configuration for a single weight file.
Most fields have sensible defaults, so typically you only need to provide weight_path. Override weight_id_col only if your weight file uses a different ID column name, and weight_col only if it uses a different weight column name.
Examples:
Basic usage with defaults:
```python
ExistingWeightConfig(config_key="household_weights", weight_path="hh_weights.csv")
# Uses: weight_id_col="hh_id", weight_col="hh_weight"
```
Override the weight column name:
```python
ExistingWeightConfig(
    config_key="person_weights",
    weight_path="person_weights.csv",
    weight_col="final_weight",  # weight file uses "final_weight", not "person_weight"
)
```
Override the ID column name (external weight file with non-canonical IDs):
```python
ExistingWeightConfig(
    config_key="unlinked_trip_weights",
    weight_path="trip_weights.csv",
    weight_id_col="trip_id",  # weight file uses "trip_id", not "unlinked_trip_id"
    weight_col="trip_weight",  # weight file uses "trip_weight", not "unlinked_trip_weight"
)
```
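The defaulting behavior these examples rely on (the set_defaults_from_mapping validator) might work roughly like the stdlib sketch below. It is not the pydantic model itself: WeightConfigSketch and DEFAULT_COLUMNS are hypothetical, the mapping holds only the three config keys whose defaults appear in the examples (person_id is assumed as the person ID column), and the path-existence check is omitted so the snippet runs without files.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

# Hypothetical defaults: config_key -> (canonical ID column, canonical weight column).
DEFAULT_COLUMNS = {
    "household_weights": ("hh_id", "hh_weight"),
    "person_weights": ("person_id", "person_weight"),
    "unlinked_trip_weights": ("unlinked_trip_id", "unlinked_trip_weight"),
}


@dataclass
class WeightConfigSketch:
    config_key: str
    weight_path: Path
    weight_id_col: str | None = None
    weight_col: str | None = None

    def __post_init__(self) -> None:
        # Validate the config key (stand-in for validate_config_key).
        if self.config_key not in DEFAULT_COLUMNS:
            raise ValueError(f"unsupported config_key: {self.config_key!r}")
        self.weight_path = Path(self.weight_path)
        default_id, default_weight = DEFAULT_COLUMNS[self.config_key]
        # Fill only the fields the caller left unset.
        if self.weight_id_col is None:
            self.weight_id_col = default_id
        if self.weight_col is None:
            self.weight_col = default_weight


cfg = WeightConfigSketch("person_weights", "person_weights.csv", weight_col="final_weight")
```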
Attributes:
| Name | Type | Description |
|---|---|---|
| config_key | str | Key identifying the weight type (e.g., 'household_weights'). |
| weight_path | Path | Path to the CSV file containing the weights. |
| weight_id_col | str \| None | ID column name in the weight file. Defaults to the canonical table ID. |
| weight_col | str \| None | Weight column name in the weight file. Defaults to the canonical weight column. |
Fields:
- config_key (str)
- weight_path (Path)
- weight_id_col (str | None)
- weight_col (str | None)
- keep_name (bool)
Validators:
- validate_config_key → config_key
- validate_path_exists → weight_path
- set_defaults_from_mapping
add_existing_weights
```python
add_existing_weights(
    weights: dict[str, ExistingWeightConfig | dict],
    derive_missing_weights: bool = False,
    households: pl.DataFrame | None = None,
    persons: pl.DataFrame | None = None,
    days: pl.DataFrame | None = None,
    unlinked_trips: pl.DataFrame | None = None,
    linked_trips: pl.DataFrame | None = None,
    tours: pl.DataFrame | None = None,
    joint_trips: pl.DataFrame | None = None,
) -> dict[str, pl.DataFrame]
```
Attach existing weights to the data.
Loads weights from the provided files and attaches them to the corresponding tables.
For any table that exists but has no weights provided, missing weights can optionally be derived by carrying forward weights from the next logical upstream table.
For example, if only household weights exist, every record in the downstream tables (persons, days, trips) receives the weight of its household. If unlinked trip weights exist but linked trip or tour weights do not, the unlinked trip weights are carried forward by mean aggregation.
If a "middle" weight is missing and derive_missing_weights is True, an error is raised. For example, if household and trip weights are provided but person or day weights are not, an error is raised because this likely indicates a misconfiguration.
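The carry-forward rule can be read as a planning step over the weight hierarchy: walk the tables top-down and, for each existing table without provided weights, derive from its parent once the parent has weights. The sketch below is hypothetical (derivation_plan, PARENT, and ORDER are illustrative names); it only decides what would be derived from where and leaves gap detection to a separate check.

```python
from __future__ import annotations

# Parent of each table in the weight hierarchy tree.
PARENT = {
    "person": "household",
    "day": "person",
    "unlinked_trip": "day",
    "linked_trip": "unlinked_trip",
    "joint_trip": "unlinked_trip",
    "tour": "unlinked_trip",
}
# Top-down order, so a parent is always considered before its children.
ORDER = ["household", "person", "day", "unlinked_trip", "linked_trip", "joint_trip", "tour"]


def derivation_plan(existing: set[str], provided: set[str]) -> list[tuple[str, str]]:
    """Return (table, source_table) pairs for tables whose weights would be derived."""
    weighted = set(provided)
    plan = []
    for table in ORDER:
        if table not in existing or table in weighted:
            continue  # nothing to derive: table absent or weights already provided
        parent = PARENT.get(table)
        if parent in weighted:
            plan.append((table, parent))
            weighted.add(table)  # derived weights can feed the next tier down
    return plan


tables = {"household", "person", "day", "unlinked_trip", "linked_trip", "tour"}
plan = derivation_plan(tables, provided={"household"})
```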
Weight hierarchy logic
```text
hh_weight
└─ person_weight                (carry forward via hh_id)
   └─ day_weight                (carry forward via person_id)
      └─ unlinked_trip_weight   (carry forward via day_id)
         ├─ linked_trip_weight  (mean agg via linked_trip_id)
         ├─ joint_trip_weight   (mean agg via joint_trip_id)
         └─ tour_weight         (mean agg via tour_id)
```
Note that if no "adjustments" are made to sub-table weights (e.g., person or trip weights), all weights are exactly the same from household through tour.
If sub-table weights do vary, a checksum can validate integrity:
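$$
\begin{aligned}
\sum \text{person\_weight} &\approx \sum \left(\text{hh\_weight} \times \text{num\_persons}\right) \\
\sum \text{day\_weight} &\approx \sum \left(\text{person\_weight} \times \text{num\_complete\_days}\right) \\
\sum \text{unlinked\_trip\_weight} &\approx \sum \left(\text{day\_weight} \times \text{num\_trips}\right) \\
\sum \text{linked\_trip\_weight} &\approx \sum \text{unlinked\_trip\_weight} \\
\sum \text{tour\_weight} &\approx \sum \text{linked\_trip\_weight}
\end{aligned}
$$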
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| weights | dict[str, ExistingWeightConfig \| dict] | A dict mapping config keys to weight configs. Each entry specifies a weight CSV to load, either as an ExistingWeightConfig or as a dict of the same settings (e.g., weight_path, weight_id_col, weight_col). Supported config keys are those accepted by the validate_config_key validator of ExistingWeightConfig. | required |
| derive_missing_weights | bool | Whether to derive weights for tables that have no weight file provided. | False |
| households | pl.DataFrame \| None | Households DataFrame. | None |
| persons | pl.DataFrame \| None | Persons DataFrame. | None |
| days | pl.DataFrame \| None | Days DataFrame. | None |
| unlinked_trips | pl.DataFrame \| None | Unlinked trips DataFrame. | None |
| linked_trips | pl.DataFrame \| None | Linked trips DataFrame. | None |
| tours | pl.DataFrame \| None | Tours DataFrame. | None |
| joint_trips | pl.DataFrame \| None | Joint trips DataFrame. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, pl.DataFrame] | Dict of tables with attached weights. |
Example config
```yaml
- name: add_existing_weights
  params:
    derive_missing_weights: true
    weights:
      household_weights:
        weight_path: "weights/hh_weights.csv"
        # defaults: weight_id_col='hh_id', weight_col='hh_weight'
      person_weights:
        weight_path: "weights/person_weights.csv"
      unlinked_trip_weights:
        weight_path: "weights/trip_weights.csv"
        weight_id_col: "trip_id"
        weight_col: "trip_weight"
```