Weighting
Overview
Weighting has two mutually exclusive options:
flowchart TD
A["Weighting"] --> B["Compute Weights\nDerive new weights from controls"]
A --> C["Existing Weights\nAttach pre-computed weights"]
- Compute Weights computes new weights from PUMS controls and survey seed data.
- Existing Weights attaches weights that were already computed elsewhere.
Only the Compute Weights option needs the full weighting pipeline: PUMS fetching, crosswalk construction, incidence preparation, control aggregation, balancing, diagnostics, and control validation. The Existing Weights option is much lighter. It joins external weight files onto canonical tables and can optionally derive missing downstream weights through the survey hierarchy.
Choose an Option
| Option | Use when | Key inputs | Main output |
|---|---|---|---|
| Compute Weights | You need to create expansion weights from controls | PUMS, geography, control definitions, survey tables | New household weights propagated to all tables |
| Existing Weights | You already have weight files from another system or prior run | Weight CSVs keyed to canonical IDs | Existing weights attached and optionally propagated |
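
As a rough illustration of the Existing Weights option, the sketch below joins a weight CSV onto a household table with pandas. The column names (`hh_id`, `hh_weight`, `num_people`), the inline CSV, and the helper `attach_household_weights` are assumptions made for this example; they are not the module's actual API.

```python
import io

import pandas as pd


def attach_household_weights(households, weights):
    """Left-join pre-computed weights onto the household table by hh_id.

    Hypothetical helper for illustration only.
    """
    return households.merge(
        weights[["hh_id", "hh_weight"]],
        on="hh_id",
        how="left",             # keep every household, even unweighted ones
        validate="one_to_one",  # raise if hh_id is duplicated on either side
    )


# Toy canonical table and a toy weight file (assumed layout).
households = pd.DataFrame({"hh_id": [1, 2, 3], "num_people": [2, 1, 4]})
weights_csv = io.StringIO("hh_id,hh_weight\n1,120.5\n2,80.0\n")
weights = pd.read_csv(weights_csv)

weighted_households = attach_household_weights(households, weights)

# Households without a matching row in the weight file end up with NaN.
missing = weighted_households["hh_weight"].isna().sum()
```

A left join keeps unweighted households visible, so missing weights can be counted and handled explicitly instead of being silently dropped.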
processing.weighting
Survey weighting module.
This module provides two pipeline options for attaching expansion weights to survey tables:
- add_existing_weights: load pre-computed weights from CSV files and join them to tables, optionally deriving missing weights by propagating values through the survey hierarchy.
- compute_weights: compute weights from scratch using PUMS / ACS microdata as population controls via maximum-entropy balancing.
The compute_weights step orchestrates the full pipeline. See compute_weights.py for the detailed step-by-step description.
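
To give a feel for what balancing against controls means, here is a minimal sketch that uses iterative proportional fitting (raking), a standard technique that converges to the minimum relative entropy adjustment of the seed weights when the targets are feasible. It is not the module's balancer. The function name `balance_weights`, the binary incidence matrix, and the toy targets are all assumptions for illustration.

```python
import numpy as np


def balance_weights(seed, incidence, targets, max_iter=200, tol=1e-9):
    """Rake seed weights so weighted incidence totals match the targets.

    Assumes a binary incidence matrix (rows: households, columns: controls)
    and mutually consistent targets. Hypothetical helper, not the real API.
    """
    w = np.asarray(seed, dtype=float).copy()
    A = np.asarray(incidence, dtype=float)
    t = np.asarray(targets, dtype=float)
    for _ in range(max_iter):
        for k in range(A.shape[1]):
            current = w @ A[:, k]
            if current > 0:
                # Scale only the households that contribute to control k.
                w = np.where(A[:, k] > 0, w * (t[k] / current), w)
        if np.max(np.abs(A.T @ w - t)) < tol:
            break
    return w


# Four seed households; controls: [size_1, size_2plus, income_low, income_high].
incidence = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
])
targets = np.array([40.0, 60.0, 30.0, 70.0])
seed = np.ones(4)

balanced = balance_weights(seed, incidence, targets)
fitted_totals = incidence.T @ balanced
```

With this uniform seed the fitted totals reproduce all four targets. Real controls often conflict or involve count-valued incidence, which is one reason a production balancer needs more care than this loop.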
Weight hierarchy
hh_weight
└─ person_weight (carry forward via hh_id)
    └─ day_weight (carry forward via person_id)
        └─ unlinked_trip_weight (carry forward via day_id)
            ├─ linked_trip_weight (mean agg via linked_trip_id)
            ├─ joint_trip_weight (mean agg via joint_trip_id)
            └─ tour_weight (mean agg via tour_id)
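
The hierarchy can be sketched with pandas as a chain of key lookups followed by a grouped mean. The table layouts, the toy values, and the helper `carry_forward` are assumptions for illustration only. Just the key and weight column names come from the hierarchy above.

```python
import pandas as pd

# Toy tables with assumed layouts.
households = pd.DataFrame({"hh_id": [1, 2], "hh_weight": [100.0, 50.0]})
persons = pd.DataFrame({"person_id": [10, 11, 20], "hh_id": [1, 1, 2]})
days = pd.DataFrame({"day_id": [100, 101, 200], "person_id": [10, 11, 20]})
trips = pd.DataFrame({
    "trip_id": [1, 2, 3],
    "day_id": [100, 101, 200],
    "linked_trip_id": [7, 7, 8],
})


def carry_forward(child, parent, key, parent_col, child_col):
    """Copy the parent's weight onto each child row via the shared key."""
    lookup = parent.set_index(key)[parent_col]
    return child.assign(**{child_col: child[key].map(lookup)})


# Carry weights down: household -> person -> day -> unlinked trip.
persons = carry_forward(persons, households, "hh_id", "hh_weight", "person_weight")
days = carry_forward(days, persons, "person_id", "person_weight", "day_weight")
trips = carry_forward(trips, days, "day_id", "day_weight", "unlinked_trip_weight")

# Aggregate up: linked trip weight is the mean of its unlinked trip weights.
linked_trips = (
    trips.groupby("linked_trip_id", as_index=False)["unlinked_trip_weight"]
    .mean()
    .rename(columns={"unlinked_trip_weight": "linked_trip_weight"})
)
```

The same grouped-mean pattern would apply to `joint_trip_id` and `tour_id` for the other two aggregated weights.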
Module structure
weighting/
├── existing_weights.py # attach pre-computed weights
├── compute_weights.py # single @step() entry point for full weighting pipeline
├── weighting_pipeline.py # internal class orchestrating the weighting process
├── controls/ # control variable definitions & registry
├── data_prep/ # PUMS I/O, control totals, survey seed, geography
├── balancing/ # balancer, base weights, propagation, importance
├── diagnostics/ # interactive HTML report (Plotly + Jinja2)
└── validation/ # post-balancing sanity checks