Weighting

Overview

Weighting has two mutually exclusive options:

flowchart TD
  A["Weighting"] --> B["Compute Weights\nDerive new weights from controls"]
  A --> C["Existing Weights\nAttach pre-computed weights"]

Compute Weights computes new weights from PUMS controls and survey seed data.
Existing Weights attaches weights that were already computed elsewhere.

Only Compute Weights needs the full weighting pipeline machinery, including PUMS fetching, crosswalk construction, incidence preparation, control aggregation, balancing, diagnostics, and control validation. Existing Weights is much lighter: it joins external weight files onto the canonical tables and can optionally derive missing downstream weights by propagating values through the survey hierarchy.
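To make the lighter Existing Weights path concrete, here is a minimal sketch of joining an external weight file onto a canonical table by ID. It uses only the Python standard library; the function name `attach_weights`, the inline CSV contents, and the choice to leave unmatched rows as `None` are assumptions for illustration, not the module's actual API or behavior.

```python
import csv
import io

# Hypothetical weight file keyed to the canonical household ID.
WEIGHT_CSV = """hh_id,hh_weight
1,120.5
2,80.0
"""


def attach_weights(rows, weight_csv, id_col, weight_col):
    """Left-join a weight column onto table rows by ID.

    Rows without a matching ID in the weight file get None (an assumption;
    a real pipeline might raise or fill a default instead).
    """
    reader = csv.DictReader(io.StringIO(weight_csv))
    weights = {int(rec[id_col]): float(rec[weight_col]) for rec in reader}
    return [{**row, weight_col: weights.get(row[id_col])} for row in rows]


# Canonical household table with one household missing from the weight file.
households = [{"hh_id": 1}, {"hh_id": 2}, {"hh_id": 3}]
weighted = attach_weights(households, WEIGHT_CSV, "hh_id", "hh_weight")
```

Household 3 has no entry in the weight file, so it comes back with a missing weight. Rows like this are the ones the optional hierarchy propagation would need to fill or flag.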

Choose an Option

Option	Use when	Key inputs	Main output
Compute Weights	You need to create expansion weights from controls	PUMS, geography, control definitions, survey tables	New household weights propagated to all tables
Existing Weights	You already have weight files from another system or prior run	Weight CSVs keyed to canonical IDs	Existing weights attached and optionally propagated

processing.weighting

Survey weighting module.

This module provides two pipeline options for attaching expansion weights to survey tables:

add_existing_weights -- load pre-computed weights from CSV files and join them to tables, optionally deriving missing weights by propagating values through the survey hierarchy.
compute_weights -- compute weights from scratch using PUMS / ACS microdata as population controls via maximum-entropy balancing.

The compute_weights step orchestrates the full pipeline; see compute_weights.py for the detailed step-by-step description.
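As an illustration of what maximum-entropy balancing does, the sketch below rescales base weights cyclically until each control total is met. It assumes a 0/1 incidence matrix and feasible controls, in which case this iterative proportional update converges to the weights closest to the base weights in relative entropy. The function name `balance` and the toy controls are hypothetical; the module's own balancer may use a different algorithm and handle non-binary incidence.

```python
def balance(base_weights, incidence, targets, max_iter=500, tol=1e-9):
    """Cyclically rescale weights so each control total matches its target.

    incidence[i][k] is 1 if seed record i counts toward control k, else 0.
    Assumes every control has at least one contributing record with a
    positive weight, otherwise the division below fails.
    """
    weights = list(base_weights)
    n_controls = len(targets)
    for _ in range(max_iter):
        for k in range(n_controls):
            current = sum(w * row[k] for w, row in zip(weights, incidence))
            factor = targets[k] / current
            # Only records that contribute to control k are rescaled.
            weights = [w * factor if row[k] else w
                       for w, row in zip(weights, incidence)]
        totals = [sum(w * row[k] for w, row in zip(weights, incidence))
                  for k in range(n_controls)]
        if max(abs(t - c) / c for t, c in zip(totals, targets)) < tol:
            break
    return weights


# Toy controls: total households, one-person households, zero-vehicle households.
incidence = [
    [1, 1, 1],  # one-person, zero-vehicle
    [1, 1, 0],  # one-person, has vehicle
    [1, 0, 1],  # multi-person, zero-vehicle
    [1, 0, 0],  # multi-person, has vehicle
]
targets = [100.0, 40.0, 30.0]
weights = balance([1.0, 1.0, 1.0, 1.0], incidence, targets)
```

With equal base weights, the result approaches 12, 28, 18, and 42, which is the product-form solution that treats household size and vehicle ownership as independent given only these three controls.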

Weight hierarchy

hh_weight
  └─ person_weight        (carry forward via hh_id)
      └─ day_weight        (carry forward via person_id)
          └─ unlinked_trip_weight  (carry forward via day_id)
              ├─ linked_trip_weight   (mean agg via linked_trip_id)
              ├─ joint_trip_weight    (mean agg via joint_trip_id)
              └─ tour_weight          (mean agg via tour_id)
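The two propagation rules in the hierarchy above can be sketched as follows: "carry forward" copies a parent's weight to each child through the shared key, and "mean agg" averages child weights within each group. The helper names `carry_forward` and `mean_aggregate` and the sample records are hypothetical, and the handling of missing parents (left as `None`) is an assumption.

```python
from collections import defaultdict
from statistics import mean


def carry_forward(children, parents, key, parent_col, child_col):
    """Copy each parent's weight onto its children via the shared key."""
    lookup = {p[key]: p[parent_col] for p in parents}
    return [{**c, child_col: lookup.get(c[key])} for c in children]


def mean_aggregate(children, key, child_col, parent_col):
    """Average child weights within each group identified by key."""
    groups = defaultdict(list)
    for c in children:
        groups[c[key]].append(c[child_col])
    return [{key: k, parent_col: mean(v)} for k, v in groups.items()]


households = [
    {"hh_id": 1, "hh_weight": 100.0},
    {"hh_id": 2, "hh_weight": 50.0},
]
persons = [
    {"person_id": 10, "hh_id": 1},
    {"person_id": 11, "hh_id": 1},
    {"person_id": 20, "hh_id": 2},
]
# hh_weight -> person_weight (carry forward via hh_id)
persons = carry_forward(persons, households, "hh_id", "hh_weight", "person_weight")

unlinked_trips = [
    {"trip_id": 1, "linked_trip_id": 7, "unlinked_trip_weight": 100.0},
    {"trip_id": 2, "linked_trip_id": 7, "unlinked_trip_weight": 50.0},
    {"trip_id": 3, "linked_trip_id": 8, "unlinked_trip_weight": 50.0},
]
# unlinked_trip_weight -> linked_trip_weight (mean agg via linked_trip_id)
linked_trips = mean_aggregate(
    unlinked_trips, "linked_trip_id", "unlinked_trip_weight", "linked_trip_weight"
)
```

Here both persons in household 1 inherit its weight, and linked trip 7 gets the mean of its two unlinked trip weights. The same two helpers would cover the remaining levels by swapping the key and column names.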

Module structure

weighting/
├── existing_weights.py       # attach pre-computed weights
├── compute_weights.py        # single @step() entry point for full weighting pipeline
├── weighting_pipeline.py     # internal class orchestrating the weighting process
├── controls/                 # control variable definitions & registry
├── data_prep/                # PUMS I/O, control totals, survey seed, geography
├── balancing/                # balancer, base weights, propagation, importance
├── diagnostics/              # interactive HTML report (Plotly + Jinja2)
└── validation/               # post-balancing sanity checks