Weighting

Overview

Weighting has two mutually exclusive options:

flowchart TD
  A["Weighting"] --> B["Compute Weights\nDerive new weights from controls"]
  A --> C["Existing Weights\nAttach pre-computed weights"]

Only the compute weights option needs the full weighting pipeline machinery such as PUMS fetching, crosswalk construction, incidence preparation, control aggregation, balancing, diagnostics, and control validation. The existing weights option is much lighter: it joins external weight files onto canonical tables and can optionally derive missing downstream weights through the survey hierarchy.
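As a rough illustration of what "joining external weight files onto canonical tables" can look like, the sketch below uses pandas. It is not the module's implementation: the table contents, the `home_county` column, and the in-memory CSV are hypothetical stand-ins, and only the `hh_id` and `hh_weight` names come from this page.

```python
import io

import pandas as pd

# Hypothetical canonical household table (illustrative values only).
households = pd.DataFrame({"hh_id": [1, 2, 3], "home_county": ["A", "B", "A"]})

# Stand-in for an external weight CSV keyed on the canonical household ID.
weights_csv = io.StringIO("hh_id,hh_weight\n1,120.5\n2,98.0\n")
weights = pd.read_csv(weights_csv)

# A left join keeps every household; validate= raises if either side
# contains duplicate hh_id values, which would silently duplicate rows.
weighted = households.merge(weights, on="hh_id", how="left", validate="one_to_one")

# Households with no row in the weight file end up with a missing weight.
missing = weighted.loc[weighted["hh_weight"].isna(), "hh_id"].tolist()
print(weighted)
print("households without a weight:", missing)
```

A left join is used here so that unmatched records remain visible rather than being dropped; how the real step treats unmatched records is not stated on this page.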

Choose an Option

| Option | Use when | Key inputs | Main output |
| --- | --- | --- | --- |
| Compute Weights | You need to create expansion weights from controls | PUMS, geography, control definitions, survey tables | New household weights propagated to all tables |
| Existing Weights | You already have weight files from another system or prior run | Weight CSVs keyed to canonical IDs | Existing weights attached and optionally propagated |

processing.weighting

Survey weighting module.

This module provides two pipeline options for attaching expansion weights to survey tables:

  1. add_existing_weights -- load pre-computed weights from CSV files and join them to tables, optionally deriving missing weights by propagating values through the survey hierarchy.
  2. compute_weights -- compute weights from scratch using PUMS / ACS microdata as population controls via maximum-entropy balancing.

The compute_weights step orchestrates the full pipeline; see compute_weights.py for the detailed step-by-step description.
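To give a feel for what balancing against population controls means, here is a minimal sketch using iterative proportional fitting (raking). This is a swapped-in classical technique, not the module's balancer: for categorical margins, raking converges to the weights closest to the base weights in relative-entropy terms, which is the same idea that maximum-entropy balancing builds on. All attributes, base weights, and control totals below are hypothetical.

```python
import numpy as np

# Hypothetical seed: four survey households, each with two categorical attributes.
hh_size = np.array([0, 0, 1, 1])          # 0 = one person, 1 = two or more
income = np.array([0, 1, 0, 1])           # 0 = low, 1 = high
weights = np.array([1.0, 2.0, 1.0, 1.0])  # illustrative base weights

# Hypothetical control totals per category; both controls sum to 100.
controls = [
    (hh_size, np.array([60.0, 40.0])),
    (income, np.array([50.0, 50.0])),
]


def margin(categories, w, n_categories):
    """Weighted total per category."""
    return np.bincount(categories, weights=w, minlength=n_categories)


for _ in range(200):
    # Scale weights to match each control in turn.
    # Assumes every category has at least one household (no zero margins).
    for categories, targets in controls:
        current = margin(categories, weights, len(targets))
        weights = weights * (targets / current)[categories]

    # Stop once every control is matched to a tight tolerance.
    worst_gap = max(
        np.abs(margin(c, weights, len(t)) - t).max() for c, t in controls
    )
    if worst_gap < 1e-9:
        break

print(weights)
```

Raking only handles one categorical dimension at a time and needs mutually consistent totals; a general balancer would work from an incidence matrix and can mix household-level and person-level controls.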

Weight hierarchy

hh_weight
  └─ person_weight        (carry forward via hh_id)
      └─ day_weight        (carry forward via person_id)
          └─ unlinked_trip_weight  (carry forward via day_id)
              ├─ linked_trip_weight   (mean agg via linked_trip_id)
              ├─ joint_trip_weight    (mean agg via joint_trip_id)
              └─ tour_weight          (mean agg via tour_id)
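The two propagation rules in the hierarchy above ("carry forward" and "mean agg") can be sketched with pandas as follows. This is an illustrative sketch under assumed table layouts, not the module's propagation code; the column names follow the diagram, while the rows and values are made up.

```python
import pandas as pd

# Hypothetical tables; IDs and weights are illustrative only.
households = pd.DataFrame({"hh_id": [1, 2], "hh_weight": [100.0, 50.0]})
persons = pd.DataFrame({"person_id": [10, 11, 20], "hh_id": [1, 1, 2]})
unlinked_trips = pd.DataFrame(
    {
        "unlinked_trip_id": [1, 2, 3],
        "linked_trip_id": [7, 7, 8],
        "unlinked_trip_weight": [100.0, 80.0, 50.0],
    }
)

# Carry forward: each person inherits the weight of its parent household.
persons = persons.merge(
    households[["hh_id", "hh_weight"]],
    on="hh_id",
    how="left",
    validate="many_to_one",
).rename(columns={"hh_weight": "person_weight"})

# Mean aggregation: a linked trip takes the mean weight of its unlinked trips.
linked_trips = (
    unlinked_trips.groupby("linked_trip_id", as_index=False)["unlinked_trip_weight"]
    .mean()
    .rename(columns={"unlinked_trip_weight": "linked_trip_weight"})
)

print(persons)
print(linked_trips)
```

The same two patterns would repeat down the tree: a many-to-one merge for each "carry forward" edge, and a group-by mean for each "mean agg" edge.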

Module structure

weighting/
├── existing_weights.py       # attach pre-computed weights
├── compute_weights.py        # single @step() entry point for full weighting pipeline
├── weighting_pipeline.py     # internal class orchestrating the weighting process
├── controls/                 # control variable definitions & registry
├── data_prep/                # PUMS I/O, control totals, survey seed, geography
├── balancing/                # balancer, base weights, propagation, importance
├── diagnostics/              # interactive HTML report (Plotly + Jinja2)
└── validation/               # post-balancing sanity checks