
Seed Population Creation Step

This step generates the synthetic seed population (households and persons) for the Bay Area PopulationSim model, using harmonized and filtered PUMS data. The output is used as the starting point for synthetic population generation and model calibration.

What This Step Does

Inputs

Outputs

How to Run

From the bay_area directory, run:

python create_seed_population_tm2_refactored.py

This will generate the seed population files in the configured output directory.

Group Quarters Handling

Important Policy Change: As of October 2025, this script handles group quarters at the person level. It assigns group quarters in two stages and writes a person-level gq_type field that PopulationSim person-level controls can use.

TYPEHUGQ-Based Exclusion Policy

The script uses the PUMS TYPEHUGQ variable for group quarters classification:

Two-Stage GQ Assignment Process

Because of PUMS data limitations, the script assigns group quarters in two stages:

Stage 1 - Household Level Processing:

Stage 2 - Person Level Refinement:

Person-Level gq_type Field

For PopulationSim person-level controls, a gq_type field is created on each person record:

This field lets PopulationSim use person-level control expressions such as persons.gq_type==1, which match the structure of the Census person-level GQ data directly.
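As a rough illustration of what such an expression selects, the sketch below evaluates persons.gq_type==1 against a toy pandas table. The gq_type codes and the person_id column are assumptions made for the example; they are not the codes this script defines.

```python
import pandas as pd

# Toy person records. The gq_type codes are illustrative assumptions:
# 0 = not in group quarters, 1 and 2 = two different GQ categories.
persons = pd.DataFrame(
    {
        "person_id": [1, 2, 3, 4, 5],  # hypothetical identifier column
        "gq_type": [0, 1, 1, 2, 0],
    }
)

# An expression such as persons.gq_type==1 yields a boolean mask
# with one entry per person record.
is_gq_type_1 = persons.gq_type == 1

# Summing the mask gives the unweighted count of matching seed persons.
n_gq_type_1 = int(is_gq_type_1.sum())
print(n_gq_type_1)
```

During balancing, a mask like this marks the person records that count toward the matching control total.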

Final PopulationSim Structure

For PopulationSim balancing:

Output TYPE Field Creation

For travel model compatibility, a collapsed TYPEHUGQ field is created:

This ensures the final TYPE field in travel model outputs correctly represents the original PUMS categories while allowing person-level detailed balancing during synthesis.
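One way such a collapse could look is sketched below, assuming the detailed person-level codes are mapped back to the three PUMS TYPEHUGQ categories (1 = housing unit, 2 = institutional group quarters, 3 = noninstitutional group quarters). The gq_type codes and the mapping itself are hypothetical; the script's actual mapping may differ.

```python
import pandas as pd

# PUMS TYPEHUGQ categories, as defined in the PUMS data dictionary.
HOUSING_UNIT = 1
INSTITUTIONAL_GQ = 2
NONINSTITUTIONAL_GQ = 3

# Hypothetical mapping from detailed gq_type codes to TYPEHUGQ categories.
# These gq_type codes are assumptions for illustration only.
GQ_TYPE_TO_TYPE = {
    0: HOUSING_UNIT,
    1: NONINSTITUTIONAL_GQ,
    2: NONINSTITUTIONAL_GQ,
    3: INSTITUTIONAL_GQ,
}

persons = pd.DataFrame({"gq_type": [0, 1, 2, 3, 0]})

# Collapse the detailed codes into a TYPE column.
persons["TYPE"] = persons["gq_type"].map(GQ_TYPE_TO_TYPE)

# Fail loudly if any gq_type code has no mapping, rather than writing NaN.
if persons["TYPE"].isna().any():
    raise ValueError("unmapped gq_type code in persons table")

persons["TYPE"] = persons["TYPE"].astype(int)
print(persons["TYPE"].tolist())
```

Keeping the mapping in a single dictionary makes the collapse easy to audit against the PUMS data dictionary.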

Rationale: Person-level controls align directly with the structure of the Census P5 series data. This removes the need for household-level conversion assumptions while maintaining PopulationSim convergence and travel model compatibility.

Notes

