
Control Generation Step: Creating Baseyear Control Files

This step generates the baseyear control files required for the Bay Area PopulationSim model, using 2023 ACS data and 2020 Decennial Census data. Controls are produced at the MAZ, TAZ, and county levels and guide the synthetic population generation process.

What This Step Does

Group Quarters Processing (Updated October 2025)

Important Change: Group quarters controls are now defined at the person level, aligned with the Census data structure, to ensure data consistency and improve PopulationSim convergence.

Background

Census provides group quarters data at the person level (P5 series tables), while PopulationSim can handle both household-level and person-level controls. The system now uses person-level GQ controls to directly match Census data structure, eliminating conversion assumptions and improving accuracy.

Person-Level Group Quarters Approach

Control Structure (Person Level):

Census Data Sources:

Final Group Quarters Inclusion Policy

Person-Level Control Structure

Person-level controls count individuals directly from Census data:

Household Count Integration

The numhh_gq control combines:

This approach treats each GQ person as representing potential housing demand while maintaining person-level control accuracy.
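The list of numhh_gq components is not shown here, so the following is only a minimal sketch of one reading of the sentence above: each GQ person is counted as one unit alongside regular households. The function name, the field names num_hh and gq_pop, and the sample counts are hypothetical and are not taken from the pipeline.

```python
def numhh_gq(num_hh, gq_persons):
    """Return households plus GQ persons, counting each GQ person as one unit.

    Assumption: this mirrors the "each GQ person represents potential
    housing demand" reading; the pipeline's actual formula may differ.
    """
    if num_hh < 0 or gq_persons < 0:
        raise ValueError("counts must be non-negative")
    return num_hh + gq_persons


# Made-up zone record; num_hh and gq_pop are hypothetical field names.
zone = {"num_hh": 120, "gq_pop": 15}
zone["numhh_gq"] = numhh_gq(zone["num_hh"], zone["gq_pop"])
print(zone["numhh_gq"])  # 135
```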

Column Naming Standards

Geographic Column Naming Convention

Standardized Column Names:

Legacy Column Names (Deprecated):

Control File Column Structure

MAZ Controls (maz_marginals.csv and maz_marginals_hhgq.csv):

TAZ Controls (taz_marginals.csv and taz_marginals_hhgq.csv):

County Controls (county_marginals.csv):

Geographic Crosswalk (geo_cross_walk_tm2_maz.csv):

Column Naming Migration (October 2025)

What Changed: All geographic crosswalk files now use MAZ_NODE/TAZ_NODE naming consistently. The rebuild_maz_taz_all_geog_file() function in tm2_control_utils/config_census.py was updated to enforce this convention.

Migration Impact:

Validation: The mazs_tazs_all_geog.csv crosswalk file was rebuilt with 109,228 records using the new naming convention, ensuring all geographic operations use consistent identifiers.
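For readers who still hold crosswalk files with the legacy names, the rename can be sketched as below. This is an illustration using only the standard library, not the logic of rebuild_maz_taz_all_geog_file(); the helper names and the sample rows are hypothetical. Only the MAZ to MAZ_NODE and TAZ to TAZ_NODE mapping comes from this page.

```python
import csv
import io

# Legacy -> standard crosswalk column names, per the naming convention above.
LEGACY_TO_STANDARD = {"MAZ": "MAZ_NODE", "TAZ": "TAZ_NODE"}


def standardize_header(fieldnames):
    """Return the header with legacy geographic column names replaced."""
    return [LEGACY_TO_STANDARD.get(name, name) for name in fieldnames]


def standardize_crosswalk(src, dst):
    """Copy a crosswalk CSV from src to dst, renaming legacy columns only."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(standardize_header(next(reader)))
    writer.writerows(reader)  # data rows pass through unchanged


# Tiny in-memory example with made-up rows.
legacy_csv = io.StringIO("MAZ,TAZ,COUNTY\n10001,1,1\n10002,1,1\n")
converted = io.StringIO()
standardize_crosswalk(legacy_csv, converted)
print(converted.getvalue())
```

The rename applies to crosswalk files only; the marginals files keep MAZ and TAZ as their key columns.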

Group Quarters Control Integration (October 2025)

Military GQ Combination: As of October 2025, military group quarters persons are automatically combined into the “other noninstitutional” category to match the seed population encoding structure:

Processing Steps:

  1. Generate separate military and other noninstitutional GQ controls from Census P5 data
  2. Validate each control category individually
  3. Combine military into other noninstitutional to match seed population structure
  4. Create HHGQ-integrated files for PopulationSim consumption
  5. Clean up intermediate files to maintain organized workflow

This ensures the control structure exactly matches the seed population GQ encoding while preserving the underlying Census data accuracy.
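Steps 2 and 3 above can be sketched as follows. This is a hedged illustration, not the pipeline's code: the column names gq_military and gq_other_nonins, the helper names, and the sample counts are hypothetical.

```python
def validate_gq_counts(row, columns):
    """Step 2: check each GQ category on its own before combining."""
    for col in columns:
        if int(row[col]) < 0:
            raise ValueError(f"negative count in {col}: {row[col]}")


def combine_military_gq(row, military_col="gq_military",
                        other_col="gq_other_nonins"):
    """Step 3: fold military GQ persons into the other noninstitutional count.

    Returns a new dict; the military column is dropped so the result has a
    single combined category.
    """
    validate_gq_counts(row, [military_col, other_col])
    combined = dict(row)
    combined[other_col] = int(row[other_col]) + int(row[military_col])
    del combined[military_col]
    return combined


# Made-up MAZ-level records.
rows = [
    {"MAZ": 10001, "gq_military": 40, "gq_other_nonins": 12},
    {"MAZ": 10002, "gq_military": 0, "gq_other_nonins": 7},
]
merged = [combine_military_gq(r) for r in rows]
print(merged)
```

Because the combination is a plain sum, the total number of GQ persons is the same before and after, which is one simple check to run on the combined file.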

Column Naming Quick Reference

| Geography Level | File | Key ID Column | Standard Name | Legacy Name |
|---|---|---|---|---|
| MAZ | maz_marginals_hhgq.csv | MAZ identifier | MAZ | MAZ |
| TAZ | taz_marginals_hhgq.csv | TAZ identifier | TAZ | TAZ |
| County | county_marginals.csv | County identifier | COUNTY | N/A |
| Crosswalk | geo_cross_walk_tm2_maz.csv | MAZ identifier | MAZ_NODE | MAZ |
| Crosswalk | geo_cross_walk_tm2_maz.csv | TAZ identifier | TAZ_NODE | TAZ |

Important:

Inputs

Outputs

PopulationSim Input Files (Primary)

Supporting Files

File Processing Notes

How to Run

From the bay_area directory, run:

python create_baseyear_controls_23_tm2.py

This will generate all control and summary files in the configured output directory.
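After a run, a quick check that the expected files exist can catch a misconfigured output directory. The sketch below is an assumption-laden illustration: the file names are those listed in the quick reference on this page, the idea that they all land in one directory is an assumption, and missing_outputs is a hypothetical helper rather than part of the pipeline.

```python
import tempfile
from pathlib import Path

# File names taken from this page; a single shared directory is assumed.
EXPECTED_FILES = [
    "maz_marginals_hhgq.csv",
    "taz_marginals_hhgq.csv",
    "county_marginals.csv",
    "geo_cross_walk_tm2_maz.csv",
]


def missing_outputs(output_dir):
    """Return the expected file names that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (out / name).is_file()]


# Demo against a temporary directory holding only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "county_marginals.csv").write_text("COUNTY\n1\n")
    still_missing = missing_outputs(tmp)
print(still_missing)
```

In practice, pass the configured output directory to missing_outputs and treat a non-empty result as a failed run.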

Notes


Return to the main documentation index for other pipeline steps.