
TM1 vs TM2 PopulationSim Exploration Plan

Created: February 11, 2026
Purpose: Enumerate differences between TM1 and TM2 population synthesis approaches to inform potential refactoring


Executive Summary

This document compares the TM1 (Travel Model One) and TM2 (Travel Model Two) implementations of PopulationSim for the MTC Bay Area region. Both models target a 2023 base year, but have substantially different geographic structures, control hierarchies, and output requirements.

Key Differences at a Glance

Aspect | TM1 (master branch) | TM2 (tm2 branch)
Smallest Geography | TAZ only | MAZ (within TAZ)
Geographic Zones | ~1,454 TAZs | ~39,726 MAZs in ~5,000 TAZs
Control Levels | COUNTY → PUMA → TAZ | COUNTY → PUMA → TAZ_NODE → MAZ_NODE
Income Year Dollars | 2000$ | 2010$ (with 2023$ also available)
County Coding | 1-9 (different order) | 1-9 (SF=1, SM=2, etc.)
Code Structure | Simpler, legacy scripts | Modular config/pipeline

1. Geographic Structure

1.1 TM1 Geographies (master branch)

File: bay_area/hh_gq/data/geo_cross_walk_tm1.csv

TAZ,PUMA,COUNTY,county_name,REGION
1,7503,1,San Francisco,1
2,7503,1,San Francisco,1
...

1.2 TM2 Geographies (tm2 branch)

File: output_2023/populationsim_working_dir/data/geo_cross_walk_tm2_maz.csv

MAZ_NODE,TAZ_NODE,COUNTY,county_name,PUMA,...
10001,56,1,San Francisco,2204
10002,56,1,San Francisco,2204
...
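
Because the TM2 hierarchy nests MAZ_NODE inside TAZ_NODE, a crosswalk like this can be checked for nesting violations before it is fed to PopulationSim. The following is a minimal sketch, not code from either branch: it uses only the five columns shown above (the elided columns are ignored), and the helper name find_nesting_violations is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The two sample rows shown above; elided columns are left out.
SAMPLE_CROSSWALK = """MAZ_NODE,TAZ_NODE,COUNTY,county_name,PUMA
10001,56,1,San Francisco,2204
10002,56,1,San Francisco,2204
"""


def find_nesting_violations(rows, child, parent):
    """Return {child_id: [parent_ids]} for children mapped to more than one parent."""
    parents = defaultdict(set)
    for row in rows:
        parents[row[child]].add(row[parent])
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CROSSWALK)))

# Each MAZ should sit in exactly one TAZ, and each TAZ in exactly one PUMA.
maz_violations = find_nesting_violations(rows, "MAZ_NODE", "TAZ_NODE")
taz_violations = find_nesting_violations(rows, "TAZ_NODE", "PUMA")
```

An empty dictionary from both calls means the sampled rows nest cleanly; the same check applies to the TM1 crosswalk with TAZ as the child and PUMA as the parent.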

1.3 County Coding Differences

TM1 County Codes (master branch):

COUNTY Name
1 San Francisco
2 San Mateo
3 Santa Clara
4 Alameda
5 Contra Costa
6 Solano
7 Napa
8 Sonoma
9 Marin

TM2 County Codes (tm2 branch) - uses FIPS-based coding:

COUNTY GEOID_county Name
1 06001 Alameda
13 06013 Contra Costa
41 06041 Marin
55 06055 Napa
75 06075 San Francisco
81 06081 San Mateo
85 06085 Santa Clara
95 06095 Solano
97 06097 Sonoma

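⚠️ Key Issue: TM1 and TM2 use completely different county numbering schemes. TM1 uses sequential codes 1-9, while TM2 uses the integer form of the three-digit county FIPS code, so the same integer can denote different counties (for example, 1 is San Francisco in TM1 but Alameda in TM2).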
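
One way to make the two schemes comparable is to join them on county name. The sketch below copies the codes from the two tables above; the names TM1_TO_TM2 and tm2_geoid are illustrative and do not come from either branch.

```python
# Codes copied from the TM1 and TM2 tables above.
TM1_COUNTY = {
    1: "San Francisco", 2: "San Mateo", 3: "Santa Clara",
    4: "Alameda", 5: "Contra Costa", 6: "Solano",
    7: "Napa", 8: "Sonoma", 9: "Marin",
}
TM2_COUNTY = {
    1: "Alameda", 13: "Contra Costa", 41: "Marin",
    55: "Napa", 75: "San Francisco", 81: "San Mateo",
    85: "Santa Clara", 95: "Solano", 97: "Sonoma",
}

# Join on county name to translate a TM1 code into a TM2 code.
_name_to_tm2 = {name: code for code, name in TM2_COUNTY.items()}
TM1_TO_TM2 = {code: _name_to_tm2[name] for code, name in TM1_COUNTY.items()}


def tm2_geoid(tm2_code):
    """Build the 5-digit GEOID_county string: California prefix '06' plus 3-digit county."""
    return f"06{tm2_code:03d}"
```

With this mapping, TM1 code 1 (San Francisco) translates to TM2 code 75, and tm2_geoid(75) gives "06075". Any shared code that compares raw COUNTY integers across the two models would need a translation of this kind first.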


2. Control Variables (PopulationSim Marginals)

2.1 TM1 Controls

File: bay_area/hh_gq/configs_TM1/controls.csv

Control Geography Description
num_hh TAZ Total households (including GQ as 1-person HH)
hh_size_1_gq TAZ 1-person households (includes GQ)
hh_size_2 TAZ 2-person households
hh_size_3 TAZ 3-person households
hh_size_4_plus TAZ 4+ person households
hh_inc_30 TAZ Income ≤$30k (2000$)
hh_inc_30_60 TAZ Income $30-60k (2000$)
hh_inc_60_100 TAZ Income $60-100k (2000$)
hh_inc_100_plus TAZ Income >$100k (2000$)
hh_wrks_0 TAZ 0 workers
hh_wrks_1 TAZ 1 worker
hh_wrks_2 TAZ 2 workers
hh_wrks_3_plus TAZ 3+ workers
pers_age_00_04 TAZ Persons age 0-4
pers_age_05_19 TAZ Persons age 5-19
pers_age_20_44 TAZ Persons age 20-44
pers_age_45_64 TAZ Persons age 45-64
pers_age_65_plus TAZ Persons age 65+
gq_type_univ TAZ University GQ persons
gq_type_mil TAZ Military GQ persons
gq_type_othnon TAZ Other non-institutional GQ persons

Key Notes:

2.2 TM2 Controls (Current Implementation)

MAZ-Level Controls (maz_marginals_hhgq.csv):

Control Description
numhh_gq Total households + GQ (person-as-household approach)
total_pop Total population
hh_gq_university University GQ (each person = 1 household)
hh_gq_military Military GQ (each person = 1 household)
hh_gq_other_nonins Other non-institutional GQ
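
Under the person-as-household approach, numhh_gq would be expected to equal the household count plus the three GQ controls. The sketch below expresses that as a consistency check. It assumes that relationship holds exactly, which the descriptions above suggest but do not confirm; the households argument and the sample numbers are made up for illustration.

```python
GQ_COLUMNS = ("hh_gq_university", "hh_gq_military", "hh_gq_other_nonins")


def expected_numhh_gq(households, maz_row):
    """Households plus GQ persons, each GQ person counted as a 1-person household."""
    return households + sum(maz_row[col] for col in GQ_COLUMNS)


# Made-up MAZ record for illustration only.
maz_row = {"hh_gq_university": 12, "hh_gq_military": 0, "hh_gq_other_nonins": 3}
total = expected_numhh_gq(250, maz_row)  # 250 + 12 + 0 + 3 = 265
```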

TAZ-Level Controls (taz_marginals_hhgq.csv):

Control Description
inc_lt_20k Income <$20k (2010$)
inc_20k_45k Income $20-45k (2010$)
inc_45k_60k Income $45-60k (2010$)
inc_60k_75k Income $60-75k (2010$)
inc_75k_100k Income $75-100k (2010$)
inc_100k_150k Income $100-150k (2010$)
inc_150k_200k Income $150-200k (2010$)
inc_200k_plus Income >$200k (2010$)
hh_wrks_0 through hh_wrks_3_plus Workers in household
pers_age_00_19 Persons age 0-19
pers_age_20_34 Persons age 20-34
pers_age_35_64 Persons age 35-64
pers_age_65_plus Persons age 65+
hh_kids_no Households without children
hh_kids_yes Households with children
hh_size_1 through hh_size_6_plus Household size distribution

COUNTY-Level Controls (county_marginals.csv):

Control Description
pers_occ_management Management occupations
pers_occ_professional Professional occupations
pers_occ_services Service occupations
pers_occ_retail Retail/sales occupations
pers_occ_manual Manual/production occupations
pers_occ_military Military occupations

2.3 Control Differences Summary

Aspect | TM1 | TM2
Finest Geography | TAZ | MAZ
Income Bins | 4 bins ($30k, $60k, $100k) in 2000$ | 8 bins (aligned to ACS B19001) in 2010$
Age Bins | 0-4, 5-19, 20-44, 45-64, 65+ | 0-19, 20-34, 35-64, 65+
Household Size | At TAZ | At TAZ (moved from MAZ)
Children | Not controlled | hh_kids_yes/no at TAZ
Occupation | Disabled | Active at COUNTY
GQ Approach | Person counts at TAZ | Person-as-household at MAZ
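
The income-bin difference can be illustrated with a small binning helper. This is a sketch under stated assumptions, not the logic from either branch: the break points and control names come from sections 2.1 and 2.2, while the boundary handling (upper-inclusive for TM1 because its first bin is "≤$30k", lower-inclusive for TM2 because its first bin is "<$20k") is inferred from the labels. Incomes must already be in the matching dollar year (2000$ for TM1, 2010$ for TM2).

```python
from bisect import bisect_left, bisect_right

TM1_BREAKS = [30_000, 60_000, 100_000]  # 2000$
TM1_LABELS = ["hh_inc_30", "hh_inc_30_60", "hh_inc_60_100", "hh_inc_100_plus"]

TM2_BREAKS = [20_000, 45_000, 60_000, 75_000, 100_000, 150_000, 200_000]  # 2010$
TM2_LABELS = [
    "inc_lt_20k", "inc_20k_45k", "inc_45k_60k", "inc_60k_75k",
    "inc_75k_100k", "inc_100k_150k", "inc_150k_200k", "inc_200k_plus",
]


def assign_bin(income, breaks, labels, upper_inclusive):
    """Return the control name whose income range contains `income`.

    upper_inclusive=True puts a value equal to a break in the lower bin;
    upper_inclusive=False puts it in the higher bin. Which convention each
    model actually uses is an assumption here.
    """
    find = bisect_left if upper_inclusive else bisect_right
    return labels[find(breaks, income)]


tm1_bin = assign_bin(30_000, TM1_BREAKS, TM1_LABELS, upper_inclusive=True)
tm2_bin = assign_bin(20_000, TM2_BREAKS, TM2_LABELS, upper_inclusive=False)
```

Because the break points and dollar years both differ, neither model's income controls can be derived by collapsing the other's bins.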

3. ACS/Census Tables Used

3.1 Common Tables (Both Models)

Table | Description | Usage
B01001 | Sex by Age | Age distribution controls
B08202 | Workers in Household | Worker controls
B11016 | Household Size | Household size controls
B19001 | Household Income | Income distribution controls

3.2 TM2-Specific Tables

Table | Description | Usage
B11005 | Children in Household | hh_kids_yes/no controls
C24010 | Sex by Occupation | Occupation controls at county
B23025 | Employment Status | Military occupation proxy
B25003 | Tenure (ACS 1-year) | County-level HH scaling targets
B01003 | Total Population (ACS 1-year) | County-level pop scaling
P1, H1 (Decennial 2020) | Population/Housing counts | Block-level MAZ controls
P5 (Decennial 2020 PL) | Group Quarters by Type | GQ controls

3.3 Census Geographies Required

Source | TM1 | TM2
Block (2020) | Not used | MAZ controls base
Block Group (ACS) | Yes - aggregated to TAZ | Yes - aggregated to MAZ/TAZ
Tract (ACS) | Yes - aggregated to TAZ | Yes - aggregated to TAZ
County (ACS 1-yr) | Not clear | Scaling targets

4. Output Files

4.1 Household Output Comparison

TM1 Households (synthetic_households_recode.csv):

Column Source Description
HHID unique_hh_id Household ID
TAZ TAZ TAZ location
hinccat1 Derived Income category 1-4
HINC hh_income_2000 Income in 2000 dollars
hworkers hh_workers_from_esr Number of workers
VEHICL VEH Vehicles
BLD BLD Building type
TEN TEN Tenure
PERSONS NP Number of persons
HHT HHT Household type
UNITTYPE TYPEHUGQ Unit type (HH vs GQ)
poverty_income_* Derived Poverty calculations
pct_of_poverty Derived Poverty percentage

TM2 Households (households_2023_tm2.csv):

Column Source Description
HHID unique_hh_id Household ID
TAZ_NODE TAZ_NODE TAZ location
MAZ_NODE MAZ_NODE MAZ location
MTCCountyID COUNTY County 1-9
HHINCADJ hh_income_2010 Income in 2010 dollars
NWRKRS_ESR hh_workers_from_esr Number of workers
VEH VEH Vehicles
TEN TEN Tenure
NP NP Number of persons
HHT HHT Household type
BLD BLD Building type
TYPE TYPEHUGQ Unit type

4.2 Person Output Comparison

TM1 Persons (synthetic_persons_recode.csv):

Column Source Description
HHID unique_hh_id Household ID
PERID Index + 1 Person ID
AGE AGEP Age
SEX SEX Sex
pemploy employ_status Employment status (1-4)
pstudent student_status Student status (1-3)
ptype person_type Person type (1-8)

TM2 Persons (persons_2023_tm2.csv):

Column Source Description
HHID unique_hh_id Household ID
PERID unique_per_id Person ID
AGEP AGEP Age
SEX SEX Sex
SCHL SCHL Educational attainment
OCCP occupation Occupation code
WKHP WKHP Hours worked per week
WKW WKW Weeks worked per year
EMPLOYED employed Employment flag 0/1
ESR ESR Employment status recode
SCHG SCHG Grade level attending
hhgqtype hhgqtype Group quarters type
person_type person_type Person type

4.3 Key Output Differences

Aspect | TM1 | TM2
Geography Columns | TAZ only | MAZ_NODE, TAZ_NODE, MAZ_SEQ, TAZ_SEQ
Income Dollar Year | 2000$ | 2010$
Person Type Definition | Full CT-RAMP compatible (1-8) | Simplified (employment-based)
Occupation | Not in output | OCCP code included
Education | Not in output | SCHL, SCHG included
Work Hours/Weeks | Not in output | WKHP, WKW included
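
The household column mappings in section 4.1 can be written down as plain rename dictionaries, which makes the naming differences between the two outputs easy to compare. This is only a sketch: the source and output names are taken from the tables in 4.1, derived columns (hinccat1 and the poverty fields) are omitted, and the recode helper and sample record are hypothetical.

```python
# Source column -> output column, copied from the tables in section 4.1.
TM1_HH_RENAME = {
    "unique_hh_id": "HHID", "TAZ": "TAZ", "hh_income_2000": "HINC",
    "hh_workers_from_esr": "hworkers", "VEH": "VEHICL", "BLD": "BLD",
    "TEN": "TEN", "NP": "PERSONS", "HHT": "HHT", "TYPEHUGQ": "UNITTYPE",
}
TM2_HH_RENAME = {
    "unique_hh_id": "HHID", "TAZ_NODE": "TAZ_NODE", "MAZ_NODE": "MAZ_NODE",
    "COUNTY": "MTCCountyID", "hh_income_2010": "HHINCADJ",
    "hh_workers_from_esr": "NWRKRS_ESR", "VEH": "VEH", "TEN": "TEN",
    "NP": "NP", "HHT": "HHT", "BLD": "BLD", "TYPEHUGQ": "TYPE",
}


def recode(record, rename):
    """Select and rename the mapped columns of one synthesized household record."""
    return {out: record[src] for src, out in rename.items()}


# Made-up record for illustration only.
sample = {
    "unique_hh_id": 1, "TAZ_NODE": 56, "MAZ_NODE": 10001, "COUNTY": 1,
    "hh_income_2010": 85000, "hh_workers_from_esr": 2, "VEH": 1,
    "TEN": 3, "NP": 2, "HHT": 1, "BLD": 2, "TYPEHUGQ": 1,
}
tm2_row = recode(sample, TM2_HH_RENAME)
```

Only unique_hh_id, hh_workers_from_esr, and the raw PUMS fields are common to both dictionaries, and several of those are renamed differently (VEH becomes VEHICL in TM1 but stays VEH in TM2).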

5. Code Architecture Differences

5.1 TM1 Code Structure (master branch)

bay_area/
├── create_baseyear_controls.py    # Monolithic control generation
├── create_seed_population.py      # PUMS seed data prep
├── postprocess_recode.py          # Output formatting
├── run_populationsim.py           # Execution script
├── hh_gq/
│   ├── configs_TM1/
│   │   ├── controls.csv           # Control definitions
│   │   └── settings.yaml          # PopulationSim config
│   └── data/
│       ├── geo_cross_walk_tm1.csv # Geographic crosswalk
│       └── seed_households.csv    # PUMS seed data

5.2 TM2 Code Structure (tm2 branch)

bay_area/
├── tm2_config.py                  # Unified configuration
├── tm2_pipeline.py                # Full pipeline orchestration
├── create_baseyear_controls.py    # Control generation (uses config)
├── create_seed_population.py      # PUMS seed data prep
├── postprocess_recode.py          # Output formatting
├── utils/
│   ├── config_census.py           # Census table definitions, CONTROLS dict
│   ├── census_fetcher.py          # Census API client
│   ├── controls.py                # Control processing utilities
│   ├── geog_utils.py              # Geography utilities
│   └── tm2_utils.py               # Pipeline utilities
├── output_2023/
│   └── populationsim_working_dir/
│       ├── configs/
│       │   ├── controls.csv       # Generated control definitions
│       │   └── settings.yaml      # PopulationSim config
│       └── data/
│           ├── geo_cross_walk_tm2_maz.csv
│           ├── maz_marginals_hhgq.csv
│           ├── taz_marginals_hhgq.csv
│           └── county_marginals.csv

5.3 Key Architectural Differences

Aspect | TM1 | TM2
Configuration | Inline in scripts | Centralized tm2_config.py
Control Definition | Static controls.csv | Programmatic config_census.py
Pipeline | Manual script execution | Orchestrated tm2_pipeline.py
Census Fetching | Inline CensusFetcher class | Separate census_fetcher.py
Geography | Hardcoded paths | Configurable via config

6. PUMS Seed Data

Both models use PUMS data, but with different processing:

6.1 TM1 PUMS Processing

6.2 TM2 PUMS Processing


7. Refactoring Tradeoff Analysis

7.1 When Unification is Worth It

  1. TM1 is still actively used for projects and will not be retired within the next 2-3 years
    • Future Census updates (2028 5-year ACS) would benefit from shared infrastructure
    • Investment pays off over multiple update cycles
  2. The “shortcuts” in TM1’s current 2023 data are causing problems
    • If TM1’s approach has quality issues that need fixing anyway
    • You’d be improving data quality AND modernizing at once
  3. You want a single source of truth for Census data processing
    • ACS table definitions, CPI conversions, county codes in one place
    • When new Census data arrives, update once instead of twice
  4. Staff knows TM2 code, not TM1
    • If maintaining legacy TM1 code is becoming a knowledge gap issue
    • Unified codebase = unified team expertise

7.2 When Unification is NOT Worth It

  1. TM1 is being retired in favor of TM2 within ~2 years
    • Just maintain TM1 as-is until sunset
  2. TM1 current outputs are “good enough” and nobody is complaining
    • “If it ain’t broke, don’t fix it” has value
  3. The geographic differences make code sharing minimal
    • 1,454 TAZs vs 39,726 MAZs means most TM2 complexity (MAZ controls, hierarchical consistency) doesn’t apply to TM1
    • You’d likely maintain two control generation paths anyway

7.3 Code Sharing Assessment

Honest Assessment: TM2 is More Complex, Not Better

A detailed code review supports this: almost all of TM2’s “improvements” exist because TM2 has harder problems to solve (MAZ hierarchy, multi-level controls, 2020→2010 Census geography crosswalking).

Components Analysis:

TM2 Component | Worth Porting? | Why/Why Not
census_fetcher.py rate limiting | Maybe | 100ms delay between requests. Nice but trivial to add inline if needed
Error types (CensusApiException) | No | Over-engineering for a script that runs once per model year
Cached file parsing (_parse_acs1_format, etc.) | No | Adds 150+ lines of complexity. TM1’s simple read_csv with skiprows works fine
analysis/ folder (25+ scripts) | No | These are TM2-specific validation scripts for MAZ-level results
Pipeline orchestration (tm2_pipeline.py) | No | Adds complexity TM1 doesn’t need
geog_utils.py | No | 2020→2010 census geography crosswalking; TM1 already has what it needs
ACS table definitions | No | TM1 and TM2 use different income/age bins, so definitions cannot be shared
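
To show how small the inline version of the rate limiting could be, here is a minimal throttle sketch. It is not the code in census_fetcher.py: the class name Throttle and its interface are hypothetical, and only the 100ms interval comes from the table above. The clock and sleep functions are injectable so the behaviour can be checked without real delays.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval=0.1, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A TM1 script would create one Throttle() and call its wait() method immediately before each Census API request; the first call returns at once and later calls sleep only for whatever part of the 100ms has not already elapsed.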

What TM1 would gain from porting TM2 code:

…without any improvement in output quality.

The ONLY thing worth sharing: If TM2 has fixed a specific Census API bug (e.g., handling "N/A" values or malformed responses), copy that 5-line fix. Otherwise, leave TM1 alone—it’s simpler and works.
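
As an illustration of what such a targeted fix might look like, the sketch below coerces cells from a Census API style response (a list of rows whose first row is the header) to integers and maps "N/A", nulls, and other non-numeric cells to None. It is not taken from either branch: the helper names are hypothetical and the sample payload values are made up.

```python
def to_int_or_none(value):
    """Coerce one response cell to int; return None for missing or non-numeric cells."""
    if value is None:
        return None
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def rows_to_records(payload, value_columns):
    """Turn a header-plus-rows payload into dicts, cleaning the named value columns."""
    header, *data = payload
    records = []
    for row in data:
        record = dict(zip(header, row))
        for col in value_columns:
            record[col] = to_int_or_none(record.get(col))
        records.append(record)
    return records


# Made-up payload for illustration; B01003_001E is the total population estimate.
payload = [
    ["B01003_001E", "state", "county"],
    ["N/A", "06", "075"],
    ["1234", "06", "001"],
    [None, "06", "013"],
]
records = rows_to_records(payload, ["B01003_001E"])
```

Geography identifiers such as state and county are left as strings so that leading zeros survive.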

7.4 Recommendation: Leave TM1 Alone

Don’t port TM2 code to TM1.

TM2’s additional complexity exists to solve TM2-specific problems that TM1 doesn’t have:

TM1’s approach is simpler and battle-tested. The best refactoring is no refactoring.

Exception: If you discover a specific Census API bug fix in TM2 (e.g., handling malformed responses), copy that targeted fix to TM1. These are typically 5-10 line changes, not architectural refactors.

7.5 Bottom Line

TM1 should be left alone. The geographic and control differences mean TM2’s code complexity would provide no value to TM1. TM2 is more complex because it solves more complex problems (MAZ hierarchy, multi-level controls, more granular bins). TM1 is simpler, battle-tested, and works.

Refactoring TM1 to “look like” TM2 would add complexity without improving output quality or maintainability.