TM1 vs TM2 PopulationSim Exploration Plan
Created: February 11, 2026
Purpose: Enumerate differences between TM1 and TM2 population synthesis approaches to inform potential refactoring
Executive Summary
This document compares the TM1 (Travel Model One) and TM2 (Travel Model Two) implementations of PopulationSim for the MTC Bay Area region. Both models target a 2023 base year, but they differ substantially in geographic structure, control hierarchy, and output requirements.
Key Differences at a Glance
| Aspect | TM1 (master branch) | TM2 (tm2 branch) |
|---|---|---|
| Smallest Geography | TAZ only | MAZ (within TAZ) |
| Geographic Zones | ~1,454 TAZs | ~39,726 MAZs in ~5,000 TAZs |
| Control Levels | COUNTY → PUMA → TAZ | COUNTY → PUMA → TAZ_NODE → MAZ_NODE |
| Income Year Dollars | 2000$ | 2010$ (with 2023$ also available) |
| County Coding | 1-9 (different order) | 1-9 (SF=1, SM=2, etc.) |
| Code Structure | Simpler, legacy scripts | Modular config/pipeline |
1. Geographic Structure
1.1 TM1 Geographies (master branch)
File: bay_area/hh_gq/data/geo_cross_walk_tm1.csv
TAZ,PUMA,COUNTY,county_name,REGION
1,7503,1,San Francisco,1
2,7503,1,San Francisco,1
...
- Columns: TAZ,PUMA,COUNTY,county_name,REGION
- Zones: ~1,454 TAZs (no MAZ subdivision)
- Hierarchy: COUNTY → PUMA → TAZ
- PopulationSim geographies setting:
geographies: [COUNTY, PUMA, TAZ]
1.2 TM2 Geographies (tm2 branch)
File: output_2023/populationsim_working_dir/data/geo_cross_walk_tm2_maz.csv
MAZ_NODE,TAZ_NODE,COUNTY,county_name,PUMA,...
10001,56,1,San Francisco,2204
10002,56,1,San Francisco,2204
...
- Columns: MAZ_NODE,TAZ_NODE,COUNTY,county_name,PUMA, plus block/tract GEOIDs
- Zones: ~39,726 MAZs across ~5,000 TAZs
- Hierarchy: COUNTY → PUMA → TAZ_NODE → MAZ_NODE
- PopulationSim geographies setting:
geographies: [COUNTY, PUMA, TAZ_NODE, MAZ_NODE]
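The hierarchy above implies that each zone belongs to exactly one zone at the next level up. A minimal sketch of how that could be checked on a crosswalk file follows, assuming strict nesting between adjacent levels. The sample rows are illustrative only (the third row is invented for the example), and `check_nesting` is a hypothetical helper, not a function from either branch.

```python
import csv
import io
from collections import defaultdict

# Tiny inline sample in the TM2 crosswalk layout (illustrative values only).
SAMPLE_CROSSWALK = """MAZ_NODE,TAZ_NODE,COUNTY,county_name,PUMA
10001,56,1,San Francisco,2204
10002,56,1,San Francisco,2204
10003,57,1,San Francisco,2204
"""


def check_nesting(rows, child, parent):
    """Return child zones that map to more than one parent zone."""
    parents = defaultdict(set)
    for row in rows:
        parents[row[child]].add(row[parent])
    return {zone: sorted(p) for zone, p in parents.items() if len(p) > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CROSSWALK)))
hierarchy = ["COUNTY", "PUMA", "TAZ_NODE", "MAZ_NODE"]

# Check each adjacent pair: the lower level must nest in the level above it.
violations = {
    (child, parent): check_nesting(rows, child, parent)
    for parent, child in zip(hierarchy, hierarchy[1:])
}
```

An empty dict for every pair means the sample nests cleanly. The same check applies to the TM1 crosswalk by passing `["COUNTY", "PUMA", "TAZ"]` as the hierarchy.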
1.3 County Coding Differences
TM1 County Codes (master branch):
| COUNTY | Name |
|---|---|
| 1 | San Francisco |
| 2 | San Mateo |
| 3 | Santa Clara |
| 4 | Alameda |
| 5 | Contra Costa |
| 6 | Solano |
| 7 | Napa |
| 8 | Sonoma |
| 9 | Marin |
TM2 County Codes (tm2 branch) - uses FIPS-based coding:
| COUNTY | GEOID_county | Name |
|---|---|---|
| 1 | 06001 | Alameda |
| 13 | 06013 | Contra Costa |
| 41 | 06041 | Marin |
| 55 | 06055 | Napa |
| 75 | 06075 | San Francisco |
| 81 | 06081 | San Mateo |
| 85 | 06085 | Santa Clara |
| 95 | 06095 | Solano |
| 97 | 06097 | Sonoma |
⚠️ Key Issue: TM1 and TM2 use completely different county numbering schemes.
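Any shared code or cross-model comparison would need an explicit mapping between the two schemes. The sketch below derives one by joining the two tables above on county name. The dictionary and function names are hypothetical, and the code values are copied from the tables in this section rather than read from either branch.

```python
# TM1 sequential codes, as listed in the TM1 table above.
TM1_COUNTY = {
    1: "San Francisco", 2: "San Mateo", 3: "Santa Clara",
    4: "Alameda", 5: "Contra Costa", 6: "Solano",
    7: "Napa", 8: "Sonoma", 9: "Marin",
}

# TM2 FIPS-based codes, as listed in the TM2 table above.
TM2_COUNTY = {
    1: "Alameda", 13: "Contra Costa", 41: "Marin",
    55: "Napa", 75: "San Francisco", 81: "San Mateo",
    85: "Santa Clara", 95: "Solano", 97: "Sonoma",
}

# Join on county name to get a TM1 -> TM2 code lookup.
_name_to_tm2 = {name: code for code, name in TM2_COUNTY.items()}
TM1_TO_TM2 = {code: _name_to_tm2[name] for code, name in TM1_COUNTY.items()}


def tm2_geoid(tm2_code):
    """Build the 5-digit county GEOID (California state FIPS 06 + county FIPS)."""
    return f"06{tm2_code:03d}"
```

Joining on names rather than hand-writing the code pairs keeps the lookup tied to the two source tables, so a typo in one table fails with a `KeyError` instead of silently mismapping a county.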
2. Control Variables (PopulationSim Marginals)
2.1 TM1 Controls
File: bay_area/hh_gq/configs_TM1/controls.csv
| Control | Geography | Description |
|---|---|---|
| num_hh | TAZ | Total households (including GQ as 1-person HH) |
| hh_size_1_gq | TAZ | 1-person households (includes GQ) |
| hh_size_2 | TAZ | 2-person households |
| hh_size_3 | TAZ | 3-person households |
| hh_size_4_plus | TAZ | 4+ person households |
| hh_inc_30 | TAZ | Income ≤$30k (2000$) |
| hh_inc_30_60 | TAZ | Income $30-60k (2000$) |
| hh_inc_60_100 | TAZ | Income $60-100k (2000$) |
| hh_inc_100_plus | TAZ | Income >$100k (2000$) |
| hh_wrks_0 | TAZ | 0 workers |
| hh_wrks_1 | TAZ | 1 worker |
| hh_wrks_2 | TAZ | 2 workers |
| hh_wrks_3_plus | TAZ | 3+ workers |
| pers_age_00_04 | TAZ | Persons age 0-4 |
| pers_age_05_19 | TAZ | Persons age 5-19 |
| pers_age_20_44 | TAZ | Persons age 20-44 |
| pers_age_45_64 | TAZ | Persons age 45-64 |
| pers_age_65_plus | TAZ | Persons age 65+ |
| gq_type_univ | TAZ | University GQ persons |
| gq_type_mil | TAZ | Military GQ persons |
| gq_type_othnon | TAZ | Other non-institutional GQ persons |
Key Notes:
- All controls at TAZ level (no MAZ)
- Income bins in 2000 dollars
- Age bins: 0-4, 5-19, 20-44, 45-64, 65+
- No children in household control
- No occupation controls (commented out)
2.2 TM2 Controls (Current Implementation)
MAZ-Level Controls (maz_marginals_hhgq.csv):
| Control | Description |
|---|---|
| numhh_gq | Total households + GQ (person-as-household approach) |
| total_pop | Total population |
| hh_gq_university | University GQ (each person = 1 household) |
| hh_gq_military | Military GQ (each person = 1 household) |
| hh_gq_other_nonins | Other non-institutional GQ |
TAZ-Level Controls (taz_marginals_hhgq.csv):
| Control | Description |
|---|---|
| inc_lt_20k | Income <$20k (2010$) |
| inc_20k_45k | Income $20-45k (2010$) |
| inc_45k_60k | Income $45-60k (2010$) |
| inc_60k_75k | Income $60-75k (2010$) |
| inc_75k_100k | Income $75-100k (2010$) |
| inc_100k_150k | Income $100-150k (2010$) |
| inc_150k_200k | Income $150-200k (2010$) |
| inc_200k_plus | Income >$200k (2010$) |
| hh_wrks_0 through hh_wrks_3_plus | Workers in household |
| pers_age_00_19 | Persons age 0-19 |
| pers_age_20_34 | Persons age 20-34 |
| pers_age_35_64 | Persons age 35-64 |
| pers_age_65_plus | Persons age 65+ |
| hh_kids_no | Households without children |
| hh_kids_yes | Households with children |
| hh_size_1 through hh_size_6_plus | Household size distribution |
COUNTY-Level Controls (county_marginals.csv):
| Control | Description |
|---|---|
| pers_occ_management | Management occupations |
| pers_occ_professional | Professional occupations |
| pers_occ_services | Service occupations |
| pers_occ_retail | Retail/sales occupations |
| pers_occ_manual | Manual/production occupations |
| pers_occ_military | Military occupations |
2.3 Control Differences Summary
| Aspect | TM1 | TM2 |
|---|---|---|
| Finest Geography | TAZ | MAZ |
| Income Bins | 4 bins (breakpoints at $30k, $60k, $100k) in 2000$ | 8 bins (aligned to ACS B19001) in 2010$ |
| Age Bins | 0-4, 5-19, 20-44, 45-64, 65+ | 0-19, 20-34, 35-64, 65+ |
| Household Size | At TAZ | At TAZ (moved from MAZ) |
| Children | Not controlled | hh_kids_yes/no at TAZ |
| Occupation | Disabled | Active at COUNTY |
| GQ Approach | Person counts at TAZ | Person-as-household at MAZ |
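Because the age bins do not line up one-to-one, comparing TM1 and TM2 age controls requires collapsing both to the coarsest bins they share: 0-19, 20-64, and 65+. The sketch below shows that collapse. The control names come from the tables above, while the `age_*` labels, the `collapse` helper, and the counts are hypothetical and purely illustrative.

```python
# Map each model's age controls onto the coarsest bins both models share.
TM1_TO_COMMON = {
    "pers_age_00_04": "age_00_19",
    "pers_age_05_19": "age_00_19",
    "pers_age_20_44": "age_20_64",
    "pers_age_45_64": "age_20_64",
    "pers_age_65_plus": "age_65_plus",
}
TM2_TO_COMMON = {
    "pers_age_00_19": "age_00_19",
    "pers_age_20_34": "age_20_64",
    "pers_age_35_64": "age_20_64",
    "pers_age_65_plus": "age_65_plus",
}


def collapse(totals, mapping):
    """Sum control totals into the shared bins named by `mapping`."""
    out = {}
    for name, count in totals.items():
        key = mapping[name]
        out[key] = out.get(key, 0) + count
    return out


# Made-up counts for a single zone, for illustration only.
tm1_example = collapse(
    {"pers_age_00_04": 50, "pers_age_05_19": 150, "pers_age_20_44": 400,
     "pers_age_45_64": 300, "pers_age_65_plus": 100},
    TM1_TO_COMMON,
)
tm2_example = collapse(
    {"pers_age_00_19": 200, "pers_age_20_34": 250, "pers_age_35_64": 450,
     "pers_age_65_plus": 100},
    TM2_TO_COMMON,
)
```

The income bins cannot be collapsed the same way: the breakpoints differ and the dollar years differ (2000$ vs 2010$), so a shared income comparison would first need a dollar-year conversion.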
3. ACS/Census Tables Used
3.1 Common Tables (Both Models)
| Table | Description | Usage |
|---|---|---|
| B01001 | Sex by Age | Age distribution controls |
| B08202 | Workers in Household | Worker controls |
| B11016 | Household Size | Household size controls |
| B19001 | Household Income | Income distribution controls |
3.2 TM2-Specific Tables
| Table | Description | Usage |
|---|---|---|
| B11005 | Children in Household | hh_kids_yes/no controls |
| C24010 | Sex by Occupation | Occupation controls at county |
| B23025 | Employment Status | Military occupation proxy |
| B25003 | Tenure (ACS 1-year) | County-level HH scaling targets |
| B01003 | Total Population (ACS 1-year) | County-level pop scaling |
| P1, H1 (Decennial 2020) | Population/Housing counts | Block-level MAZ controls |
| P5 (Decennial 2020 PL) | Group Quarters by Type | GQ controls |
3.3 Census Geographies Required
| Source | TM1 | TM2 |
|---|---|---|
| Block (2020) | Not used | MAZ controls base |
| Block Group (ACS) | Yes - aggregated to TAZ | Yes - aggregated to MAZ/TAZ |
| Tract (ACS) | Yes - aggregated to TAZ | Yes - aggregated to TAZ |
| County (ACS 1-yr) | Not clear | Scaling targets |
4. Output Files
4.1 Household Output Comparison
TM1 Households (synthetic_households_recode.csv):
| Column | Source | Description |
|---|---|---|
| HHID | unique_hh_id | Household ID |
| TAZ | TAZ | TAZ location |
| hinccat1 | Derived | Income category 1-4 |
| HINC | hh_income_2000 | Income in 2000 dollars |
| hworkers | hh_workers_from_esr | Number of workers |
| VEHICL | VEH | Vehicles |
| BLD | BLD | Building type |
| TEN | TEN | Tenure |
| PERSONS | NP | Number of persons |
| HHT | HHT | Household type |
| UNITTYPE | TYPEHUGQ | Unit type (HH vs GQ) |
| poverty_income_* | Derived | Poverty calculations |
| pct_of_poverty | Derived | Poverty percentage |
TM2 Households (households_2023_tm2.csv):
| Column | Source | Description |
|---|---|---|
| HHID | unique_hh_id | Household ID |
| TAZ_NODE | TAZ_NODE | TAZ location |
| MAZ_NODE | MAZ_NODE | MAZ location |
| MTCCountyID | COUNTY | County 1-9 |
| HHINCADJ | hh_income_2010 | Income in 2010 dollars |
| NWRKRS_ESR | hh_workers_from_esr | Number of workers |
| VEH | VEH | Vehicles |
| TEN | TEN | Tenure |
| NP | NP | Number of persons |
| HHT | HHT | Household type |
| BLD | BLD | Building type |
| TYPE | TYPEHUGQ | Unit type |
4.2 Person Output Comparison
TM1 Persons (synthetic_persons_recode.csv):
| Column | Source | Description |
|---|---|---|
| HHID | unique_hh_id | Household ID |
| PERID | Index + 1 | Person ID |
| AGE | AGEP | Age |
| SEX | SEX | Sex |
| pemploy | employ_status | Employment status (1-4) |
| pstudent | student_status | Student status (1-3) |
| ptype | person_type | Person type (1-8) |
TM2 Persons (persons_2023_tm2.csv):
| Column | Source | Description |
|---|---|---|
| HHID | unique_hh_id | Household ID |
| PERID | unique_per_id | Person ID |
| AGEP | AGEP | Age |
| SEX | SEX | Sex |
| SCHL | SCHL | Educational attainment |
| OCCP | occupation | Occupation code |
| WKHP | WKHP | Hours worked per week |
| WKW | WKW | Weeks worked per year |
| EMPLOYED | employed | Employment flag 0/1 |
| ESR | ESR | Employment status recode |
| SCHG | SCHG | Grade level attending |
| hhgqtype | hhgqtype | Group quarters type |
| person_type | person_type | Person type |
4.3 Key Output Differences
| Aspect | TM1 | TM2 |
|---|---|---|
| Geography Columns | TAZ only | MAZ_NODE, TAZ_NODE, MAZ_SEQ, TAZ_SEQ |
| Income Dollar Year | 2000$ | 2010$ |
| Person Type Definition | Full CT-RAMP compatible (1-8) | Simplified (employment-based) |
| Occupation | Not in output | OCCP code included |
| Education | Not in output | SCHL, SCHG included |
| Work Hours/Weeks | Not in output | WKHP, WKW included |
5. Code Architecture Differences
5.1 TM1 Code Structure (master branch)
bay_area/
├── create_baseyear_controls.py # Monolithic control generation
├── create_seed_population.py # PUMS seed data prep
├── postprocess_recode.py # Output formatting
├── run_populationsim.py # Execution script
├── hh_gq/
│   ├── configs_TM1/
│   │   ├── controls.csv # Control definitions
│   │   └── settings.yaml # PopulationSim config
│   └── data/
│       ├── geo_cross_walk_tm1.csv # Geographic crosswalk
│       └── seed_households.csv # PUMS seed data
5.2 TM2 Code Structure (tm2 branch)
bay_area/
├── tm2_config.py # Unified configuration
├── tm2_pipeline.py # Full pipeline orchestration
├── create_baseyear_controls.py # Control generation (uses config)
├── create_seed_population.py # PUMS seed data prep
├── postprocess_recode.py # Output formatting
├── utils/
│   ├── config_census.py # Census table definitions, CONTROLS dict
│   ├── census_fetcher.py # Census API client
│   ├── controls.py # Control processing utilities
│   ├── geog_utils.py # Geography utilities
│   └── tm2_utils.py # Pipeline utilities
├── output_2023/
│   └── populationsim_working_dir/
│       ├── configs/
│       │   ├── controls.csv # Generated control definitions
│       │   └── settings.yaml # PopulationSim config
│       └── data/
│           ├── geo_cross_walk_tm2_maz.csv
│           ├── maz_marginals_hhgq.csv
│           ├── taz_marginals_hhgq.csv
│           └── county_marginals.csv
5.3 Key Architectural Differences
| Aspect | TM1 | TM2 |
|---|---|---|
| Configuration | Inline in scripts | Centralized tm2_config.py |
| Control Definition | Static controls.csv | Programmatic config_census.py |
| Pipeline | Manual script execution | Orchestrated tm2_pipeline.py |
| Census Fetching | Inline CensusFetcher class | Separate census_fetcher.py |
| Geography | Hardcoded paths | Configurable via config |
6. PUMS Seed Data
Both models use PUMS data, but with different processing:
6.1 TM1 PUMS Processing
- Uses 2019-2023 5-year PUMS (crosswalked to 2010 PUMAs)
- Income converted to 2000 dollars using the hh_income_2000 field
- Person types computed to match CT-RAMP person type (1-8)
- Employment/student status computed for TM1 compatibility
6.2 TM2 PUMS Processing
- Uses 2019-2023 5-year PUMS (crosswalked to 2020 PUMAs)
- Income available in 2010$ and 2023$ (hh_income_2010, hh_income_2023)
- Additional fields: occupation, education, work hours/weeks
- Group quarters handled as “person-as-household” at MAZ level
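The "person-as-household" treatment can be sketched as follows: each non-institutional GQ person counts as one single-person household, so the MAZ-level numhh_gq control is the regular household count plus all GQ persons. This is a minimal illustration of that arithmetic, not code from the tm2 branch. The function name and the input keys ("university", "military", "other_nonins") are assumptions, and the counts are made up.

```python
def maz_gq_controls(households, gq_persons):
    """Build MAZ-level control totals under the person-as-household approach.

    households: count of regular (non-GQ) households in the MAZ.
    gq_persons: dict of GQ person counts keyed by hypothetical type labels.
    """
    controls = {
        "hh_gq_university": gq_persons.get("university", 0),
        "hh_gq_military": gq_persons.get("military", 0),
        "hh_gq_other_nonins": gq_persons.get("other_nonins", 0),
    }
    # Each GQ person counts as one single-person household.
    controls["numhh_gq"] = households + sum(controls.values())
    return controls


# Illustrative MAZ: 120 households plus 35 GQ persons.
example = maz_gq_controls(
    households=120,
    gq_persons={"university": 30, "military": 0, "other_nonins": 5},
)
```

TM1 instead carries GQ as person counts at the TAZ level (gq_type_univ, gq_type_mil, gq_type_othnon), so the two sets of GQ controls are not directly interchangeable.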
7. Refactoring Tradeoff Analysis
7.1 When Unification is Worth It
- TM1 is still actively used for projects and won’t be sunset in 2-3 years
  - Future Census updates (2028 5-year ACS) would benefit from shared infrastructure
  - Investment pays off over multiple update cycles
- The “shortcuts” in TM1’s current 2023 data are causing problems
  - If TM1’s approach has quality issues that need fixing anyway
  - You’d be improving data quality AND modernizing at once
- You want a single source of truth for Census data processing
  - ACS table definitions, CPI conversions, county codes in one place
  - When new Census data arrives, update once instead of twice
- Staff knows TM2 code, not TM1
  - If maintaining legacy TM1 code is becoming a knowledge gap issue
  - Unified codebase = unified team expertise
7.2 When Unification is NOT Worth It
- TM1 is being retired in favor of TM2 within ~2 years
  - Just maintain TM1 as-is until sunset
- TM1 current outputs are “good enough” and nobody is complaining
  - “If it ain’t broke, don’t fix it” has value
- The geographic differences make code sharing minimal
  - 1,454 TAZs vs 39,726 MAZs means most TM2 complexity (MAZ controls, hierarchical consistency) doesn’t apply to TM1
  - You’d likely maintain two control generation paths anyway
7.3 Code Sharing Assessment
Honest Assessment: TM2 is More Complex, Not Better
After detailed code review, TM2’s code is more complex, not better. Almost all the “improvements” exist because TM2 has harder problems to solve (MAZ hierarchy, multi-level controls, 2020→2010 Census geography crosswalking).
Components Analysis:
| TM2 Component | Worth Porting? | Why/Why Not |
|---|---|---|
| census_fetcher.py rate limiting | Maybe | 100ms delay between requests. Nice but trivial to add inline if needed |
| Error types (CensusApiException) | No | Over-engineering for a script that runs once per model year |
| Cached file parsing (_parse_acs1_format, etc.) | No | Adds 150+ lines of complexity. TM1’s simple read_csv with skiprows works fine |
| analysis/ folder (25+ scripts) | No | These are TM2-specific validation scripts for MAZ-level results |
| Pipeline orchestration (tm2_pipeline.py) | No | Adds complexity TM1 doesn’t need |
| geog_utils.py | No | 2020→2010 census geography crosswalking; TM1 already has what it needs |
| ACS table definitions | No | TM1 and TM2 use different income/age bins, so the definitions can’t be shared |
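To show how little code "trivial to add inline" means for the rate-limiting row, here is a minimal sketch of a fixed delay between requests. It is not the census_fetcher.py implementation. `fetch_all` and its parameters are hypothetical, and `fetch` stands in for whatever function TM1 already uses to retrieve one URL.

```python
import time


def fetch_all(urls, fetch, delay_seconds=0.1, sleep=time.sleep):
    """Call `fetch` on each URL, pausing `delay_seconds` between requests.

    `fetch` is any callable that takes a URL and returns its result.
    `sleep` is injectable so the delay can be stubbed out in tests.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # 100 ms by default, skipped before the first call
        results.append(fetch(url))
    return results
```

Passing `fetch` and `sleep` in as arguments keeps the helper independent of any particular HTTP client, which matches the table's point that this does not need a separate module.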
What TM1 would gain from porting TM2 code:
- More lines of code to maintain
- More dependencies
- More abstraction layers
…without any improvement in output quality.
The ONLY thing worth sharing: If TM2 has fixed a specific Census API bug (e.g., handling "N/A" values or malformed responses), copy that 5-line fix. Otherwise, leave TM1 alone—it’s simpler and works.
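As an illustration of the kind of targeted fix meant here, the sketch below tolerates "N/A" values and malformed rows in a Census-API-style response, which arrives as a JSON list of lists with the header row first. Both helper names are hypothetical, the sample payload is made up, and this is not a copy of any fix in the tm2 branch.

```python
def parse_census_rows(payload):
    """Turn a list-of-lists response (header row first) into row dicts.

    Rows whose length does not match the header are skipped as malformed.
    """
    if not payload or len(payload) < 2:
        return []
    header, *data = payload
    records = []
    for row in data:
        if len(row) != len(header):
            continue  # malformed row: wrong number of fields
        records.append(dict(zip(header, row)))
    return records


def to_int(value, default=None):
    """Convert a field to int, returning `default` for "N/A", None, or junk."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


# Made-up payload: one good row, one "N/A" value, one malformed row.
sample = [["B01003_001E", "county"], ["1500", "001"], ["N/A", "013"], ["bad"]]
records = parse_census_rows(sample)
populations = [to_int(r["B01003_001E"]) for r in records]
```

Returning `None` rather than 0 for unparseable values keeps missing data distinguishable from a true zero when the controls are aggregated later.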
7.4 Recommendation: Leave TM1 Alone
Don’t port TM2 code to TM1.
TM2’s additional complexity exists to solve TM2-specific problems that TM1 doesn’t have:
- MAZ-level synthesis hierarchy
- Multi-level control consistency
- 2020→2010 Census geography mapping
- More granular income/age bins requiring more complex ACS processing
TM1’s approach is simpler and battle-tested. The best refactoring is no refactoring.
Exception: If you discover a specific Census API bug fix in TM2 (e.g., handling malformed responses), copy that targeted fix to TM1. These are typically 5-10 line changes, not architectural refactors.
7.5 Bottom Line
TM1 should be left alone. The geographic and control differences mean TM2’s code complexity would provide no value to TM1. TM2 is more complex because it solves more complex problems (MAZ hierarchy, multi-level controls, more granular bins). TM1 is simpler, battle-tested, and works.
Refactoring TM1 to “look like” TM2 would be adding complexity without improving output quality or maintainability.