Skip to the content.

Comprehensive Output Data Guide

TM2 PopulationSim Deliverables and Analysis Products

Document Version: 1.0
Date: December 2024
Author: PopulationSim Bay Area Team


Table of Contents

  1. Overview
  2. Primary Synthesis Outputs
  3. Validation and Quality Assurance Reports
  4. Analysis and Visualization Products
  5. Tableau-Ready Data Products
  6. Technical and Processing Outputs
  7. Output Usage Guidelines
  8. Data Quality Standards

Overview

The TM2 PopulationSim system generates a comprehensive suite of outputs designed to support transportation modeling, policy analysis, and demographic research. This guide documents all deliverables, their specifications, and appropriate use cases for each output product.

Output Categories

The system produces eight primary categories of outputs:

  1. Primary Synthesis Outputs: TM2-formatted synthetic population data
  2. Validation Reports: Quality assurance and performance metrics
  3. Analysis Products: Demographic summaries and comparative analysis
  4. Visualization Products: Charts, maps, and interactive dashboards
  5. Tableau Data: Business intelligence-ready datasets
  6. Technical Outputs: Processing logs and diagnostic information
  7. Raw Synthesis Results: Unprocessed PopulationSim engine outputs
  8. Reference Documentation: Metadata and usage guides

Output Architecture

PopulationSim Engine → Raw Outputs → Post-Processing → Final TM2 Outputs
                    ↓              ↓                 ↓
                Validation → Analysis → Visualization → Tableau Products

Critical Success Metrics

Data Volume: ~1.4M households, ~3.2M persons for complete Bay Area
Geographic Coverage: 39,586 MAZ zones, 4,734 TAZ zones, 9 counties
Quality Standards: >95% control matching accuracy, <5% demographic deviation
Format Compliance: TM2 schema requirements, ActivitySim compatibility


Primary Synthesis Outputs

TM2 Synthetic Households

File: households_2023_tm2.csv
Location: output_2023/populationsim_working_dir/output/
File Size: ~207 MB
Record Count: ~1,400,000 households
Update Frequency: Generated with each synthesis run

Required Fields Specification

Contains household-level characteristics including location, income, size, and structure.

Column Type Description Range/Values Used By Data Source/Notes
HHID Integer Unique household ID 1 - 1,400,000 HouseholdDataManager Primary key, must be unique across all households
TAZ_NODE Integer TAZ of residence 1 - 1,454 HouseholdDataManager Traffic Analysis Zone (1454 zones in Bay Area)
MAZ_NODE Integer MAZ of residence 1 - 39,586 HouseholdDataManager Micro Analysis Zone (more detailed than TAZ)
MAZ_SEQ Integer Sequential MAZ ID 1 - 39,586 HouseholdDataManager Index mapping
TAZ_SEQ Integer Sequential TAZ ID 1 - 4,734 HouseholdDataManager Index mapping
TAZ_ORIGINAL Integer Pre-remapping TAZ Historical HouseholdDataManager Version control
MAZ_ORIGINAL Integer Pre-remapping MAZ Historical HouseholdDataManager Version control
MTCCountyID Integer County of residence 1 - 9 HouseholdDataManager MTC county identifier code
HHINCADJ Integer Household income in 2010 dollars 0 - 700,000 HouseholdDataManager Inflation-adjusted income for consistent modeling
NWRKRS_ESR Integer Number of workers 0 - 20 HouseholdDataManager Count of employed persons, ranges 0-20
VEH Integer Number of vehicles owned 0 - 6, -9 HouseholdDataManager From PUMS: 0-6 vehicles, -9 for group quarters
NP Integer Number of persons in household 1 - 20 HouseholdDataManager From PUMS: ranges 1-20 persons
HHT Integer Household type 1 - 7, -9 HouseholdDataManager PUMS household type classification
BLD Integer Units in structure 1 - 10, -9 HouseholdDataManager PUMS building type classification
TYPE Integer Type of unit 1 - 3 HouseholdDataManager Housing unit vs. group quarters

Household Type Codes (HHT)

Based on PUMS classification:

Code Description
1 Married-couple family household
2 Other family household, Male householder, no wife present
3 Other family household, Female householder, no husband present
4 Nonfamily household, Male householder, Living alone
5 Nonfamily household, Male householder, Not living alone
6 Nonfamily household, Female householder, Living alone
7 Nonfamily household, Female householder, Not living alone
-9 N/A recoded for group quarters

Building Type Codes (BLD)

Based on PUMS units in structure classification:

Code Description
1 Mobile home or trailer
2 One-family house detached
3 One-family house attached
4 2 Apartments
5 3-4 Apartments
6 5-9 Apartments
7 10-19 Apartments
8 20-49 Apartments
9 50 or more apartments
10 Boat, RV, van, etc.
-9 N/A recoded for group quarters

Unit Type Codes (TYPE)

Code Description
1 Housing unit
2 Institutional group quarters (typically excluded)
3 Noninstitutional group quarters

Data Quality Specifications

Geographic Completeness: 100% of households assigned to valid TAZ/MAZ zones
Income Distribution: Matches ACS 2023 1-year estimates within ±5%
Household Size: Aligns with Census demographic patterns
Missing Values: Coded as -9 (standard TM2 convention)

Usage Applications

Transportation Modeling: Input to TM2 travel demand model
Land Use Planning: Household density and composition analysis
Economic Analysis: Income distribution and spending power assessment
Housing Policy: Tenure patterns and housing type analysis

TM2 Synthetic Persons

File: persons_2023_tm2.csv
Location: output_2023/populationsim_working_dir/output/
File Size: ~330 MB
Record Count: ~3,200,000 persons
Update Frequency: Generated with each synthesis run

Required Fields Specification

Contains individual-level characteristics for all persons within households.

Column Type Description Range/Values Used By Data Source/Notes
HHID Integer Household identifier 1 - 1,400,000 HouseholdDataManager Foreign key linking to households table
PERID String Unique person identifier “HHID_#” HouseholdDataManager Primary key, unique across all persons
AGEP Integer Age of person 0 - 99 HouseholdDataManager From PUMS: ranges 0-99 years
SEX Integer Sex of person 1, 2 HouseholdDataManager From PUMS: 1=Male, 2=Female
SCHL Integer Educational attainment -9, 1 - 16 HouseholdDataManager PUMS education level codes
OCCP Integer Occupation category -999, 1 - 6 HouseholdDataManager Recoded from PUMS occupation data
WKHP Integer Usual hours worked per week -9, 1 - 99 HouseholdDataManager From PUMS: -9 if N/A, otherwise 1-99
WKW Integer Weeks worked during past 12 months -9, 1 - 6 HouseholdDataManager PUMS work weeks classification
EMPLOYED Integer Employment status 0, 1 HouseholdDataManager Derived from ESR: 1=Employed, 0=Unemployed
ESR Integer Employment status recode 0 - 6 HouseholdDataManager PUMS employment status
SCHG Integer Grade level attending -9, 1 - 7 HouseholdDataManager PUMS school attendance level
hhgqtype Integer Household/GQ type 1, 2 HouseholdDataManager 1=HH, 2=GQ
person_type Integer Employment-based type 1 - 8 HouseholdDataManager Travel modeling

Educational Attainment Codes (SCHL)

Based on PUMS education classification:

Code Description
-9 N/A recoded for less than 3 years old
1 No schooling completed
2 Nursery school to grade 4
3 Grade 5 or grade 6
4 Grade 7 or grade 8
5 Grade 9
6 Grade 10
7 Grade 11
8 12th grade, no diploma
9 High school graduate
10 Some college, but less than 1 year
11 One or more years of college, no degree
12 Associate’s degree
13 Bachelor’s degree
14 Master’s degree
15 Professional school degree
16 Doctorate degree

Occupation Codes (OCCP)

Recoded from PUMS SOCP codes in create_seed_population.py:

Code Description
-999 N/A (under 16 or not in labor force >5 years)
1 Management
2 Professional
3 Services
4 Retail
5 Manual
6 Military

Employment Status Codes (ESR)

Based on PUMS employment status recode:

Code Description
0 N/A recoded for persons less than 16 years old
1 Civilian employed, at work
2 Civilian employed, with a job but not at work
3 Unemployed
4 Armed forces, at work
5 Armed forces, with a job but not at work
6 Not in labor force

Work Weeks Classification (WKW)

Code Description
-9 N/A (under 16 or didn’t work past 12 months)
1 50 to 52 weeks
2 48 to 49 weeks
3 40 to 47 weeks
4 27 to 39 weeks
5 14 to 26 weeks
6 13 weeks or less

School Grade Level (SCHG)

Current school attendance level:

Code Description
-9 N/A (not attending school)
1 Nursery school/preschool
2 Kindergarten
3 Grade 1 to grade 4
4 Grade 5 to grade 8
5 Grade 9 to grade 12
6 College undergraduate
7 Graduate or professional school

Data Quality Specifications

Age Distribution: Matches ACS demographic pyramids within ±5%
Employment Rates: Aligns with Bureau of Labor Statistics data
Educational Attainment: Consistent with Census educational distributions
Household Linkage: 100% of persons linked to valid households

Usage Applications

Travel Demand Modeling: Person-level trip generation and mode choice
Labor Market Analysis: Employment patterns and workforce characteristics
Education Planning: School enrollment and educational service needs
Demographic Research: Population structure and social characteristics


Validation and Quality Assurance Reports

Control vs. Result Summary

File: summary_melt.csv
Location: output_2023/populationsim_working_dir/output/
Purpose: Comprehensive comparison of synthesis results against demographic controls

Schema Specification

Column Type Description Example
geography String Geographic level “TAZ_NODE”, “COUNTY”
id Integer Geographic identifier 1001, 4
variable String Control category “hh_size_2”, “inc_lt_20k”
control Float Target control total 150.0
result Float Synthesis result 148.5
diff Float Absolute difference -1.5
pct_diff Float Percentage difference -1.0%

Quality Metrics

Control Categories: 32 demographic control variables
Geographic Levels: TAZ, MAZ, County (varying by control type)
Accuracy Standards: 95% of controls within ±5% tolerance
Coverage: All Bay Area geographic zones included

Geographic Summary Reports

TAZ-Level Summary

File: final_summary_TAZ_NODE.csv
Record Count: ~4,734 TAZ zones
Purpose: TAZ-level control matching and convergence statistics

Key Metrics:

County-Level Summaries

Files: final_summary_COUNTY_[1-9].csv
Record Count: 9 county files
Purpose: County-specific validation and occupation category analysis

Content:

Processing Performance Reports

Memory and Timing Logs

Files:

Performance Metrics:


Analysis and Visualization Products

Comprehensive Analysis Reports

Full Dataset Analysis

File: FULL_DATASET_ANALYSIS.md
Location: output_2023/
Purpose: Complete statistical analysis of synthetic population

Content:

One-Page Executive Summary

File: ONE_PAGE_SUMMARY.md
Location: output_2023/
Purpose: Executive briefing on synthesis results and key findings

Highlights:

Interactive Analysis Products

Synthetic Population Dashboard

File: synthetic_population_analysis_2023.html
Location: output_2023/
Type: Interactive HTML dashboard with embedded visualizations

Features:

Chart and Visualization Outputs

County Analysis Charts

Location: output_2023/charts/county_analysis/
Format: PNG and SVG chart files

Chart Types:

TAZ Analysis Charts

Location: output_2023/charts/taz_analysis/
Format: PNG and SVG chart files

Chart Types:


Tableau-Ready Data Products

Geographic Boundaries

TAZ Boundaries

Files: taz_boundaries_tableau.* (shapefile set)
Location: output_2023/tableau/
Purpose: Transportation Analysis Zone boundaries for mapping

Specifications:

PUMA Boundaries

Files: puma_boundaries_tableau.* (shapefile set)
Location: output_2023/tableau/
Purpose: Public Use Microdata Area boundaries for regional analysis

Specifications:

Control and Results Data

TAZ Controls and Results

File: taz_controls_results_tableau.csv
Location: output_2023/tableau/
Purpose: TAZ-level demographic data formatted for business intelligence

Schema:

Geographic Crosswalk

File: geo_crosswalk_tableau.csv
Location: output_2023/tableau/
Purpose: Geographic relationship mapping for cross-level analysis

Relationships:

Tableau Integration Guide

File: README_Tableau_Data.md
Location: output_2023/tableau/
Purpose: Complete integration instructions for Tableau users

Content:


Technical and Processing Outputs

PopulationSim Engine Outputs

Raw Synthesis Results

Files:

Purpose: Intermediate results before TM2 formatting and recoding
Usage: Debugging, algorithm validation, research applications

Processing Configuration

Files:

Intermediate Processing Files

Seed Population Files

Location: output_2023/populationsim_working_dir/data/

Files:

Purpose: Input data for synthesis engine, processed from raw PUMS

Control Data Files

Files:

Purpose: Demographic targets for synthesis optimization

Geographic Files

Files:

Diagnostic and Debug Outputs

Processing Logs

Files:

Content:

Pipeline State

File: pipeline.h5
Purpose: HDF5 format checkpoint data for pipeline state preservation
Usage: Resume interrupted processing, algorithm debugging


Output Usage Guidelines

Primary Use Cases by Audience

Transportation Planners

Primary Files:

Applications:

Policy Analysts

Primary Files:

Applications:

Researchers and Academics

Primary Files:

Applications:

Data Analysts and Visualization Specialists

Primary Files:

Applications:

Data Integration Workflows

TM2 Travel Model Integration

Input Preparation:

  1. Use households_2023_tm2.csv and persons_2023_tm2.csv as primary inputs
  2. Validate schema compliance with TM2 requirements
  3. Perform data quality checks using validation reports
  4. Configure ActivitySim with PopulationSim outputs

Quality Assurance:

Business Intelligence Integration

Tableau Workflow:

  1. Import geographic boundaries: *_boundaries_tableau.*
  2. Connect data sources: *_tableau.csv files
  3. Establish joins using standardized ID fields
  4. Apply recommended visualization templates
  5. Implement performance optimization guidelines

Power BI/Other BI Tools:


Data Quality Standards

Validation Criteria

Statistical Accuracy

Control Matching Standards:

Geographic Consistency:

Data Integrity Checks

Referential Integrity:

Logical Consistency:

Quality Assurance Process

Automated Validation

Real-Time Checks:

# Example validation checks applied to all outputs
def validate_output_quality(households_df, persons_df):
    checks = {
        'household_person_linkage': validate_hh_person_links(households_df, persons_df),
        'geographic_completeness': check_geographic_coverage(households_df),
        'demographic_plausibility': assess_demographic_patterns(households_df, persons_df),
        'control_matching_accuracy': compare_to_controls(households_df, persons_df),
        'schema_compliance': verify_tm2_schema(households_df, persons_df)
    }
    return all(checks.values())

Manual Review Process

Quality Gates:

  1. Statistical Review: Demographic patterns assessment by subject matter experts
  2. Geographic Review: Spatial distribution analysis and validation
  3. Trend Analysis: Comparison with historical synthesis results
  4. External Validation: Cross-reference with alternative data sources

Documentation Requirements:

Error Handling and Recovery

Data Quality Issues

Common Issues:

Recovery Procedures:


Conclusion

The TM2 PopulationSim output suite provides comprehensive support for transportation modeling, policy analysis, and demographic research through a carefully designed hierarchy of data products. Each output serves specific use cases while maintaining consistency and quality across the entire product ecosystem.

Key Strengths:

Comprehensive Coverage: From raw technical outputs to polished business intelligence products

Quality Assurance: Multi-layered validation ensures reliability and accuracy

User-Focused Design: Outputs tailored to specific user communities and applications

Integration-Ready: Seamless integration with downstream modeling and analysis systems

This comprehensive output framework ensures that PopulationSim synthesis results effectively support the full spectrum of Bay Area planning, policy, and research applications while maintaining the highest standards of data quality and analytical rigor.