How to Run the TM2 PopulationSim Pipeline

Complete instructions for executing the population synthesis pipeline.

Prerequisites

Before running the pipeline:

  1. ✓ Environment setup completed
  2. ✓ Conda environment activated (conda activate popsim)
  3. ✓ Geographic crosswalk files generated (see below)

Quick Start

1. Activate Environment

# Activate conda environment
conda activate popsim

# Verify Python version (should be 3.8.20)
python --version

# Navigate to project directory
cd C:/GitHub/populationsim/bay_area

# Verify environment
python setup_environment.py
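
The version check above can also be scripted. The following is a minimal sketch, assuming that any 3.8.x interpreter is acceptable (the guide names 3.8.20 specifically); `version_matches` is a hypothetical helper and is not part of `setup_environment.py`.

```python
import sys

# Assumption: any Python 3.8.x interpreter is acceptable (the guide expects 3.8.20).
EXPECTED_MAJOR_MINOR = (3, 8)


def version_matches(version_info=sys.version_info, expected=EXPECTED_MAJOR_MINOR):
    """Return True when the interpreter's major.minor pair equals `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


if not version_matches():
    major, minor = sys.version_info[:2]
    print(f"Warning: running Python {major}.{minor}, expected "
          f"{EXPECTED_MAJOR_MINOR[0]}.{EXPECTED_MAJOR_MINOR[1]}")
```

Running this inside the activated popsim environment should print nothing if the interpreter is in the 3.8 series.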

2. Generate Geographic Crosswalk (Required)

IMPORTANT: The pipeline requires pre-generated crosswalk files. These must be created from the separate tm2py-utils repository:

# Navigate to tm2py-utils repository
cd C:/GitHub/tm2py-utils/tm2py_utils/inputs/maz_taz

# Generate crosswalk files
python standalone_tm2_crosswalk_creator.py

# This creates:
# - geo_cross_walk_tm2_maz.csv
# - geo_cross_walk_tm2_block10.csv

The script writes these files to your PopulationSim data/ directory. See Geographic Crosswalk Guide for more details.
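
Before starting a multi-hour run, it may be worth confirming that both crosswalk files are actually in place. The sketch below uses the two file names listed above; the helper name `missing_crosswalks` and the example directory are assumptions, so adjust the path to your checkout.

```python
from pathlib import Path

# File names are taken from the guide; the data directory location is an assumption.
CROSSWALK_FILES = ("geo_cross_walk_tm2_maz.csv", "geo_cross_walk_tm2_block10.csv")


def missing_crosswalks(data_dir):
    """Return the crosswalk file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in CROSSWALK_FILES if not (data_dir / name).is_file()]
```

For example, `missing_crosswalks("C:/GitHub/populationsim/bay_area/data")` returns an empty list when both files exist, assuming that is where your data/ directory lives.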

3. Run Full Pipeline

# Run complete pipeline (recommended)
python tm2_pipeline.py full --force

# Estimated runtime: 2-4 hours
# - Seed population: ~30 min
# - Control generation: ~15 min
# - Population synthesis: ~2-3 hours
# - Post-processing: ~10 min

Running Individual Steps

You can run pipeline steps individually for debugging or iterative development:

# Step 1: Create seed population from PUMS data
python tm2_pipeline.py seed --force

# Step 2: Generate control totals
python tm2_pipeline.py controls --force

# Step 3: Run population synthesis (longest step)
python tm2_pipeline.py populationsim --force

# Step 4: Post-process and format outputs
python tm2_pipeline.py postprocess --force

# Run specific analysis after synthesis
python tm2_pipeline.py analyze --force
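
When iterating, it can be convenient to resume the sequence from a given step rather than typing each command. The following is a rough sketch that wraps the commands above with `subprocess`; it assumes it is run from the bay_area directory and that `tm2_pipeline.py` exits with a non-zero status on failure, which this guide does not confirm. `build_command` and `run_steps` are hypothetical helpers.

```python
import subprocess
import sys

# Step names as they appear in the commands above.
STEPS = ["seed", "controls", "populationsim", "postprocess"]


def build_command(step, force=True):
    """Build the argument list for one pipeline step."""
    cmd = [sys.executable, "tm2_pipeline.py", step]
    if force:
        cmd.append("--force")
    return cmd


def run_steps(steps=STEPS, start_at=None):
    """Run steps in order, optionally resuming at start_at; stop at the first failure."""
    if start_at is not None:
        steps = steps[steps.index(start_at):]
    for step in steps:
        # check=True raises CalledProcessError if the step exits non-zero
        # (assumption: the pipeline signals failure through its exit code).
        subprocess.run(build_command(step), check=True)
```

Calling `run_steps(start_at="populationsim")` would then run synthesis followed by post-processing.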

Step Status Checking

# Check which steps have been completed
python tm2_pipeline.py status

# Output shows:
# ✓ seed_population - COMPLETE
# ✓ control_generation - COMPLETE  
# ○ population_synthesis - NOT STARTED
# ○ postprocessing - NOT STARTED
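
If you want to act on the status programmatically, the report can be parsed into a dictionary. This sketch assumes the output follows exactly the `<symbol> <step> - <STATE>` layout shown above; the real format may differ, and `parse_status` is a hypothetical helper.

```python
def parse_status(output):
    """Map each step name to its reported state from `status` output text."""
    states = {}
    for line in output.splitlines():
        # Drop a leading comment marker and the ✓/○ symbol, if present.
        text = line.strip().lstrip("#").strip().lstrip("✓○").strip()
        if " - " not in text:
            continue  # not a status line
        name, state = text.split(" - ", 1)
        states[name.strip()] = state.strip()
    return states


sample = """\
✓ seed_population - COMPLETE
✓ control_generation - COMPLETE
○ population_synthesis - NOT STARTED
○ postprocessing - NOT STARTED
"""
print(parse_status(sample))
```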

Command-Line Options

Full Pipeline

python tm2_pipeline.py full [--force] [--verbose]

Options:

Individual Steps

python tm2_pipeline.py <step> [--force] [--verbose]

Available steps, as used in the commands above: seed, controls, populationsim, postprocess, analyze.


Configuration

Path Configuration

All file paths are centralized in unified_tm2_config.py. Key configurations:

External Data Paths

# Network data locations (edit if different)
self.EXTERNAL_PATHS = {
    'network_census_cache': Path("M:/Data/Census/..."),
    'network_census_api': Path("M:/Data/Census/API/..."),
    'pums_current': Path("M:/Data/Census/PUMS_2023_5Year_Crosswalked"),
}

Note: The pipeline automatically falls back to local data if the network paths are unavailable.
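
The fallback behaviour can be pictured roughly as follows. This is an illustrative sketch only, not the logic in unified_tm2_config.py; `resolve_data_path` and its arguments are hypothetical.

```python
from pathlib import Path


def resolve_data_path(network_path, local_path):
    """Prefer the network location; fall back to the local copy if it is missing."""
    network_path, local_path = Path(network_path), Path(local_path)
    if network_path.exists():
        return network_path
    if local_path.exists():
        return local_path
    raise FileNotFoundError(
        f"Neither {network_path} nor {local_path} is available"
    )
```

Raising when neither location exists is a design choice for this sketch: it surfaces a missing data source immediately instead of partway through a long run.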

Census API Key

Required for downloading control data. Place your API key in bay_area/data/api-key.txt.

Get a free API key: https://api.census.gov/data/key_signup.html
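
As a quick sanity check, the key file can be read and validated before a run. The sketch below assumes the key is stored as plain text in bay_area/data/api-key.txt (the location named under Troubleshooting); `read_api_key` is a hypothetical helper, not the pipeline's own loader.

```python
from pathlib import Path


def read_api_key(key_file):
    """Return the API key stored in key_file, with surrounding whitespace removed."""
    key_path = Path(key_file)
    if not key_path.is_file():
        raise FileNotFoundError(f"Census API key file not found: {key_path}")
    key = key_path.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"Census API key file is empty: {key_path}")
    return key
```

Stripping whitespace guards against a trailing newline added by a text editor, which is a common cause of rejected keys.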

GIS Reference Files

Required zone definition files:

These should be available from network or local GIS directories.


Expected Outputs

After successful pipeline execution:

Output Directory Structure

bay_area/
├── output_2023/
│   ├── populationsim_working_dir/
│   │   ├── output/
│   │   │   ├── synthetic_households.csv
│   │   │   ├── synthetic_persons.csv
│   │   │   └── summary_*_taz.csv
│   │   └── diagnostics/
│   │       └── [validation plots and reports]
│   └── tm2_outputs/
│       ├── households_taz_*.csv
│       ├── persons_taz_*.csv
│       └── [formatted export files]
└── logs/
    └── [pipeline execution logs]

Key Output Files

File                       Description
synthetic_households.csv   Final synthetic household records
synthetic_persons.csv      Final synthetic person records
summary_*_taz.csv          TAZ-level control vs. result summaries
households_taz_*.csv       TAZ-aggregated household data
persons_taz_*.csv          TAZ-aggregated person data

See Outputs Documentation for complete field definitions.
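
To confirm that a run produced the files listed above, you could glob for them under the output directory. This is a minimal sketch: the patterns are copied from the directory tree in this section and are relative to output_2023/, while `find_outputs` and `missing_outputs` are hypothetical helpers.

```python
from pathlib import Path

# Patterns copied from the directory tree above, relative to output_2023/.
EXPECTED_PATTERNS = [
    "populationsim_working_dir/output/synthetic_households.csv",
    "populationsim_working_dir/output/synthetic_persons.csv",
    "populationsim_working_dir/output/summary_*_taz.csv",
    "tm2_outputs/households_taz_*.csv",
    "tm2_outputs/persons_taz_*.csv",
]


def find_outputs(output_root):
    """Return {pattern: sorted list of matching paths} under output_root."""
    root = Path(output_root)
    return {pattern: sorted(root.glob(pattern)) for pattern in EXPECTED_PATTERNS}


def missing_outputs(output_root):
    """Return the patterns that matched no files."""
    return [p for p, hits in find_outputs(output_root).items() if not hits]
```

For example, `missing_outputs("bay_area/output_2023")` returns an empty list when every expected file is present.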


Troubleshooting

“Crosswalk files not found”

Problem: Missing geo_cross_walk_tm2_maz.csv or geo_cross_walk_tm2_block10.csv

Solution: Generate the crosswalk files first (see Quick Start step 2, "Generate Geographic Crosswalk", above).

“No module named ‘dask’”

Problem: Missing required dependency

Solution:

conda activate popsim
conda install -c conda-forge dask
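
To see whether a module is visible to the active interpreter without importing it, a check like the one below may help. It is a small sketch using the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper and expects top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Prints [] when dask is installed in the active environment.
print(missing_modules(["dask"]))
```

If dask is reported missing even after installation, the wrong conda environment is probably active.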

“FileNotFoundError: Census API key”

Problem: Missing Census API key file

Solution:

  1. Get an API key from the Census website (https://api.census.gov/data/key_signup.html)
  2. Save it to bay_area/data/api-key.txt

Pipeline hangs during synthesis

Problem: PopulationSim IPF not converging

Solution:

Memory errors

Problem: System runs out of memory during synthesis

Solution:


Next Steps

After successful pipeline execution:

  1. Validate Results: Review Output Summaries
  2. Understand the Process: Read Process Overview
  3. Explore Components: Check individual Guides

