How to Run the TM2 PopulationSim Pipeline
Complete instructions for executing the population synthesis pipeline.
Prerequisites
Before running the pipeline:
- ✓ Environment setup completed
- ✓ Conda environment activated (
conda activate popsim) - ✓ Geographic crosswalk files generated (see below)
Quick Start
1. Activate Environment
# Activate conda environment
conda activate popsim
# Verify Python version (should be 3.8.20)
python --version
# Navigate to project directory
cd C:/GitHub/populationsim/bay_area
# Verify environment
python setup_environment.py
2. Generate Geographic Crosswalk (Required)
IMPORTANT: The pipeline requires pre-generated crosswalk files. These must be created from the separate tm2py-utils repository:
# Navigate to tm2py-utils repository
cd C:/GitHub/tm2py-utils/tm2py_utils/inputs/maz_taz
# Generate crosswalk files
python standalone_tm2_crosswalk_creator.py
# This creates:
# - geo_cross_walk_tm2_maz.csv
# - geo_cross_walk_tm2_block10.csv
These files will be output to your PopulationSim data/ directory. See Geographic Crosswalk Guide for more details.
3. Run Full Pipeline
# Run complete pipeline (recommended)
python tm2_pipeline.py full --force
# Estimated runtime: 2-4 hours
# - Seed population: ~30 min
# - Control generation: ~15 min
# - Population synthesis: ~2-3 hours
# - Post-processing: ~10 min
Running Individual Steps
You can run pipeline steps individually for debugging or iterative development:
# Step 1: Create seed population from PUMS data
python tm2_pipeline.py seed --force
# Step 2: Generate control totals
python tm2_pipeline.py controls --force
# Step 3: Run population synthesis (longest step)
python tm2_pipeline.py populationsim --force
# Step 4: Post-process and format outputs
python tm2_pipeline.py postprocess --force
# Run specific analysis after synthesis
python tm2_pipeline.py analyze --force
Step Status Checking
# Check which steps have been completed
python tm2_pipeline.py status
# Output shows:
# ✓ seed_population - COMPLETE
# ✓ control_generation - COMPLETE
# ○ population_synthesis - NOT STARTED
# ○ postprocessing - NOT STARTED
Command-Line Options
Full Pipeline
python tm2_pipeline.py full [--force] [--verbose]
Options:
--force- Rerun steps even if already completed--verbose- Show detailed logging output
Individual Steps
python tm2_pipeline.py <step> [--force] [--verbose]
Available steps:
seed- Create seed populationcontrols- Generate control totalspopulationsim- Run synthesispostprocess- Format outputsanalyze- Run analysis scriptsfull- Run all steps
Configuration
Path Configuration
All file paths are centralized in unified_tm2_config.py. Key configurations:
External Data Paths
# Network data locations (edit if different)
self.EXTERNAL_PATHS = {
'network_census_cache': Path("M:/Data/Census/..."),
'network_census_api': Path("M:/Data/Census/API/..."),
'pums_current': Path("M:/Data/Census/PUMS_2023_5Year_Crosswalked"),
}
Note: Pipeline automatically falls back to local data if network paths are unavailable.
Census API Key
Required for downloading control data. Place your API key in:
- Network:
M:/Data/Census/API/new_key/api-key.txt - Local:
bay_area/data/api-key.txt
Get a free API key: https://api.census.gov/data/key_signup.html
GIS Reference Files
Required zone definition files:
blocks_mazs_tazs.csv- MAZ-TAZ mappingsmazs_tazs_all_geog.csv- Complete geography definitions
These should be available from network or local GIS directories.
Expected Outputs
After successful pipeline execution:
Output Directory Structure
bay_area/
├── output_2023/
│ ├── populationsim_working_dir/
│ │ ├── output/
│ │ │ ├── synthetic_households.csv
│ │ │ ├── synthetic_persons.csv
│ │ │ └── summary_*_taz.csv
│ │ └── diagnostics/
│ │ └── [validation plots and reports]
│ └── tm2_outputs/
│ ├── households_taz_*.csv
│ ├── persons_taz_*.csv
│ └── [formatted export files]
└── logs/
└── [pipeline execution logs]
Key Output Files
| File | Description |
|---|---|
synthetic_households.csv |
Final synthetic household records |
synthetic_persons.csv |
Final synthetic person records |
summary_*_taz.csv |
TAZ-level control vs. result summaries |
households_taz_*.csv |
TAZ-aggregated household data |
persons_taz_*.csv |
TAZ-aggregated person data |
See Outputs Documentation for complete field definitions.
Troubleshooting
“Crosswalk files not found”
Problem: Missing geo_cross_walk_tm2_maz.csv or geo_cross_walk_tm2_block10.csv
Solution: Generate crosswalk files first (see step 2 above)
“No module named ‘dask’”
Problem: Missing required dependency
Solution:
conda activate popsim
conda install -c conda-forge dask
“FileNotFoundError: Census API key”
Problem: Missing Census API key file
Solution:
- Get API key from Census website
- Save to
bay_area/data/api-key.txt
Pipeline hangs during synthesis
Problem: PopulationSim IPF not converging
Solution:
- Check log files in
logs/directory - Review control totals for inconsistencies
- See Population Synthesis Guide for convergence tips
Memory errors
Problem: System runs out of memory during synthesis
Solution:
- Close other applications
- Ensure at least 16GB RAM available
- Consider processing fewer geographies at once
Next Steps
After successful pipeline execution:
- Validate Results: Review Output Summaries
- Understand the Process: Read Process Overview
- Explore Components: Check individual Guides
Related Documentation
- Environment Setup - Initial setup
- Process Overview - Pipeline architecture
- Geographic Crosswalk - Crosswalk details
- Population Synthesis - Synthesis algorithm
- Outputs - Understanding results
| ← Back to Getting Started | Home |