
Detailed Geographic Crosswalk Generation Guide

TM2 PopulationSim Geographic Processing and Spatial Integration

⚠️ MIGRATION NOTICE: Crosswalk generation has been moved to a standalone script.

Document Version: 2.0
Date: November 2025
Author: PopulationSim Bay Area Team


Table of Contents

  1. Migration Overview
  2. Overview
  3. Geographic Hierarchy and Data Sources
  4. Unified Crosswalk Generation Process
  5. Basic Crosswalk Creation
  6. Enhanced Crosswalk with Block Mappings
  7. Quality Assurance and Validation
  8. Output Specifications
  9. Technical Dependencies

Migration Overview

Important Change: The crosswalk generation process has been consolidated into a single standalone script that replaces the previous two-script approach:

Previous Approach (Deprecated)

New Approach (Current)

Benefits of the New Approach


Overview

The TM2 PopulationSim geographic crosswalk generation creates the foundational spatial relationships required for accurate population synthesis across multiple geographic scales. This process builds comprehensive geographic mappings from census blocks up to counties, ensuring spatial consistency and enabling proper aggregation of demographic controls.

Purpose and Scope

The crosswalk generation serves several critical functions:

Key Outputs

The unified process generates two primary deliverables:

  1. Basic Crosswalk (geo_cross_walk_tm2_maz.csv): Complete MAZ→TAZ→County→PUMA spatial mappings
  2. Enhanced Crosswalk (geo_cross_walk_tm2_block10.csv): Extended mappings including block and block group relationships

Geographic Hierarchy and Data Sources

Spatial Hierarchy

The TM2 crosswalk establishes relationships across seven geographic levels:

Block (15-digit GEOID)
    ↓
Block Group (12-digit GEOID) 
    ↓
Census Tract (11-digit GEOID)
    ↓
MAZ (~39,586 zones)
    ↓
TAZ (~4,734 zones)
    ↓
County (9 Bay Area counties)
    ↓
PUMA (~104 zones)

Primary Data Sources

1. MAZ/TAZ Spatial Data

2. PUMA Spatial Data

3. County Spatial Data

4. Census Block Data

Geographic Standards

Bay Area County System

The system uses a standardized 1-9 county numbering system:

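| County ID | County Name | FIPS Code |
|-----------|-------------|-----------|
| 1 | San Francisco | 075 |
| 2 | San Mateo | 081 |
| 3 | Santa Clara | 085 |
| 4 | Alameda | 001 |
| 5 | Contra Costa | 013 |
| 6 | Solano | 095 |
| 7 | Napa | 055 |
| 8 | Sonoma | 097 |
| 9 | Marin | 041 |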
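
The county numbering can be captured as a small lookup table. The sketch below is illustrative only: the names `BAY_AREA_COUNTIES`, `FIPS_TO_COUNTY_ID`, and `county_id_from_fips` are hypothetical and are not taken from the crosswalk script.

```python
# Hypothetical lookup built from the county table:
# county ID -> (county name, 3-digit county FIPS code).
BAY_AREA_COUNTIES = {
    1: ("San Francisco", "075"),
    2: ("San Mateo", "081"),
    3: ("Santa Clara", "085"),
    4: ("Alameda", "001"),
    5: ("Contra Costa", "013"),
    6: ("Solano", "095"),
    7: ("Napa", "055"),
    8: ("Sonoma", "097"),
    9: ("Marin", "041"),
}

# Reverse index: county FIPS code -> county ID.
FIPS_TO_COUNTY_ID = {fips: cid for cid, (_, fips) in BAY_AREA_COUNTIES.items()}


def county_id_from_fips(fips):
    """Return the 1-9 county ID for a county FIPS code given as int or string."""
    # zfill restores leading zeros that are lost when FIPS codes are read as integers.
    return FIPS_TO_COUNTY_ID[str(fips).zfill(3)]
```

Keeping FIPS codes as zero-padded strings avoids confusing Alameda (`001`) with the integer `1`, which is San Francisco's county ID.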

GEOID Structure

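Census block identifiers follow the standard 15-digit GEOID format: state FIPS (2 digits), county FIPS (3), tract (6), and block (4). The block group GEOID is the first 12 digits of the block GEOID, and the tract GEOID is the first 11.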
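
Because the shorter census identifiers are prefixes of the block GEOID, the parent geographies can be derived by slicing. The following is a minimal sketch of that idea; the function name `split_block_geoid` is hypothetical, and the project's own helper may behave differently.

```python
def split_block_geoid(geoid):
    """Split a 15-digit census block GEOID into its nested identifiers."""
    # Restore leading zeros that are lost if the GEOID was parsed as an integer.
    geoid = str(geoid).zfill(15)
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"not a 15-digit block GEOID: {geoid!r}")
    return {
        "state": geoid[:2],         # 2-digit state FIPS
        "county": geoid[:5],        # state + 3-digit county FIPS
        "tract": geoid[:11],        # county + 6-digit tract
        "block_group": geoid[:12],  # tract + first digit of the block number
        "block": geoid,             # full 15-digit identifier
    }


parts = split_block_geoid(60014001001000)  # integer input, leading zero lost
print(parts["block_group"])
```

California's state FIPS code is `06`, so every Bay Area GEOID starts with a zero and is easily truncated by numeric parsing; padding before slicing guards against that.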


Two-Phase Crosswalk Generation

The crosswalk generation employs a two-phase approach to handle different spatial processing requirements and ensure comprehensive geographic coverage.

Process Architecture

Phase 1: Spatial Geographic Processing
├── MAZ/TAZ Shapefile Loading
├── PUMA Spatial Assignment (Area-Based)
├── County Spatial Assignment (Centroid-Based)  
└── Primary Crosswalk Generation

Phase 2: Block Group Integration
├── Census Block Data Integration
├── Geographic Hierarchy Construction
├── Block Group Mapping Creation
└── Enhanced Crosswalk Generation

Data Flow Architecture

Input Shapefiles → Spatial Processing → Primary Crosswalk
                                      ↓
Block Data → Geographic Enhancement → Enhanced Crosswalk

Phase 1: Spatial Geographic Processing

Implementation: create_tm2_crosswalk.py

Phase 1 creates the foundational geographic relationships through spatial overlay and join operations: area-based intersection for PUMA assignment and centroid-based joins for county assignment.

Step 1: MAZ/TAZ Shapefile Processing

Data Loading and Validation

# Flexible column identification system
maz_col = identify_column(['MAZ_NODE', 'MAZ', 'MAZ_ID', 'MAZ_ID_'])
taz_col = identify_column(['TAZ_NODE', 'TAZ', 'TAZ_ID', 'TAZ1454'])
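
One way such a column lookup could work is sketched below. This is an assumption, not the script's actual helper: the version here takes the available column names as an explicit first argument and matches case-insensitively, neither of which is confirmed by the source.

```python
def identify_column(columns, candidates):
    """Return the first candidate name that appears in columns (case-insensitive)."""
    # Map upper-cased names back to the original spelling used in the shapefile.
    by_upper = {str(col).upper(): col for col in columns}
    for name in candidates:
        match = by_upper.get(name.upper())
        if match is not None:
            return match
    raise KeyError(f"none of {candidates} found in {list(columns)}")


# Hypothetical shapefile columns, for illustration only.
shapefile_columns = ["maz_id", "TAZ", "geometry"]
maz_col = identify_column(shapefile_columns, ["MAZ_NODE", "MAZ", "MAZ_ID", "MAZ_ID_"])
taz_col = identify_column(shapefile_columns, ["TAZ_NODE", "TAZ", "TAZ_ID", "TAZ1454"])
```

Candidates are tried in list order, so the preferred name should come first. Raising on a miss surfaces a renamed shapefile attribute immediately rather than later in the spatial join.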

Coordinate Reference System (CRS) Management

Step 2: PUMA Assignment (Area-Based Method)

Methodology: Uses area-weighted intersection to assign TAZs to PUMAs based on maximum spatial overlap.

Processing Logic:

  1. TAZ Geometry Dissolution: Combine all MAZ polygons within each TAZ
  2. Intersection Calculation: Compute area of overlap between each TAZ and all intersecting PUMAs
  3. Dominant PUMA Assignment: Assign each TAZ to the PUMA with the largest intersection area
  4. Coverage Validation: Verify all TAZs receive valid PUMA assignments
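
The dominant-overlap rule in the steps above can be illustrated with a toy example. The sketch below uses axis-aligned rectangles in place of real polygons so that it runs with the standard library alone; in practice a geospatial library such as GeoPandas would compute true polygon intersections. All zone IDs, coordinates, and function names are made up for illustration.

```python
def overlap_area(a, b):
    """Intersection area of two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    # Clamp each dimension separately so two negative extents never multiply
    # into a spurious positive area.
    return max(width, 0.0) * max(height, 0.0)


def assign_dominant_puma(taz_boxes, puma_boxes):
    """Map each TAZ to the PUMA with the largest overlap, or None if none overlaps."""
    assignment = {}
    for taz, taz_box in taz_boxes.items():
        areas = {puma: overlap_area(taz_box, box) for puma, box in puma_boxes.items()}
        best = max(areas, key=areas.get)
        assignment[taz] = best if areas[best] > 0 else None
    return assignment


puma_boxes = {101: (0.0, 0.0, 4.0, 4.0), 102: (4.0, 0.0, 8.0, 4.0)}
taz_boxes = {
    1001: (0.0, 0.0, 2.0, 2.0),      # entirely inside PUMA 101
    1002: (3.0, 0.0, 6.0, 2.0),      # area 2 in PUMA 101, area 4 in PUMA 102
    1003: (20.0, 20.0, 21.0, 21.0),  # overlaps no PUMA
}
result = assign_dominant_puma(taz_boxes, puma_boxes)
```

TAZ 1002 straddles both PUMAs and is assigned to 102, where most of its area lies. TAZ 1003 receives `None`, which is the case the coverage validation step is meant to catch.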

Quality Measures:

Step 3: County Assignment (Centroid-Based Method)

Methodology: Uses MAZ centroid spatial join with county polygons for precise county assignment.

Processing Logic:

  1. Centroid Calculation: Compute geometric centroid for each MAZ polygon
  2. Spatial Join: Intersect MAZ centroids with county polygons
  3. County Mapping: Assign county ID using standardized 1-9 system
  4. FIPS Code Integration: Add both numerical county ID and FIPS codes
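
The centroid-based steps above can be sketched without geospatial dependencies. The example computes each polygon's centroid with the shoelace formula and tests it against county polygons by ray casting; a production pipeline would more likely use a spatial join from a library such as GeoPandas. Function names, zone IDs, and coordinates are hypothetical.

```python
def polygon_centroid(ring):
    """Centroid of a simple polygon given as a list of (x, y) vertices."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area) = sum / (3 * area2).
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        if (y0 > y) != (y1 > y):  # edge crosses the horizontal line through the point
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def assign_county(maz_rings, county_rings):
    """Map each MAZ to the county containing its centroid, or None."""
    assignment = {}
    for maz, ring in maz_rings.items():
        centroid = polygon_centroid(ring)
        assignment[maz] = next(
            (cid for cid, cring in county_rings.items() if point_in_polygon(centroid, cring)),
            None,
        )
    return assignment


county_rings = {
    4: [(0, 0), (10, 0), (10, 10), (0, 10)],
    5: [(10, 0), (20, 0), (20, 10), (10, 10)],
}
maz_rings = {
    1: [(8, 1), (13, 1), (13, 3), (8, 3)],      # straddles the border, centroid in county 5
    2: [(1, 1), (3, 1), (3, 3), (1, 3)],        # inside county 4
    3: [(50, 50), (51, 50), (51, 51), (50, 51)],  # outside both counties
}
counties = assign_county(maz_rings, county_rings)
```

A centroid always yields exactly one candidate county for a zone that crosses a boundary, at the cost of ignoring how the zone's area is split. MAZ 3 maps to `None`, the kind of unassigned case a validation step should flag.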

Validation Steps:

Step 4: Primary Crosswalk Assembly

Data Integration:

# Final crosswalk structure
crosswalk_columns = [
    'MAZ_NODE',      # Primary MAZ identifier
    'TAZ_NODE',      # TAZ assignment
    'COUNTY',        # County ID (1-9)
    'county_name',   # County name
    'PUMA',          # PUMA assignment
    'COUNTYFP10'     # FIPS county code
]

Output Generation:


Phase 2: Block Group Integration

Implementation: build_complete_crosswalk.py

Phase 2 extends the primary crosswalk with detailed census geography relationships required for income control processing.

Step 1: Census Block Data Integration

Geographic Hierarchy Construction:

# GEOID processing and validation
blocks_df['GEOID_block'] = blocks_df['GEOID10'].astype(str).str.zfill(15)

# Aggregate geography creation
add_aggregate_geography_columns(blocks_df)
# Creates: 'GEOID_block group', 'GEOID_tract', 'GEOID_county'

Data Sources:

Step 2: Block Group Mapping Creation

Dominant Assignment Algorithm: A block group that spans multiple TAZs is assigned to the TAZ containing the most of its census blocks:

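# Block group to TAZ mapping logic: count blocks per (block group, TAZ) pair
bg_taz_mapping = blocks_df.groupby(['GEOID_block group', 'TAZ_NODE']).size().reset_index(name='block_count')
# Keep the row with the largest block count within each block group
dominant_taz = bg_taz_mapping.loc[bg_taz_mapping.groupby('GEOID_block group')['block_count'].idxmax()]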
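
To make the dominant-assignment rule concrete, the sketch below applies it to a few made-up blocks using only the standard library. The function name and the tie-breaking rule (lowest TAZ ID wins) are assumptions for illustration; the source does not say how ties are resolved.

```python
from collections import Counter, defaultdict


def dominant_taz_by_block_group(block_to_taz):
    """Map each block group to the TAZ that contains the most of its blocks."""
    counts = defaultdict(Counter)
    for block_geoid, taz in block_to_taz.items():
        # The block group GEOID is the first 12 digits of the block GEOID.
        counts[block_geoid[:12]][taz] += 1
    # Sort key: highest block count first, then lowest TAZ ID to break ties
    # deterministically (an assumption, not confirmed by the source).
    return {
        bg: min(taz_counts, key=lambda taz: (-taz_counts[taz], taz))
        for bg, taz_counts in counts.items()
    }


# Made-up block-to-TAZ assignments.
block_to_taz = {
    "060014001001000": 1001,
    "060014001001001": 1001,
    "060014001001002": 1002,
    "060014001002000": 1002,
}
dominant = dominant_taz_by_block_group(block_to_taz)
```

Block group `060014001001` has two blocks in TAZ 1001 and one in TAZ 1002, so it is assigned to 1001.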

Spatial Resolution:

Step 3: Enhanced Crosswalk Generation

Data Integration Process:

  1. Primary Crosswalk Loading: Load Phase 1 output
  2. Geographic Enhancement: Add block and block group mappings
  3. Column Standardization: Ensure consistent naming conventions
  4. Validation: Verify complete geographic hierarchy

Enhanced Output Structure:

enhanced_columns = [
    'MAZ_NODE', 'TAZ_NODE', 'COUNTY', 'county_name', 'PUMA',  # Phase 1
    'GEOID_block', 'GEOID_block group', 'GEOID_tract', 'GEOID_county'  # Phase 2
]

Step 4: Quality Assurance and Backup

Backup Strategy:


Quality Assurance and Validation

Spatial Validation Methods

1. Area-Based PUMA Assignment Validation

Coverage Checks:

Quality Metrics:

Target Metrics:
- Single PUMA intersection: >80% of TAZs
- Complete coverage: 100% of TAZs assigned
- Zero invalid assignments: 0 NULL PUMAs
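
The target metrics above could be checked along the following lines. This is a sketch under assumptions: the inputs are plain dictionaries (TAZ to assigned PUMA, and TAZ to the number of PUMAs it intersects), and the function names are hypothetical.

```python
def puma_assignment_metrics(taz_to_puma, intersection_counts):
    """Compute the three PUMA assignment quality metrics for a set of TAZs."""
    total = len(taz_to_puma)
    if total == 0:
        raise ValueError("no TAZs to evaluate")
    null_pumas = sum(1 for puma in taz_to_puma.values() if puma is None)
    single = sum(1 for taz in taz_to_puma if intersection_counts.get(taz, 0) == 1)
    return {
        "single_puma_share": single / total,        # target: > 0.80
        "coverage": (total - null_pumas) / total,   # target: 1.0
        "null_pumas": null_pumas,                   # target: 0
    }


def meets_targets(metrics):
    """True when all three target metrics are satisfied."""
    return (
        metrics["single_puma_share"] > 0.80
        and metrics["coverage"] == 1.0
        and metrics["null_pumas"] == 0
    )


# Ten made-up TAZs: nine intersect one PUMA, one intersects two.
taz_to_puma = {taz: 101 for taz in range(1, 11)}
intersection_counts = {taz: 1 for taz in range(1, 10)}
intersection_counts[10] = 2
metrics = puma_assignment_metrics(taz_to_puma, intersection_counts)
```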

2. County Assignment Validation

Centroid Method Verification:

Error Detection:

3. Block Group Integration Validation

Hierarchical Consistency:

Statistical Validation:

# Validation statistics
total_records = len(enhanced_crosswalk)
unique_mazs = enhanced_crosswalk['MAZ_NODE'].nunique()
unique_tazs = enhanced_crosswalk['TAZ_NODE'].nunique()
unique_block_groups = enhanced_crosswalk['GEOID_block group'].nunique()
missing_bg_mappings = enhanced_crosswalk['GEOID_block group'].isna().sum()
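
Hierarchical consistency can also be checked row by row, since each coarser GEOID should be a prefix of the block GEOID. The sketch below assumes rows are dictionaries keyed by the enhanced crosswalk column names; the function name `hierarchy_errors` is hypothetical.

```python
def hierarchy_errors(rows):
    """Return indexes of rows whose GEOIDs are not nested prefixes of the block GEOID."""
    prefix_lengths = {"GEOID_block group": 12, "GEOID_tract": 11, "GEOID_county": 5}
    bad = []
    for i, row in enumerate(rows):
        block = row["GEOID_block"]
        nested = all(row[col] == block[:n] for col, n in prefix_lengths.items())
        if len(block) != 15 or not nested:
            bad.append(i)
    return bad


rows = [
    {   # consistent hierarchy
        "GEOID_block": "060014001001000",
        "GEOID_block group": "060014001001",
        "GEOID_tract": "06001400100",
        "GEOID_county": "06001",
    },
    {   # tract does not match the block's first 11 digits
        "GEOID_block": "060014001001001",
        "GEOID_block group": "060014001001",
        "GEOID_tract": "06001400200",
        "GEOID_county": "06001",
    },
]
errors = hierarchy_errors(rows)
```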

Error Handling and Resolution

1. Spatial Join Failures

No Intersection Cases:

2. Multi-Geography Conflicts

Block Group Spanning Multiple TAZs:

3. Data Quality Issues

Missing Geographic Data:


Output Specifications

Primary Crosswalk: geo_cross_walk_tm2_maz.csv

File Location: output_2023/populationsim_working_dir/data/geo_cross_walk_tm2_maz.csv

Schema:

| Column | Type | Description | Example |
|--------|------|-------------|---------|
| MAZ_NODE | Integer | MAZ identifier | 12345 |
| TAZ_NODE | Integer | TAZ identifier | 1001 |
| COUNTY | Integer | County ID (1-9) | 4 |
| county_name | String | County name | "Alameda" |
| PUMA | Integer | PUMA identifier | 5301 |
| COUNTYFP10 | String | FIPS county code | "001" |
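
A lightweight check of a file against this schema might look like the sketch below. It uses only the standard library and reads CSV text from a string; the function name and the specific checks are assumptions, not part of the crosswalk script.

```python
import csv
import io

EXPECTED_COLUMNS = ["MAZ_NODE", "TAZ_NODE", "COUNTY", "county_name", "PUMA", "COUNTYFP10"]
VALID_COUNTY_IDS = {str(n) for n in range(1, 10)}


def check_primary_crosswalk(csv_text):
    """Return a list of human-readable problems found in primary crosswalk CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in EXPECTED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in ("MAZ_NODE", "TAZ_NODE", "PUMA"):
            if not (row[col] or "").isdigit():
                problems.append(f"line {line_no}: {col} is not an integer: {row[col]!r}")
        if row["COUNTY"] not in VALID_COUNTY_IDS:
            problems.append(f"line {line_no}: COUNTY outside 1-9: {row['COUNTY']!r}")
        fips = row["COUNTYFP10"] or ""
        if len(fips) != 3 or not fips.isdigit():
            problems.append(f"line {line_no}: COUNTYFP10 is not a 3-digit code: {fips!r}")
    return problems


sample = (
    "MAZ_NODE,TAZ_NODE,COUNTY,county_name,PUMA,COUNTYFP10\n"
    "12345,1001,4,Alameda,5301,001\n"
)
print(check_primary_crosswalk(sample))
```

The `COUNTYFP10` check catches files where the FIPS code was written as a number and lost its leading zeros, for example `1` instead of `001`.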

Quality Metrics:

Enhanced Crosswalk: geo_cross_walk_tm2_block10.csv

File Location: output_2023/populationsim_working_dir/data/geo_cross_walk_tm2_block10.csv

Additional Schema:

| Column | Type | Description | Example |
|--------|------|-------------|---------|
| GEOID_block | String | 15-digit block GEOID | "060014001001000" |
| GEOID_block group | String | 12-digit block group GEOID | "060014001001" |
| GEOID_tract | String | 11-digit tract GEOID | "06001400100" |
| GEOID_county | String | 5-digit county GEOID | "06001" |

Extended Capabilities:

Validation Outputs

1. Block Group Mapping Summary: bg_taz_mapping_summary.csv

Purpose: Documents block group to TAZ assignments for validation

Schema:

2. Processing Logs

Console Output: Comprehensive processing statistics

TM2 CROSSWALK CREATION COMPLETE
- Final crosswalk: 39,586 MAZ zones
- Unique TAZs: 4,734
- Unique PUMAs: 104
- Counties: 9
- Enhanced crosswalk: 39,586 records
- Block groups: 1,547 unique

Technical Dependencies

Software Requirements

Python Environment

Geospatial Libraries

Hardware Specifications

Memory Requirements

Storage Requirements

External Data Dependencies

Network Paths

Data Currency

Configuration Management

Unified Configuration System

Path Management

# Example configuration structure
SHAPEFILES = {
    'maz_shapefile': Path("C:/GitHub/tm2py-utils/.../mazs_TM2_2_5.shp"),
    'puma_shapefile': Path("C:/GitHub/tm2py-utils/.../tl_2022_06_puma20.shp"),
    'county_shapefile': Path("C:/GitHub/tm2py-utils/.../Counties.shp")
}

CROSSWALK_FILES = {
    'popsim_crosswalk': Path("output_2023/.../geo_cross_walk_tm2_maz.csv")
}

Conclusion

The TM2 geographic crosswalk generation provides the spatial foundation for population synthesis. Its two-phase approach pairs spatial assignment (area-based for PUMAs, centroid-based for counties) with census block and block group integration, producing consistent geographic relationships across all required scales.

Key Achievements:

Future Enhancements:

This documentation provides the complete technical reference for understanding, maintaining, and enhancing the TM2 geographic crosswalk generation system.