DaySim Formatting

processing.formatting.daysim.format_daysim

DaySim Formatting Step.

Transforms canonical survey data (persons, households, trips, tours, days) into DaySim activity-based travel demand model format, applying model-specific coding schemes and data structures. See DaySim Data Models.

This module orchestrates specialized formatting for each table type, applying DaySim-specific integer codes for categorical variables while maintaining referential integrity across tables.

Components

Data Quality Filters

  • Partial Tours: Optionally drop tours that do not return home
  • Missing TAZ: Remove records without spatial assignment (required for model)
  • Invalid Tours: Filter out tours failing validation rules (zero distance, negative duration, data quality flags)
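
The three filters above can be pictured as independent, flag-controlled passes. The following is a minimal plain-Python sketch, not the module's Polars implementation. The function name `apply_quality_filters` and the fields `is_complete` and `is_valid` are hypothetical, and removing the tours of dropped households is an assumption about how referential integrity is kept.

```python
def apply_quality_filters(
    households,
    tours,
    drop_partial_tours=True,
    drop_missing_taz=True,
    drop_invalid_tours=True,
):
    """Return (households, tours) after the optional quality filters.

    Rows are plain dicts. `is_complete` and `is_valid` are assumed
    field names used only for this illustration.
    """
    if drop_missing_taz:
        # Keep only households that have a home TAZ assigned.
        households = [h for h in households if h.get("home_taz") is not None]
        kept_hh = {h["hh_id"] for h in households}
        # Assumption: tours of dropped households are removed as well,
        # so that no tour points at a missing household.
        tours = [t for t in tours if t["hh_id"] in kept_hh]
    if drop_partial_tours:
        # Partial tours are those not marked as complete.
        tours = [t for t in tours if t["is_complete"]]
    if drop_invalid_tours:
        # Invalid tours failed a validation rule upstream.
        tours = [t for t in tours if t["is_valid"]]
    return households, tours
```

Passing all three flags as False returns the inputs unchanged, which can be handy when comparing filtered and unfiltered outputs.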

Implementation Notes

  • DaySim requires specific integer codes for categorical variables
  • Formatting maintains referential integrity across tables
  • TAZ (Traffic Analysis Zone) assignment is critical for model application
  • Person type classification affects downstream choice model applicability
  • Mode/purpose hierarchies ensure consistent coding
  • Output validates against DaySim data specifications
  • Days with invalid/partial tours become "no travel" days in the model

format_daysim

format_daysim(
    persons: pl.DataFrame,
    households: pl.DataFrame,
    unlinked_trips: pl.DataFrame,
    linked_trips: pl.DataFrame,
    tours: pl.DataFrame,
    days: pl.DataFrame,
    drop_partial_tours: bool = True,
    drop_missing_taz: bool = True,
    drop_invalid_tours: bool = True,
) -> dict[str, pl.DataFrame]

Format canonical survey data to DaySim model specification.

Converts canonical survey tables to DaySim activity-based travel demand model format. See module docstring for complete component descriptions.

Parameters:

Name Type Description Default
persons pl.DataFrame

Person attributes in canonical format. Required columns: person_id, hh_id, age, employment, student, etc.

required
households pl.DataFrame

Household attributes in canonical format. Required columns: hh_id, home_taz, income, etc.

required
unlinked_trips pl.DataFrame

Individual trip segments with mode, purpose, and timing.

required
linked_trips pl.DataFrame

Journey records with coordinates, mode, purpose, and timing.

required
tours pl.DataFrame

Tour records with purpose, timing, and location fields.

required
days pl.DataFrame

Person-day records for completeness calculation.

required
drop_partial_tours bool

If True, remove tours not marked as complete, i.e. tours that do not return home (default: True).

True
drop_missing_taz bool

If True, remove households without valid TAZ/MAZ IDs (default: True). Required for model application.

True
drop_invalid_tours bool

If True, remove tours marked as invalid (default: True). Filters out tours with zero distance, negative duration, or data quality flags.

True

Returns:

Type Description
dict[str, pl.DataFrame]

Dictionary containing:

  • households_daysim: Formatted household data with person type composition and income categories
  • persons_daysim: Formatted person data with person type, day pattern, and completeness flags
  • days_daysim: Formatted day-level data with summaries
  • linked_trips_daysim: Formatted trip data with DaySim mode, path type, and driver/passenger codes
  • tours_daysim: Formatted tour data with DaySim purpose codes and timing

Example

result = format_daysim(
    persons=canonical_persons,
    households=canonical_households,
    unlinked_trips=canonical_unlinked_trips,
    linked_trips=canonical_linked_trips,
    tours=canonical_tours,
    days=canonical_days,
    drop_partial_tours=True,
    drop_missing_taz=True,
    drop_invalid_tours=True
)
households_daysim = result["households_daysim"]
persons_daysim = result["persons_daysim"]

processing.formatting.daysim.format_persons

Person formatting for DaySim output.

format_persons

format_persons(
    persons: pl.DataFrame, days: pl.DataFrame
) -> pl.DataFrame

Format person data to DaySim specification.

Applies mapping dictionaries and derives person type (pptyp) and worker type (pwtyp) based on age, employment, and student status.

Key Transformations:

  • Person Type Classification: Assign each person to one category (full-time worker, part-time worker, university student, high school student, non-working adult, non-working senior, or child by age group) based on age, employment status, and student status (cascading logic below)
  • Day Completeness: Mark complete travel days vs partial reporting (from days table)
  • Activity Patterns: Work at home frequency, school location type derived from usual work/school locations
  • Usual Days: Calculate usual work/school days per week

Person type (pptyp) cascading logic:

  • Age < 5: Child 0-4 (type 8)
  • Age < 16: Child 5-15 (type 7)
  • Full-time employed: Full-time worker (type 1)
  • Age 16-17 and student: High school 16+ (type 6)
  • Age 18-24 and high school: High school 16+ (type 6)
  • Age >= 18 and student: University student (type 5)
  • Part-time/self-employed: Part-time worker (type 2)
  • Age < 65: Non-working adult (type 4)
  • Age >= 65: Non-working senior (type 3)
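
The cascade above is a first-match-wins rule list, so the order of the checks matters. A minimal plain-Python sketch follows; it is an illustration, not the module's Polars implementation. The function name `classify_pptyp` and the `employment` and `school_type` labels are hypothetical placeholders for the canonical codes, and the "high school" rule is assumed to test the school type only.

```python
from typing import Optional


def classify_pptyp(
    age: int,
    employment: str,
    is_student: bool,
    school_type: Optional[str] = None,
) -> int:
    """Return a DaySim person type (pptyp); the first matching rule wins.

    The string labels ("full_time", "part_time", "self_employed",
    "high_school") are assumptions made for this sketch.
    """
    if age < 5:
        return 8  # Child 0-4
    if age < 16:
        return 7  # Child 5-15
    if employment == "full_time":
        return 1  # Full-time worker
    if 16 <= age <= 17 and is_student:
        return 6  # High school 16+
    if 18 <= age <= 24 and school_type == "high_school":
        return 6  # High school 16+
    if age >= 18 and is_student:
        return 5  # University student
    if employment in ("part_time", "self_employed"):
        return 2  # Part-time worker
    if age < 65:
        return 4  # Non-working adult
    return 3  # Non-working senior
```

Because full-time employment is tested before student status, a 17-year-old student with a full-time job is coded as type 1 rather than type 6.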

Parameters:

Name Type Description Default
persons pl.DataFrame

DataFrame with canonical person fields

required
days pl.DataFrame

DataFrame with day completeness indicators

required

Returns:

Type Description
pl.DataFrame

DataFrame with DaySim person fields

processing.formatting.daysim.format_households

Household formatting for DaySim output.

format_households

format_households(
    households: pl.DataFrame, persons_daysim: pl.DataFrame
) -> pl.DataFrame

Format household data to DaySim specification.

Calculates household composition from person data and applies income fallback logic.

Key Transformations:

  • Household Composition: Aggregate person types within household (full-time workers, part-time workers, retirees, non-working adults, university students, high school students, children by age)
  • Income Processing: Categorize household income into DaySim bins using midpoint values; handle missing values by falling back from detailed income to follow-up income
  • Size and Type: Household size from person count, household type derived from composition (workers, students, children)
  • Coordinates: Home location coordinates and TAZ/MAZ assignment

Household composition fields:

  • hhftw: Full-time workers
  • hhptw: Part-time workers
  • hhret: Retirees (non-working seniors)
  • hhoad: Other adults (non-working < 65)
  • hhuni: University students
  • hhhsc: High school students 16+
  • hh515: Children 5-15
  • hhcu5: Children 0-4
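
Each composition field is a count of household members with one pptyp value. The sketch below shows that aggregation in plain Python under stated assumptions: the function name `household_composition` is hypothetical, persons are plain dicts keyed by `hhno` and `pptyp`, and the `hhsize` output name is assumed. The pptyp-to-field mapping is read off the person type list and the field list in this page.

```python
# pptyp code -> household composition field.
PPTYP_TO_FIELD = {
    1: "hhftw",  # full-time workers
    2: "hhptw",  # part-time workers
    3: "hhret",  # non-working seniors
    4: "hhoad",  # non-working adults under 65
    5: "hhuni",  # university students
    6: "hhhsc",  # high school students 16+
    7: "hh515",  # children 5-15
    8: "hhcu5",  # children 0-4
}


def household_composition(persons):
    """Count persons by type per household.

    `persons` is a list of dicts with `hhno` and `pptyp` keys.
    Returns {hhno: {field: count, ..., "hhsize": n}}; the `hhsize`
    name is an assumption for this sketch.
    """
    result = {}
    for person in persons:
        row = result.get(person["hhno"])
        if row is None:
            # Start every household with zero in each field.
            row = {field: 0 for field in PPTYP_TO_FIELD.values()}
            row["hhsize"] = 0
            result[person["hhno"]] = row
        row[PPTYP_TO_FIELD[person["pptyp"]]] += 1
        row["hhsize"] += 1
    return result
```

Initializing every field to zero keeps the output rectangular, so a household with no children still carries `hh515 = 0` rather than a missing value.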

Parameters:

Name Type Description Default
households pl.DataFrame

DataFrame with canonical household fields

required
persons_daysim pl.DataFrame

DataFrame with formatted DaySim person fields

required

Returns:

Type Description
pl.DataFrame

DataFrame with DaySim household fields

processing.formatting.daysim.format_trips

Trip formatting for DaySim output.

format_linked_trips

format_linked_trips(
    persons: pl.DataFrame,
    unlinked_trips: pl.DataFrame,
    linked_trips: pl.DataFrame,
) -> pl.DataFrame

Format linked trip data to DaySim specification.

Computes DaySim mode, path type, and driver/passenger codes by aggregating mode information from unlinked trip segments, then applying DaySim-specific mappings.

This function performs mode aggregation that was previously done in the linking step. By moving it here, we preserve maximum granularity in the core linked_trips table per the pipeline design philosophy, deferring format-specific aggregations to output formatters.

Key Transformations:

  • Mode Codes: Map canonical mode_type to DaySim mode codes (walk, bike, SOV, HOV2, HOV3, transit, etc.), distinguish auto modes by occupancy (drive alone vs shared ride)
  • Path Type: Derive transit path type from mode hierarchy (ferry > BART > premium > LRT > bus), special handling for transit access/egress modes
  • Driver/Passenger: Code driver vs passenger for auto trips, TNC occupancy (alone, 2, 3+), link to household vehicle information
  • Trip Sequence: Number trips within tours and half-tours (outbound/inbound)
  • Purpose Codes: Map origin and destination purpose categories to DaySim codes
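
The path type hierarchy (ferry > BART > premium > LRT > bus) amounts to picking the highest-ranked transit mode among a linked trip's segments. The following plain-Python sketch illustrates that selection only. The function name `dominant_transit_path` and the lowercase mode labels are hypothetical, and the sketch returns a label rather than a DaySim integer code because the codes are not listed here.

```python
# Highest priority first, following the hierarchy stated above.
PATH_TYPE_PRIORITY = ["ferry", "bart", "premium", "lrt", "bus"]


def dominant_transit_path(segment_modes):
    """Return the highest-priority transit label among the segments.

    `segment_modes` is the list of mode labels of the unlinked trips
    that make up one linked trip (labels are assumed for this sketch).
    Returns None when no segment uses a transit mode.
    """
    present = set(segment_modes)
    for label in PATH_TYPE_PRIORITY:
        if label in present:
            return label
    return None
```

For example, a trip made of walk, bus, BART, and walk segments resolves to `"bart"`, since BART outranks bus regardless of segment order.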

Parameters:

Name Type Description Default
persons pl.DataFrame

DataFrame with canonical person fields

required
unlinked_trips pl.DataFrame

DataFrame with canonical unlinked trip fields

required
linked_trips pl.DataFrame

DataFrame with canonical linked trip fields

required

Returns:

Type Description
pl.DataFrame

DataFrame with DaySim trip fields formatted per DaySim spec

processing.formatting.daysim.format_tours

Tour formatting for DaySim output.

format_tours

format_tours(
    persons: pl.DataFrame,
    days: pl.DataFrame,
    linked_trips: pl.DataFrame,
    tours: pl.DataFrame,
) -> pl.DataFrame

Format tour data to DaySim specification.

Transforms canonical tour data into DaySim tour format with proper field mappings and time conversions.

Key Transformations:

  • Purpose Mapping: Map canonical tour purposes to DaySim purpose codes (work, university, school, escort, shop, personal business, social, recreation, meal, change mode)
  • Timing: Convert departure and arrival times to minutes after midnight, calculate duration
  • Location: Map origin/destination to TAZ/MAZ, handle missing locations with -1
  • Tour Mode: Determine primary mode from trip-level modes within tour
  • Parent Tours: Link subtours to parent tour IDs for tour-based modeling
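
The timing and location conversions are simple enough to sketch directly. This is a plain-Python illustration under assumptions: the helper names are hypothetical, times are taken to be `datetime` values, and a missing zone is represented by `None` before being coded as -1.

```python
from datetime import datetime
from typing import Optional


def minutes_after_midnight(ts: datetime) -> int:
    """Convert a timestamp to whole minutes after midnight of its own day."""
    return ts.hour * 60 + ts.minute


def duration_minutes(depart: datetime, arrive: datetime) -> int:
    """Elapsed whole minutes between departure and arrival."""
    return int((arrive - depart).total_seconds() // 60)


def zone_or_missing(zone_id: Optional[int]) -> int:
    """Return the TAZ/MAZ ID, or -1 when the location is missing."""
    return -1 if zone_id is None else zone_id
```

Computing the duration from the full timestamps, rather than from the two minute-of-day values, keeps it non-negative for a tour that ends after midnight.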

Parameters:

Name Type Description Default
persons pl.DataFrame

DataFrame with canonical person fields

required
days pl.DataFrame

DataFrame with canonical day fields

required
linked_trips pl.DataFrame

DataFrame with canonical linked trip fields

required
tours pl.DataFrame

DataFrame with canonical tour fields

required

Returns:

Type Description
pl.DataFrame

DataFrame with DaySim tour fields

processing.formatting.daysim.format_days

Person-day formatting for DaySim output.

format_days

format_days(
    persons: pl.DataFrame, days: pl.DataFrame, tours: pl.DataFrame
) -> pl.DataFrame

Format person-day data for DaySim PersonDay file.

Creates person-day records with tour counts by purpose, stop counts, begin/end at home flags, work at home duration, and location coordinates.

Key Transformations:

  • Day-Level Summaries: Tour count by purpose (work, school, escort, etc.), stop counts by purpose, total travel time and distance
  • Tour Categories: Classify tours as home-based, work-based, or usual work location
  • Activity Patterns: Begin/end at home flags, work at home duration
  • Usual Locations: Work and school coordinates for spatial modeling

Parameters:

Name Type Description Default
persons pl.DataFrame

Canonical person data with person_id, hh_id, work/school coords

required
days pl.DataFrame

Canonical day data with day_id, person_id, travel_dow, day_weight

required
tours pl.DataFrame

Canonical tour data with tour_id, day_id, tour_purpose, tour_category

required

Returns:

Type Description
pl.DataFrame

DataFrame with DaySim PersonDay format including:

  • hhno, pno, day: Identifiers
  • beghom, endhom: Begin/end at home flags
  • hbtours, wbtours, uwtours: Home-based/work-based/usual work location tours
  • wktours, sctours, estours, pbtours, shtours, mltours, sotours, retours, metours: Tour counts by purpose
  • wkstops, scstops, esstops, pbstops, shstops, mlstops, sostops, restops, mestops: Stop counts by purpose
  • wkathome: Minutes worked at home
  • pwxcord, pwycord, psxcord, psycord: Work/school coordinates
  • pdexpfac: Person-day expansion factor
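
The per-purpose tour counts can be pictured as a count of tours grouped by day and purpose, then spread into one column per purpose. The sketch below uses plain Python rather than the module's Polars code. The function name `tour_counts_by_day` and the purpose labels are hypothetical, and the purpose-to-prefix mapping is an assumption inferred from the field names (`metours` is left out because its purpose is not stated here).

```python
from collections import defaultdict

# Assumed purpose label -> field prefix, inferred from the field names.
PURPOSE_PREFIX = {
    "work": "wk",
    "school": "sc",
    "escort": "es",
    "personal_business": "pb",
    "shop": "sh",
    "meal": "ml",
    "social": "so",
    "recreation": "re",
}


def tour_counts_by_day(tours):
    """Count tours per day and purpose.

    `tours` is a list of dicts with `day_id` and `tour_purpose` keys.
    Returns {day_id: {"wktours": n, "sctours": n, ...}} with zeros for
    purposes that have no tours on that day.
    """
    counts = defaultdict(
        lambda: {f"{prefix}tours": 0 for prefix in PURPOSE_PREFIX.values()}
    )
    for tour in tours:
        prefix = PURPOSE_PREFIX.get(tour["tour_purpose"])
        if prefix is not None:  # Skip purposes outside the assumed mapping.
            counts[tour["day_id"]][f"{prefix}tours"] += 1
    return dict(counts)
```

A day with no tours does not appear in the result of this sketch; a full implementation would presumably join back to the days table so that such days receive zero counts.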