DaySim Formatting
processing.formatting.daysim.format_daysim
DaySim Formatting Step.
Transforms canonical survey data (persons, households, trips, tours, days) into DaySim activity-based travel demand model format, applying model-specific coding schemes and data structures. See DaySim Data Models.
This module orchestrates specialized formatting for each table type, applying DaySim-specific integer codes for categorical variables while maintaining referential integrity across tables.
Components
- format_persons: Produces persons_daysim table consistent with PersonDaysimModel.
- format_households: Produces households_daysim table consistent with HouseholdDaysimModel.
- format_linked_trips: Produces trips_daysim table consistent with LinkedTripDaysimModel.
- format_tours: Produces tours_daysim table consistent with TourDaysimModel.
- format_days: Produces days_daysim table consistent with PersonDayDaysimModel.
Data Quality Filters
- Partial Tours: Optionally drop tours without return home
- Missing TAZ: Remove records without spatial assignment (required for model)
- Invalid Tours: Filter out tours failing validation rules (zero distance, negative duration, data quality flags)
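The three filters above can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the flag names `is_complete` and `is_valid` are hypothetical, and a missing TAZ is assumed to be represented as `None`.

```python
def filter_tours(tours, drop_partial_tours=True, drop_invalid_tours=True):
    """Keep tours that pass the enabled quality filters.

    Each tour is a dict. 'is_complete' and 'is_valid' are hypothetical
    flag names used only for this illustration.
    """
    kept = []
    for tour in tours:
        if drop_partial_tours and not tour["is_complete"]:
            continue  # partial tour: no return home
        if drop_invalid_tours and not tour["is_valid"]:
            continue  # failed a validation rule
        kept.append(tour)
    return kept


def filter_households(households, drop_missing_taz=True):
    """Keep households with a spatial assignment (None = missing, assumed)."""
    if not drop_missing_taz:
        return list(households)
    return [hh for hh in households if hh["home_taz"] is not None]


example_tours = [
    {"tour_id": 1, "is_complete": True, "is_valid": True},
    {"tour_id": 2, "is_complete": False, "is_valid": True},
    {"tour_id": 3, "is_complete": True, "is_valid": False},
]
example_households = [
    {"hh_id": 10, "home_taz": 101},
    {"hh_id": 11, "home_taz": None},
]

kept_tours = filter_tours(example_tours)
kept_households = filter_households(example_households)
```

Each filter is independent, so disabling one flag leaves the others in force.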
Implementation Notes
- DaySim requires specific integer codes for categorical variables
- Formatting maintains referential integrity across tables
- TAZ (Traffic Analysis Zone) assignment critical for model application
- Person type classification affects downstream choice model applicability
- Mode/purpose hierarchies ensure consistent coding
- Output validates against DaySim data specifications
- Days with invalid/partial tours become "no travel" days in the model
format_daysim
format_daysim(
    persons: pl.DataFrame,
    households: pl.DataFrame,
    unlinked_trips: pl.DataFrame,
    linked_trips: pl.DataFrame,
    tours: pl.DataFrame,
    days: pl.DataFrame,
    drop_partial_tours: bool = True,
    drop_missing_taz: bool = True,
    drop_invalid_tours: bool = True,
) -> dict[str, pl.DataFrame]
Format canonical survey data to DaySim model specification.
Converts canonical survey tables to DaySim activity-based travel demand model format. See module docstring for complete component descriptions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| persons | pl.DataFrame | Person attributes in canonical format. Required columns: person_id, hh_id, age, employment, student, etc. | required |
| households | pl.DataFrame | Household attributes in canonical format. Required columns: hh_id, home_taz, income, etc. | required |
| unlinked_trips | pl.DataFrame | Individual trip segments with mode, purpose, and timing. | required |
| linked_trips | pl.DataFrame | Journey records with coordinates, mode, purpose, and timing. | required |
| tours | pl.DataFrame | Tour records with purpose, timing, and location fields. | required |
| days | pl.DataFrame | Person-day records for completeness calculation. | required |
| drop_partial_tours | bool | If True, remove tours not marked as complete (default: True). Tours without return home are excluded. | True |
| drop_missing_taz | bool | If True, remove households without valid TAZ/MAZ IDs (default: True). Required for model application. | True |
| drop_invalid_tours | bool | If True, remove tours marked as invalid (default: True). Filters out zero distance, negative duration, and data quality flagged tours. | True |
Returns:
| Type | Description |
|---|---|
| dict[str, pl.DataFrame] | Dictionary of formatted DaySim tables, keyed by table name. |

Dictionary keys:
- households_daysim: Formatted household data with person type composition and income categories
- persons_daysim: Formatted person data with person type, day pattern, and completeness flags
- days_daysim: Formatted day-level data with summaries
- linked_trips_daysim: Formatted trip data with DaySim mode, path type, and driver/passenger codes
- tours_daysim: Formatted tour data with DaySim purpose codes and timing
Example
result = format_daysim(
    persons=canonical_persons,
    households=canonical_households,
    unlinked_trips=canonical_unlinked_trips,
    linked_trips=canonical_linked_trips,
    tours=canonical_tours,
    days=canonical_days,
    drop_partial_tours=True,
    drop_missing_taz=True,
    drop_invalid_tours=True,
)
households_daysim = result["households_daysim"]
persons_daysim = result["persons_daysim"]
processing.formatting.daysim.format_persons
Person formatting for DaySim output.
format_persons
format_persons(
    persons: pl.DataFrame, days: pl.DataFrame
) -> pl.DataFrame
Format person data to DaySim specification.
Applies mapping dictionaries and derives person type (pptyp) and worker
type (pwtyp) based on age, employment, and student status.
Key Transformations:
- Person Type Classification: Classify each person as full-time worker, part-time worker, university student, high school student, non-working adult, non-working senior (retired), or child (by age band), based on age, employment status, and student status (see the cascading logic below)
- Day Completeness: Mark complete travel days vs partial reporting (from days table)
- Activity Patterns: Work at home frequency, school location type derived from usual work/school locations
- Usual Days: Calculate usual work/school days per week
Person type (pptyp) cascading logic:
- Age < 5: Child 0-4 (type 8)
- Age < 16: Child 5-15 (type 7)
- Full-time employed: Full-time worker (type 1)
- Age 16-17 and student: High school 16+ (type 6)
- Age 18-24 and high school: High school 16+ (type 6)
- Age >= 18 and student: University student (type 5)
- Part-time/self-employed: Part-time worker (type 2)
- Age < 65: Non-working adult (type 4)
- Age >= 65: Non-working senior (type 3)
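The cascade above is order-dependent: the first matching rule wins. A minimal sketch of that ordering follows. The argument names and the string labels for `employment` and `school_type` are assumptions made for illustration; only the type codes and the rule order come from the list above.

```python
def classify_pptyp(age, employment, is_student, school_type=None):
    """Return the DaySim person type code, checking rules in cascade order.

    `employment` uses the hypothetical labels "full_time", "part_time",
    "self_employed", or "not_employed"; `school_type` uses the hypothetical
    label "high_school" for high school enrollment.
    """
    if age < 5:
        return 8  # Child 0-4
    if age < 16:
        return 7  # Child 5-15
    if employment == "full_time":
        return 1  # Full-time worker
    if 16 <= age <= 17 and is_student:
        return 6  # High school 16+
    if 18 <= age <= 24 and school_type == "high_school":
        return 6  # High school 16+
    if age >= 18 and is_student:
        return 5  # University student
    if employment in ("part_time", "self_employed"):
        return 2  # Part-time worker
    if age < 65:
        return 4  # Non-working adult
    return 3  # Non-working senior
```

Because the student rules precede the part-time rule, a 20-year-old who studies and works part-time is classified as a university student (type 5), while a full-time worker of any age 16 or over is type 1 regardless of student status.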
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| persons | pl.DataFrame | DataFrame with canonical person fields | required |
| days | pl.DataFrame | Optional DataFrame with day completeness indicators | required |
Returns:
| Type | Description |
|---|---|
| pl.DataFrame | DataFrame with DaySim person fields |
processing.formatting.daysim.format_households
Household formatting for DaySim output.
format_households
format_households(
    households: pl.DataFrame, persons_daysim: pl.DataFrame
) -> pl.DataFrame
Format household data to DaySim specification.
Calculates household composition from person data and applies income fallback logic.
Key Transformations:
- Household Composition: Aggregate person types within household (full-time workers, part-time workers, retirees, non-working adults, university students, high school students, children by age)
- Income Processing: Categorize household income into DaySim bins using midpoint values, handle missing values with fallback logic from detailed to followup income
- Size and Type: Household size from person count, household type derived from composition (workers, students, children)
- Coordinates: Home location coordinates and TAZ/MAZ assignment
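The income fallback described above amounts to taking the first available source. The sketch below is illustrative only: the category labels, the dollar midpoints, and the `-1` missing code are all hypothetical and are not taken from the module or the DaySim specification.

```python
MISSING_INCOME = -1  # hypothetical code for "no usable income"

# Hypothetical category labels and midpoint values, for illustration only.
DETAILED_MIDPOINTS = {"under_25k": 12_500, "25k_50k": 37_500, "50k_75k": 62_500}
FOLLOWUP_MIDPOINTS = {"under_50k": 25_000, "50k_plus": 75_000}


def household_income(detailed, followup):
    """Prefer the detailed income answer, then fall back to the follow-up."""
    if detailed in DETAILED_MIDPOINTS:
        return DETAILED_MIDPOINTS[detailed]
    if followup in FOLLOWUP_MIDPOINTS:
        return FOLLOWUP_MIDPOINTS[followup]
    return MISSING_INCOME
```

The coarser follow-up answer is used only when the detailed answer is absent, so a household that answered both keeps the more precise value.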
Household composition fields:
- hhftw: Full-time workers
- hhptw: Part-time workers
- hhret: Retirees (non-working seniors)
- hhoad: Other adults (non-working < 65)
- hhuni: University students
- hhhsc: High school students 16+
- hh515: Children 5-15
- hhcu5: Children 0-4
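Each composition field corresponds to one person type code, so the aggregation is a per-household count of pptyp values. A minimal sketch, assuming persons arrive as `(hh_id, pptyp)` pairs (a simplification of the persons_daysim table; the function name is hypothetical):

```python
# Person type code -> household composition field, paired by their labels.
PPTYP_TO_FIELD = {
    1: "hhftw",  # full-time workers
    2: "hhptw",  # part-time workers
    3: "hhret",  # non-working seniors
    4: "hhoad",  # non-working adults under 65
    5: "hhuni",  # university students
    6: "hhhsc",  # high school students 16+
    7: "hh515",  # children 5-15
    8: "hhcu5",  # children 0-4
}


def household_composition(persons):
    """Count persons of each type per household.

    `persons` is an iterable of (hh_id, pptyp) pairs. Every household
    gets all eight fields, with zero for types that are not present.
    """
    counts = {}
    for hh_id, pptyp in persons:
        row = counts.setdefault(hh_id, {f: 0 for f in PPTYP_TO_FIELD.values()})
        row[PPTYP_TO_FIELD[pptyp]] += 1
    return counts


example_persons = [(1, 1), (1, 2), (1, 7), (2, 3)]
composition = household_composition(example_persons)
```

Summing the eight fields for a household gives its person count, which matches the "household size from person count" transformation.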
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| households | pl.DataFrame | DataFrame with canonical household fields | required |
| persons_daysim | pl.DataFrame | DataFrame with formatted DaySim person fields | required |
Returns:
| Type | Description |
|---|---|
| pl.DataFrame | DataFrame with DaySim household fields |
processing.formatting.daysim.format_trips
Trip formatting for DaySim output.
format_linked_trips
format_linked_trips(
    persons: pl.DataFrame,
    unlinked_trips: pl.DataFrame,
    linked_trips: pl.DataFrame,
) -> pl.DataFrame
Format linked trip data to DaySim specification.
Computes DaySim mode, path type, and driver/passenger codes by aggregating mode information from unlinked trip segments, then applying DaySim-specific mappings.
This function performs mode aggregation that was previously done in the linking step. By moving it here, we preserve maximum granularity in the core linked_trips table per the pipeline design philosophy, deferring format-specific aggregations to output formatters.
Key Transformations:
- Mode Codes: Map canonical mode_type to DaySim mode codes (walk, bike, SOV, HOV2, HOV3, transit, etc.), distinguish auto modes by occupancy (drive alone vs shared ride)
- Path Type: Derive transit path type from mode hierarchy (ferry > BART > premium > LRT > bus), special handling for transit access/egress modes
- Driver/Passenger: Code driver vs passenger for auto trips, TNC occupancy (alone, 2, 3+), link to household vehicle information
- Trip Sequence: Number trips within tours and half-tours (outbound/inbound)
- Purpose Codes: Map origin and destination purpose categories to DaySim codes
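The path type hierarchy (ferry > BART > premium > LRT > bus) means a linked trip takes the highest-ranked transit mode found among its unlinked segments. The sketch below illustrates that selection only. The lowercase mode labels and the function name are hypothetical, and the mapping from the selected mode to a DaySim path type code is not shown.

```python
# Highest priority first, following the hierarchy described above.
PATH_PRIORITY = ["ferry", "bart", "premium", "lrt", "bus"]


def dominant_transit_mode(segment_modes):
    """Return the highest-priority transit mode among a trip's segments.

    Returns None when no segment uses a transit mode in the hierarchy
    (for example, a walk-only or bike-only trip).
    """
    present = set(segment_modes)
    for mode in PATH_PRIORITY:
        if mode in present:
            return mode
    return None
```

For a trip with segments walk, bus, BART, walk, the bus leg is treated as access to BART, so the selected mode is BART rather than bus.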
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| persons | pl.DataFrame | DataFrame with canonical person fields | required |
| unlinked_trips | pl.DataFrame | DataFrame with canonical unlinked trip fields | required |
| linked_trips | pl.DataFrame | DataFrame with canonical linked trip fields | required |
Returns:
| Type | Description |
|---|---|
| pl.DataFrame | DataFrame with DaySim trip fields formatted per DaySim spec |
processing.formatting.daysim.format_tours
Tour formatting for DaySim output.
format_tours
format_tours(
    persons: pl.DataFrame,
    days: pl.DataFrame,
    linked_trips: pl.DataFrame,
    tours: pl.DataFrame,
) -> pl.DataFrame
Format tour data to DaySim specification.
Transforms canonical tour data into DaySim tour format with proper field mappings and time conversions.
Key Transformations:
- Purpose Mapping: Map canonical tour purposes to DaySim purpose codes (work, university, school, escort, shop, personal business, social, recreation, meal, change mode)
- Timing: Convert departure and arrival times to minutes after midnight, calculate duration
- Location: Map origin/destination to TAZ/MAZ, handle missing locations with -1
- Tour Mode: Determine primary mode from trip-level modes within tour
- Parent Tours: Link subtours to parent tour IDs for tour-based modeling
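The timing transformation can be illustrated with the standard library. This is a sketch of the arithmetic only, with hypothetical function names; how the module itself handles time zones, seconds, or tours that cross midnight is not stated in this documentation.

```python
from datetime import datetime


def minutes_after_midnight(ts):
    """Convert a timestamp to whole minutes since 00:00 of the same day."""
    return ts.hour * 60 + ts.minute


def tour_duration_minutes(depart, arrive):
    """Elapsed whole minutes between departure and arrival timestamps."""
    return int((arrive - depart).total_seconds() // 60)


depart = datetime(2024, 3, 5, 7, 45)
arrive = datetime(2024, 3, 5, 17, 30)

depart_min = minutes_after_midnight(depart)          # 7 * 60 + 45
arrive_min = minutes_after_midnight(arrive)          # 17 * 60 + 30
duration_min = tour_duration_minutes(depart, arrive)
```

Computing the duration from the full timestamps, rather than subtracting the two minutes-after-midnight values, keeps the result positive for a tour that ends after midnight. That is a design choice of this sketch, not a documented behavior of format_tours.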
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| persons | pl.DataFrame | DataFrame with canonical person fields | required |
| days | pl.DataFrame | DataFrame with canonical day fields | required |
| linked_trips | pl.DataFrame | DataFrame with canonical linked trip fields | required |
| tours | pl.DataFrame | DataFrame with canonical tour fields | required |
Returns:
| Type | Description |
|---|---|
| pl.DataFrame | DataFrame with DaySim tour fields |
processing.formatting.daysim.format_days
Person-day formatting for DaySim output.
format_days
format_days(
    persons: pl.DataFrame, days: pl.DataFrame, tours: pl.DataFrame
) -> pl.DataFrame
Format person-day data for DaySim PersonDay file.
Creates person-day records with tour counts by purpose, stop counts, begin/end at home flags, work at home duration, and location coordinates.
Key Transformations:
- Day-Level Summaries: Tour count by purpose (work, school, escort, etc.), stop counts by purpose, total travel time and distance
- Tour Categories: Classify tours as home-based, work-based, or usual work location
- Activity Patterns: Begin/end at home flags, work at home duration
- Usual Locations: Work and school coordinates for spatial modeling
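The day-level tour counts can be sketched as a grouped count over the tours table, using the day_id and tour_purpose columns listed for the tours parameter. The purpose labels and the function name here are hypothetical, and the output layout is a simplification of the PersonDay record.

```python
from collections import Counter, defaultdict


def tours_by_purpose(tours):
    """Count tours per day and purpose.

    `tours` is an iterable of dicts with 'day_id' and 'tour_purpose'.
    Returns a mapping day_id -> Counter of purposes; a purpose with no
    tours on a given day counts as zero.
    """
    counts = defaultdict(Counter)
    for tour in tours:
        counts[tour["day_id"]][tour["tour_purpose"]] += 1
    return counts


example_tours = [
    {"tour_id": 1, "day_id": 100, "tour_purpose": "work"},
    {"tour_id": 2, "day_id": 100, "tour_purpose": "shop"},
    {"tour_id": 3, "day_id": 100, "tour_purpose": "shop"},
    {"tour_id": 4, "day_id": 101, "tour_purpose": "school"},
]
day_counts = tours_by_purpose(example_tours)
```

A day whose tours were all removed by the quality filters has no entries here, so every purpose count for it is zero, which is consistent with such days being treated as "no travel" days.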
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| persons | pl.DataFrame | Canonical person data with person_id, hh_id, work/school coords | required |
| days | pl.DataFrame | Canonical day data with day_id, person_id, travel_dow, day_weight | required |
| tours | pl.DataFrame | Canonical tour data with tour_id, day_id, tour_purpose, tour_category | required |
Returns:
| Type | Description |
|---|---|
| pl.DataFrame | DataFrame with DaySim PersonDay format including: |