Extract Tours
processing.tours.extraction
Tour building module for travel diary survey processing.
This module implements a hierarchical tour extraction algorithm that processes linked trip data to identify and classify tours and subtours based on spatial and temporal patterns.
Algorithm
The tour building process follows a seven-phase pipeline:
1. Location Classification
- Calculates haversine distances from trip endpoints to known locations (home, work, school) using person-specific coordinates
- Classifies each trip origin/destination as HOME, WORK, SCHOOL, or OTHER based on configurable distance thresholds
- Only matches work/school locations if person has those locations defined
- Adds boolean flags: o_is_home, d_is_home, o_is_work, d_is_work, etc.
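A minimal sketch of this phase in plain Python (not the module's Polars implementation). The function names, the HOME > WORK > SCHOOL precedence, and thresholds expressed in meters are assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumed constant)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def classify_location(point, anchors, thresholds_m):
    """Return HOME, WORK, SCHOOL, or OTHER for a (lon, lat) point.

    anchors maps a label to (lon, lat), or to None when the person has
    no such location defined; those labels are never matched.
    """
    for label in ("HOME", "WORK", "SCHOOL"):  # precedence order is an assumption
        anchor = anchors.get(label)
        if anchor is None:
            continue
        if haversine_m(*point, *anchor) <= thresholds_m[label]:
            return label
    return "OTHER"
```

The boolean flags then follow directly, for example `o_is_home = classify_location(origin, anchors, thresholds_m) == "HOME"`.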
2. Home-Based Tour Identification
- Sorts trips by person, day, and departure time
- Identifies tour boundaries by detecting:
  - Departures from home (o_is_home=True, d_is_home=False)
  - Returns to home (o_is_home=False, d_is_home=True)
  - Day boundaries (first trip of person-day)
- Assigns sequential tour IDs within each person-day
  - Format: tour_id = (day_id * 100) + tour_sequence_number
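The boundary rules and ID format above can be sketched as follows. This is a simplified plain-Python illustration, not the module's implementation; `assign_tour_ids` and the integer `depart_time` values are hypothetical.

```python
def assign_tour_ids(trips):
    """Assign tour_id = day_id * 100 + sequence number to each trip.

    trips: list of dicts with person_id, day_id, depart_time,
    o_is_home, and d_is_home. Returns a new, sorted list of dicts.
    """
    ordered = sorted(trips, key=lambda t: (t["person_id"], t["day_id"], t["depart_time"]))
    result = []
    prev_key = None
    seq = 0
    prev_returned_home = False
    for trip in ordered:
        key = (trip["person_id"], trip["day_id"])
        leaves_home = trip["o_is_home"] and not trip["d_is_home"]
        if key != prev_key:
            seq = 1  # day boundary: first trip of the person-day
        elif leaves_home or prev_returned_home:
            seq += 1  # a new tour starts after a return home or on leaving home
        prev_returned_home = (not trip["o_is_home"]) and trip["d_is_home"]
        prev_key = key
        result.append({**trip, "tour_id": trip["day_id"] * 100 + seq})
    return result
```

With this format, the second tour on day 7 gets tour_id 702. The two-digit sequence slot implies at most 99 tours per person-day before IDs would collide with the next day_id.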
3. Anchor Period Expansion (CRITICAL for subtours)
- For tours visiting usual anchor locations (work, school), expands the "at anchor" period by finding first arrival and last departure
- Uses pure Polars window functions to identify anchor periods
- Prevents subtours from being detected during travel to/from anchor
- Generalizable: supports work, school, or future anchor types
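One way to read the expansion step, sketched in plain Python rather than Polars window functions: within a single tour, trips that fall after the first arrival at the anchor and before the last departure from it are treated as inside the anchor period. The helper name and the exact interval boundaries are assumptions.

```python
def mark_anchor_period(tour_trips, anchor="work"):
    """Flag trips lying between first arrival at and last departure from the anchor.

    tour_trips: time-ordered trips of one tour, each a dict carrying
    o_is_<anchor> and d_is_<anchor> boolean flags.
    Returns one boolean per trip.
    """
    o_flag, d_flag = f"o_is_{anchor}", f"d_is_{anchor}"
    arrivals = [i for i, t in enumerate(tour_trips) if t[d_flag] and not t[o_flag]]
    departures = [i for i, t in enumerate(tour_trips) if t[o_flag] and not t[d_flag]]
    if not arrivals or not departures:
        return [False] * len(tour_trips)  # tour never settles at this anchor
    first_arrival, last_departure = arrivals[0], departures[-1]
    # The trip to the anchor and the final trip away from it stay outside the period.
    return [first_arrival < i < last_departure for i in range(len(tour_trips))]
```

Because the anchor name is a parameter, the same helper would serve a school anchor by passing `anchor="school"`.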
4. Anchor-Based Subtour Detection
- Within expanded anchor periods, identifies subtours by detecting:
  - Departures from anchor (o_at_anchor=True, d_at_anchor=False)
  - Returns to anchor (o_at_anchor=False, d_at_anchor=True)
- Assigns hierarchical subtour IDs
  - Format: subtour_id = (tour_id * 10) + subtour_sequence_number
- Currently supports work-based subtours, extensible to school-based
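A hedged sketch of the subtour numbering in plain Python. The `in_anchor_period` flag is an assumed name for the output of the previous phase, and `assign_subtour_ids` is hypothetical.

```python
def assign_subtour_ids(tour_trips, tour_id):
    """Return one subtour_id (or None) per trip of a single tour.

    tour_trips: time-ordered dicts with in_anchor_period, o_at_anchor,
    and d_at_anchor boolean flags.
    """
    ids = []
    seq = 0
    open_subtour = False
    for trip in tour_trips:
        if not trip["in_anchor_period"]:
            open_subtour = False
            ids.append(None)
            continue
        if trip["o_at_anchor"] and not trip["d_at_anchor"]:
            seq += 1  # departure from the anchor opens a new subtour
            open_subtour = True
        ids.append(tour_id * 10 + seq if open_subtour else None)
        if trip["d_at_anchor"] and not trip["o_at_anchor"]:
            open_subtour = False  # return to the anchor closes it
    return ids
```

For tour 701, the first subtour is 7011 and the second is 7012; the single-digit slot implies at most nine subtours per tour.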
5. Tour Attribute Aggregation
- Groups trips by tour_id (and subtour_id for subtours)
- Computes tour-level attributes from constituent trips:
  - tour_purpose: Highest priority destination purpose (person-category specific hierarchy)
  - tour_mode: Highest priority travel mode (from configurable mode hierarchy)
  - origin_depart_time: First trip's departure time
  - dest_arrive_time: Last trip's arrival time
  - trip_count: Number of trips in tour
  - stop_count: Number of intermediate stops (trip_count - 1)
- Assigns half-tour classification:
  - "outbound": Trips before primary destination
  - "inbound": Trips after primary destination
  - "subtour": Work-based subtour trips
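The aggregation can be illustrated for a single home-based tour as below. This is a sketch only: the rank dictionaries stand in for the configurable hierarchies (lower rank means higher priority), the column names `d_purpose` and `mode` are assumed, and counting the trip that arrives at the primary destination as outbound is an interpretation.

```python
def aggregate_tour(trips, purpose_rank, mode_rank):
    """Summarize one tour and label each trip's half-tour.

    trips: time-ordered dicts with d_purpose, mode, depart_time, arrive_time.
    purpose_rank / mode_rank: dicts mapping a value to its priority rank.
    """
    # Primary destination: the stop whose purpose ranks highest (first on ties).
    primary_idx = min(range(len(trips)), key=lambda i: purpose_rank[trips[i]["d_purpose"]])
    tour = {
        "tour_purpose": trips[primary_idx]["d_purpose"],
        "tour_mode": min((t["mode"] for t in trips), key=lambda m: mode_rank[m]),
        "origin_depart_time": trips[0]["depart_time"],
        "dest_arrive_time": trips[-1]["arrive_time"],
        "trip_count": len(trips),
        "stop_count": len(trips) - 1,
    }
    half_tours = ["outbound" if i <= primary_idx else "inbound" for i in range(len(trips))]
    return tour, half_tours
```

Subtour trips would be labeled "subtour" separately, based on their subtour_id, before this step is applied to the remaining trips.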
6. Joint Tour Identification
- If joint_trips data is provided, identifies tours in which every trip involves the same group of travelers
- Assigns joint_tour_id to tours with stable participant groups
- Links tour-level joint travel to trip-level joint travel
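The "stable participant group" test might look like the following plain-Python sketch. The helper name, the `participants_by_trip` mapping, and the requirement of at least two travelers are assumptions; how joint_tour_id values are actually numbered is not specified here.

```python
def stable_participant_group(trip_ids, participants_by_trip):
    """Return the shared participant set if every trip has the same group, else None.

    trip_ids: IDs of the trips that make up one tour.
    participants_by_trip: dict mapping trip ID to an iterable of person IDs;
    trips absent from the mapping are treated as having no joint participants.
    """
    groups = {frozenset(participants_by_trip.get(trip_id, ())) for trip_id in trip_ids}
    if len(groups) != 1:
        return None  # the group changes during the tour (or the tour is empty)
    (group,) = groups
    return group if len(group) >= 2 else None  # a joint tour needs 2+ travelers
```

Tours that return a non-None group are candidates for a joint_tour_id; tours that share the same group and the same underlying joint trips could then be given the same ID.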
7. Tour Validation and Correction
- Validates tour structure consistency
- Corrects data quality issues (e.g., inconsistent timing, missing values)
- Adds tour_id and joint_tour_id to unlinked_trips for reference
Edge cases handled:
- Incomplete tours (no return home at end of day)
- Multi-day tours (spanning survey boundaries)
- Missing work/school locations (null coordinates)
- Non-sequential trip chains (spatial gaps)
Notes
- Hierarchical tour structure: Home-based tours → Work-based subtours
- Location classification is robust to GPS/geocoding errors via distance thresholds
- Tour purpose reflects primary activity, not intermediate stops
- Extensible design allows future additions (school-based subtours, other anchor types)
extract_tours
extract_tours(
persons: pl.DataFrame,
households: pl.DataFrame,
unlinked_trips: pl.DataFrame,
linked_trips: pl.DataFrame,
joint_trips: pl.DataFrame | None = None,
**kwargs: dict[str, Any]
) -> dict[str, pl.DataFrame]
Extract hierarchical tour structures from linked trip data.
Builds tour and subtour structures from linked trip sequences using spatial and temporal patterns. See module docstring for complete algorithm description.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| persons | pl.DataFrame | Person attributes including work/school locations. Used to identify anchor locations for tour/subtour detection. | required |
| households | pl.DataFrame | Household attributes including home locations. Home location is primary anchor for tour identification. | required |
| unlinked_trips | pl.DataFrame | Individual trip segments. Will receive tour_id assignment. | required |
| linked_trips | pl.DataFrame | Journey records with coordinates and timing. Required columns: person_id, day_id, o_lon, o_lat, d_lon, d_lat, depart_time, arrive_time. | required |
| joint_trips | pl.DataFrame \| None | Optional joint trip aggregations. If provided, enables joint tour identification based on stable participant groups. | None |
| **kwargs | dict[str, Any] | Configuration parameters for TourConfig: | {} |
Returns:
| Type | Description |
|---|---|
| dict[str, pl.DataFrame] | Dictionary containing: |