Extract Tours

processing.tours.extraction

Tour building module for travel diary survey processing.

This module implements a hierarchical tour extraction algorithm that processes linked trip data to identify and classify tours and subtours based on spatial and temporal patterns.

Algorithm

The tour building process follows a seven-phase pipeline:

1. Location Classification

  • Calculates haversine distances from trip endpoints to known locations (home, work, school) using person-specific coordinates
  • Classifies each trip origin/destination as HOME, WORK, SCHOOL, or OTHER based on configurable distance thresholds
  • Matches work/school locations only if the person has those locations defined
  • Adds boolean flags: o_is_home, d_is_home, o_is_work, d_is_work, etc.
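
The classification step can be illustrated with a small standard-library sketch. It is not the module's implementation (which, per the description, works on person-specific coordinates in a DataFrame); the function names `haversine_m` and `classify_location`, the sample coordinates, and the HOME > WORK > SCHOOL matching order are assumptions made for illustration. The thresholds are the documented defaults.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def classify_location(point, known, thresholds):
    """Classify a (lon, lat) point as HOME, WORK, SCHOOL, or OTHER.

    known maps location type -> (lon, lat), or None when the person has no
    such location. The matching order is an assumption of this sketch.
    """
    for loc_type in ("home", "work", "school"):
        coords = known.get(loc_type)
        if coords is None:
            continue  # undefined work/school locations are never matched
        if haversine_m(*point, *coords) <= thresholds[loc_type]:
            return loc_type.upper()
    return "OTHER"


# Hypothetical person with a home and a work location but no school.
known = {"home": (-122.4194, 37.7749), "work": (-122.4000, 37.7900), "school": None}
thresholds = {"home": 100, "work": 200, "school": 200}  # meters (documented defaults)

trip_destination = (-122.4194, 37.7750)  # roughly 11 m north of home
d_is_home = classify_location(trip_destination, known, thresholds) == "HOME"
```

A flag such as `d_is_home` then follows directly from comparing the classification result with the location type.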

2. Home-Based Tour Identification

  • Sorts trips by person, day, and departure time
  • Identifies tour boundaries by detecting:
    • Departures from home (o_is_home=True, d_is_home=False)
    • Returns to home (o_is_home=False, d_is_home=True)
    • Day boundaries (first trip of person-day)
  • Assigns sequential tour IDs within each person-day
  • Format: tour_id = (day_id * 100) + tour_sequence_number
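
As a rough illustration of the boundary logic and ID format, the sketch below walks the time-sorted trips of one person-day in plain Python. The function name `assign_tour_ids`, the 1-based sequence numbering, and the rule "a new tour opens on the first trip of the day or right after a trip that ended at home" are assumptions of this sketch, not confirmed details of the module.

```python
def assign_tour_ids(trips, day_id):
    """Assign tour_id to the time-sorted trips of a single person-day.

    Each trip is a dict with boolean o_is_home / d_is_home flags.
    """
    tour_seq = 0
    prev_returned_home = True  # day boundary: the first trip always opens a tour
    out = []
    for trip in trips:
        if prev_returned_home:
            tour_seq += 1
        out.append({**trip, "tour_id": day_id * 100 + tour_seq})
        prev_returned_home = trip["d_is_home"]
    return out


# home -> work -> home -> shop -> home on hypothetical day_id 7
day_trips = [
    {"o_is_home": True, "d_is_home": False},
    {"o_is_home": False, "d_is_home": True},
    {"o_is_home": True, "d_is_home": False},
    {"o_is_home": False, "d_is_home": True},
]
tour_ids = [t["tour_id"] for t in assign_tour_ids(day_trips, day_id=7)]
# tour_ids == [701, 701, 702, 702]
```

With this format, the two lowest decimal digits carry the sequence number, so IDs stay unambiguous as long as a person-day has fewer than 100 tours.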

3. Anchor Period Expansion (CRITICAL for subtours)

  • For tours visiting usual anchor locations (work, school), expands the "at anchor" period by finding first arrival and last departure
  • Uses pure Polars window functions to identify anchor periods
  • Prevents subtours from being detected during travel to/from anchor
  • Generalizable: supports work, school, or future anchor types
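
One way to picture the expansion is the plain-Python sketch below. The module is described as using Polars window functions; this version uses list indices only to show the idea. The helper name `mark_anchor_period` and the choice to treat the period as the trips strictly between the first arrival and the last departure are assumptions.

```python
def mark_anchor_period(tour_trips, anchor="work"):
    """Return one boolean per trip: True if the trip falls inside the anchor period.

    The period spans the trips strictly after the first arrival at the anchor
    and strictly before the last departure from it (an assumption of this sketch).
    """
    o_key, d_key = f"o_is_{anchor}", f"d_is_{anchor}"
    arrivals = [i for i, t in enumerate(tour_trips) if t[d_key]]
    departures = [i for i, t in enumerate(tour_trips) if t[o_key]]
    if not arrivals or not departures:
        return [False] * len(tour_trips)
    first_arrival, last_departure = arrivals[0], departures[-1]
    return [first_arrival < i < last_departure for i in range(len(tour_trips))]


# home -> work -> lunch -> work -> home
work_tour = [
    {"o_is_work": False, "d_is_work": True},   # commute in
    {"o_is_work": True, "d_is_work": False},   # leave for lunch
    {"o_is_work": False, "d_is_work": True},   # back to work
    {"o_is_work": True, "d_is_work": False},   # commute home
]
in_period = mark_anchor_period(work_tour)
# in_period == [False, True, True, False]
```

The commute trips fall outside the period, so they cannot be mistaken for subtour legs even though they also start or end at the anchor.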

4. Anchor-Based Subtour Detection

  • Within expanded anchor periods, identifies subtours by detecting:
    • Departures from anchor (o_at_anchor=True, d_at_anchor=False)
    • Returns to anchor (o_at_anchor=False, d_at_anchor=True)
  • Assigns hierarchical subtour IDs
  • Format: subtour_id = (tour_id * 10) + subtour_sequence_number
  • Currently supports work-based subtours, extensible to school-based
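
The detection rule and ID format can be sketched as follows, assuming each trip already carries `o_at_anchor` / `d_at_anchor` flags and a per-trip indicator of whether it lies inside the expanded anchor period. The function name `assign_subtour_ids`, the `in_anchor_period` argument, and the 1-based numbering are hypothetical.

```python
def assign_subtour_ids(tour_trips, in_anchor_period, tour_id):
    """Return a subtour_id (or None) for each trip of one tour."""
    seq = 0
    in_subtour = False
    ids = []
    for trip, inside in zip(tour_trips, in_anchor_period):
        if not inside:
            in_subtour = False
            ids.append(None)
            continue
        # Departure from the anchor opens a new subtour.
        if trip["o_at_anchor"] and not trip["d_at_anchor"]:
            seq += 1
            in_subtour = True
        ids.append(tour_id * 10 + seq if in_subtour else None)
        # Return to the anchor closes the current subtour.
        if trip["d_at_anchor"] and not trip["o_at_anchor"]:
            in_subtour = False
    return ids


def leg(o, d):
    return {"o_at_anchor": o, "d_at_anchor": d}


# home -> work -> lunch -> work -> meeting -> work -> home, tour_id 701
trips = [leg(False, True), leg(True, False), leg(False, True),
         leg(True, False), leg(False, True), leg(True, False)]
period = [False, True, True, True, True, False]
subtour_ids = assign_subtour_ids(trips, period, tour_id=701)
# subtour_ids == [None, 7011, 7011, 7012, 7012, None]
```

Because only one decimal digit is reserved for the sequence number, the format distinguishes at most nine subtours per tour when numbering starts at 1.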

5. Tour Attribute Aggregation

  • Groups trips by tour_id (and subtour_id for subtours)
  • Computes tour-level attributes from constituent trips:

    • tour_purpose: Highest priority destination purpose (person-category specific hierarchy)
    • tour_mode: Highest priority travel mode (from configurable mode hierarchy)
    • origin_depart_time: First trip's departure time
    • dest_arrive_time: Last trip's arrival time
    • trip_count: Number of trips in tour
    • stop_count: Number of intermediate stops (trip_count - 1)
  • Assigns half-tour classification:

    • "outbound": Trips before primary destination
    • "inbound": Trips after primary destination
    • "subtour": Work-based subtour trips
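
A minimal sketch of the aggregation for a single tour is shown below, covering mode, timing, and counts only. It assumes trips are plain dicts with hypothetical `mode`, `depart_time`, and `arrive_time` keys and that `mode_hierarchy` is a list in which a higher index means higher priority, as described for the configuration. The mode labels are made up for the example.

```python
from datetime import datetime


def aggregate_tour(trips, mode_hierarchy):
    """Compute tour-level attributes from the trips of one tour."""
    trips = sorted(trips, key=lambda t: t["depart_time"])
    # Higher index in mode_hierarchy wins; an unknown mode raises ValueError.
    tour_mode = max((t["mode"] for t in trips), key=mode_hierarchy.index)
    return {
        "tour_mode": tour_mode,
        "origin_depart_time": trips[0]["depart_time"],
        "dest_arrive_time": trips[-1]["arrive_time"],
        "trip_count": len(trips),
        "stop_count": len(trips) - 1,
    }


mode_hierarchy = ["walk", "bike", "transit", "car"]  # hypothetical labels
tour_trips = [
    {"mode": "walk", "depart_time": datetime(2024, 1, 8, 7, 30),
     "arrive_time": datetime(2024, 1, 8, 7, 40)},
    {"mode": "transit", "depart_time": datetime(2024, 1, 8, 7, 45),
     "arrive_time": datetime(2024, 1, 8, 8, 20)},
    {"mode": "walk", "depart_time": datetime(2024, 1, 8, 17, 0),
     "arrive_time": datetime(2024, 1, 8, 17, 50)},
]
tour = aggregate_tour(tour_trips, mode_hierarchy)
# tour["tour_mode"] == "transit", tour["trip_count"] == 3, tour["stop_count"] == 2
```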

6. Joint Tour Identification

  • If joint_trips data is provided, identifies tours where all trips involve the same group of travelers
  • Assigns joint_tour_id to tours with stable participant groups
  • Links tour-level joint travel to trip-level joint travel
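
The "stable participant group" test could look roughly like the sketch below. It assumes each trip of a tour exposes a hypothetical `participants` list of person IDs derived from the joint trip data; the actual column names and the way joint_tour_id values are generated are not specified here.

```python
def joint_participants(tour_trips):
    """Return the shared participant group if the tour is joint, else None.

    A tour counts as joint when every trip has the same set of participants
    and that set contains more than one person.
    """
    groups = {frozenset(t["participants"]) for t in tour_trips}
    if len(groups) == 1:
        (group,) = groups
        if len(group) > 1:
            return group
    return None


stable_tour = [{"participants": [1, 2]}, {"participants": [2, 1]}]
changing_tour = [{"participants": [1, 2]}, {"participants": [1]}]

stable_group = joint_participants(stable_tour)      # frozenset({1, 2})
changing_group = joint_participants(changing_tour)  # None
```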

7. Tour Validation and Correction

  • Validates tour structure consistency
  • Corrects data quality issues (e.g., inconsistent timing, missing values)
  • Adds tour_id and joint_tour_id to unlinked_trips for reference

Edge cases handled:

  • Incomplete tours (no return home at end of day)
  • Multi-day tours (spanning survey boundaries)
  • Missing work/school locations (null coordinates)
  • Non-sequential trip chains (spatial gaps)

Design notes:

  • Hierarchical tour structure: Home-based tours → Work-based subtours
  • Location classification is robust to GPS/geocoding errors via distance thresholds
  • Tour purpose reflects the primary activity, not intermediate stops
  • Extensible design allows future additions (school-based subtours, other anchor types)

extract_tours

extract_tours(
    persons: pl.DataFrame,
    households: pl.DataFrame,
    unlinked_trips: pl.DataFrame,
    linked_trips: pl.DataFrame,
    joint_trips: pl.DataFrame | None = None,
    **kwargs: dict[str, Any]
) -> dict[str, pl.DataFrame]

Extract hierarchical tour structures from linked trip data.

Builds tour and subtour structures from linked trip sequences using spatial and temporal patterns. See module docstring for complete algorithm description.

Parameters:

  • persons (pl.DataFrame, required): Person attributes including work/school locations. Used to identify anchor locations for tour/subtour detection.
  • households (pl.DataFrame, required): Household attributes including home locations. Home location is primary anchor for tour identification.
  • unlinked_trips (pl.DataFrame, required): Individual trip segments. Will receive tour_id assignment.
  • linked_trips (pl.DataFrame, required): Journey records with coordinates and timing. Required columns: person_id, day_id, o_lon, o_lat, d_lon, d_lat, depart_time, arrive_time.
  • joint_trips (pl.DataFrame | None, default None): Optional joint trip aggregations. If provided, enables joint tour identification based on stable participant groups.
  • **kwargs (dict[str, Any], default {}): Configuration parameters for TourConfig:
    • distance_thresholds: Dict of location type → distance threshold (meters). Default: {"home": 100, "work": 200, "school": 200}
    • mode_hierarchy: Mode priority for tour mode assignment (list). Higher index = higher priority.
    • purpose_hierarchy: Purpose priority by person type (dict). Maps person categories to ordered purpose lists.
    • person_category_expression: Polars expression to classify person categories (e.g., worker, student).

Returns:

dict[str, pl.DataFrame]: Dictionary containing:

  • unlinked_trips: Original unlinked trips with tour_id, joint_tour_id
  • linked_trips: Trips with tour_id, subtour_id, half_tour, joint_tour_id
  • tours: Aggregated tour records with purpose, mode, timing, trip counts, and joint_tour_id
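
For orientation, the keyword configuration described under Parameters might be assembled as in the sketch below. Only `distance_thresholds` uses documented default values; the mode labels, person categories, and purpose lists are hypothetical placeholders. `person_category_expression` is left out because it requires a Polars expression tied to the survey's own columns. The call is shown in comments since it depends on the survey DataFrames being loaded.

```python
# Hypothetical configuration; keys follow the documented **kwargs names.
tour_config = {
    "distance_thresholds": {"home": 100, "work": 200, "school": 200},  # meters
    "mode_hierarchy": ["walk", "bike", "transit", "car"],  # higher index = higher priority
    "purpose_hierarchy": {
        # person category -> ordered purpose list (placeholder values)
        "worker": ["work", "school", "shop", "other"],
        "student": ["school", "work", "shop", "other"],
    },
}

# Assumed call pattern, based on the signature above:
# from processing.tours.extraction import extract_tours
# result = extract_tours(persons, households, unlinked_trips, linked_trips,
#                        joint_trips=None, **tour_config)
# tours = result["tours"]
# linked_with_tours = result["linked_trips"]
```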