Detect Joint Trips

processing.joint_trips.detect_joint_trips

Joint trip detection step for identifying shared household trips.

This module implements the main pipeline step for detecting joint trips where multiple household members travel together.

detect_joint_trips

detect_joint_trips(
    linked_trips: pl.DataFrame,
    households: pl.DataFrame,
    method: str = "buffer",
    time_threshold_minutes: float = 15.0,
    space_threshold_meters: float = 100.0,
    covariance: list[float] | list[list[float]] | None = None,
    confidence_level: float = 0.9,
    log_discrepancies: bool = False,
) -> dict[str, pl.DataFrame]

Detect joint trips among household members using similarity matching.

Identifies trips where multiple household members traveled together by comparing origin-destination-time similarity using either strict buffer thresholds or Mahalanobis distance.

Parameters:

linked_trips (pl.DataFrame, required)
  Journey records with coordinates and timing. Required columns: linked_trip_id, hh_id, person_id, origin/destination coordinates, departure/arrival times.

households (pl.DataFrame, required)
  Household table for pre-filtering.

method (str, default: 'buffer')
  Detection method: "buffer" or "mahalanobis".

time_threshold_minutes (float, default: 15.0)
  Maximum time difference for the buffer method.

space_threshold_meters (float, default: 100.0)
  Maximum spatial distance for the buffer method.

covariance (list[float] | list[list[float]] | None, default: None)
  Covariance matrix for the mahalanobis method, given either as a diagonal (4 values) or as a full 4x4 matrix. If None, uses defaults (~84m spatial, ~4.5min temporal).

confidence_level (float, default: 0.9)
  Statistical confidence level for the mahalanobis method. Higher values are stricter: 0.90 is strict, 0.75 is moderate.

log_discrepancies (bool, default: False)
  Whether to log trips where the reported and detected traveler counts differ.

Returns:

Type Description
dict[str, pl.DataFrame]

Dictionary containing:
  • linked_trips: Original trips with an added joint_trip_id column
  • joint_trips: Aggregated table of shared trips with participant lists

Algorithm

Phase 1: Household Pre-filtering

  1. Filter to households with 2+ members who took trips
  2. Reduces search space to only households where joint trips are possible
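
A minimal sketch of the pre-filtering idea, using plain Python dictionaries instead of Polars so it runs without dependencies. The helper name multi_person_households is hypothetical and the module's own implementation may differ; only hh_id, person_id, and linked_trip_id come from the documented columns.

```python
from collections import defaultdict


def multi_person_households(trips):
    """Return the hh_ids in which at least two distinct people made trips."""
    people_by_household = defaultdict(set)
    for trip in trips:
        people_by_household[trip["hh_id"]].add(trip["person_id"])
    return {hh for hh, people in people_by_household.items() if len(people) >= 2}


trips = [
    {"linked_trip_id": 1, "hh_id": "A", "person_id": "A1"},
    {"linked_trip_id": 2, "hh_id": "A", "person_id": "A2"},
    # Household B has two trips but only one traveler, so it is dropped.
    {"linked_trip_id": 3, "hh_id": "B", "person_id": "B1"},
    {"linked_trip_id": 4, "hh_id": "B", "person_id": "B1"},
]

eligible = multi_person_households(trips)
candidate_trips = [t for t in trips if t["hh_id"] in eligible]
```

Counting distinct travelers rather than trips is what keeps single-traveler households out of the later pairwise comparison.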

Phase 2: Pairwise Distance Calculation

  1. Within each multi-person household, compute the distance between every pair of trips along four dimensions:
    • Origin coordinates (o_lon, o_lat)
    • Destination coordinates (d_lon, d_lat)
    • Departure time
    • Arrival time
  2. Store distances in condensed matrix format for efficiency
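
The pairwise step can be sketched as follows for the trips of one household. This is an illustration under stated assumptions, not the module's code: the field names depart_min and arrive_min (minutes since midnight) and all helper names are hypothetical, and the condensed layout shown is the usual upper-triangle ordering, which may or may not match the internal storage.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pair_components(a, b):
    """Four differences for one trip pair: origin, destination, depart, arrive."""
    return (
        haversine_m(a["o_lon"], a["o_lat"], b["o_lon"], b["o_lat"]),
        haversine_m(a["d_lon"], a["d_lat"], b["d_lon"], b["d_lat"]),
        abs(a["depart_min"] - b["depart_min"]),
        abs(a["arrive_min"] - b["arrive_min"]),
    )


def condensed_distances(trips):
    """One entry per unordered pair (i < j), in upper-triangle order."""
    return [pair_components(a, b) for a, b in combinations(trips, 2)]


def condensed_index(i, j, n):
    """Position of pair (i, j), with i < j, in the condensed list for n trips."""
    return n * i - i * (i + 1) // 2 + (j - i - 1)


household_trips = [
    {"o_lon": 0.0, "o_lat": 0.0, "d_lon": 0.0, "d_lat": 1.0, "depart_min": 480, "arrive_min": 510},
    {"o_lon": 0.0, "o_lat": 0.0, "d_lon": 0.0, "d_lat": 1.0, "depart_min": 483, "arrive_min": 512},
    {"o_lon": 0.0, "o_lat": 1.0, "d_lon": 0.0, "d_lat": 0.0, "depart_min": 900, "arrive_min": 930},
]
pairs = condensed_distances(household_trips)
```

Storing only the n(n-1)/2 unordered pairs avoids the redundant lower triangle and the zero diagonal of a full n-by-n matrix.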

Phase 3: Similarity Filtering

Buffer Method (default):
  • Filter trip pairs where:
    • Spatial distance (haversine) ≤ space_threshold_meters for both origin AND destination
    • Absolute time difference ≤ time_threshold_minutes for both departure AND arrival
  • Simple, interpretable thresholds
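
As a small illustration of the buffer rule, the check below takes the four per-pair differences (origin distance, destination distance, departure difference, arrival difference) and applies the documented thresholds. The function name is_buffer_match and the tuple layout are assumptions made for this sketch.

```python
def is_buffer_match(
    components,
    space_threshold_meters=100.0,
    time_threshold_minutes=15.0,
):
    """True when all four differences fall within the buffer thresholds."""
    o_dist_m, d_dist_m, depart_diff_min, arrive_diff_min = components
    return (
        o_dist_m <= space_threshold_meters
        and d_dist_m <= space_threshold_meters
        and depart_diff_min <= time_threshold_minutes
        and arrive_diff_min <= time_threshold_minutes
    )


close_pair = is_buffer_match((40.0, 60.0, 3.0, 5.0))
far_destination = is_buffer_match((40.0, 150.0, 3.0, 5.0))
```

Because every condition must hold, a pair that starts together but ends at different places (far_destination above) is rejected.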

Mahalanobis Method:
  • Calculate a statistical distance using the covariance matrix:
    • Accounts for correlated variations in space and time
    • Compares the result to a chi-squared distribution at confidence_level
  • Calibrated to observed joint trip patterns rather than hand-picked cutoffs
  • Can capture joint trips more flexibly than fixed thresholds
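
For the diagonal-covariance case, the squared Mahalanobis distance reduces to a sum of squared differences, each divided by its variance. The sketch below assumes the documented defaults (~84 m, ~4.5 min) are standard deviations, so the diagonal holds their squares; the source does not state this, and the names used here are hypothetical. How confidence_level is turned into a chi-squared cutoff is left out because the source does not specify it.

```python
def mahalanobis_sq_diagonal(components, variances):
    """Squared Mahalanobis distance for a diagonal covariance matrix."""
    return sum(x * x / v for x, v in zip(components, variances))


# Assumption: 84 m and 4.5 min are standard deviations, so variances are their squares.
ASSUMED_VARIANCES = (84.0**2, 84.0**2, 4.5**2, 4.5**2)

# Differences: origin (m), destination (m), departure (min), arrival (min).
d2 = mahalanobis_sq_diagonal((84.0, 0.0, 4.5, 0.0), ASSUMED_VARIANCES)
```

Here one standard deviation of offset in two of the four dimensions gives a squared distance of 2. Unlike the buffer rule, a large offset in one dimension can be compensated by small offsets in the others.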

Phase 4: Clique Detection

  1. Build graph where nodes = trips, edges = similar trip pairs
  2. Detect maximal cliques (groups of mutually-similar trips)
  3. Handle overlapping cliques by selecting a disjoint set with maximum coverage
  4. Each clique represents one joint trip event
  5. Requires mutual similarity: A, B, and C form one joint trip only if every pair among them (A-B, B-C, and A-C) is similar
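
The clique step can be sketched with a basic Bron-Kerbosch enumeration followed by a greedy, largest-first choice of non-overlapping cliques. Both are stand-ins chosen for brevity: the source does not say which clique algorithm or which disjoint-selection strategy the module uses, and greedy selection is only an approximation of maximum coverage. All function names are hypothetical.

```python
def build_adjacency(edges):
    """Undirected graph as {trip_id: set of similar trip_ids}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for node in sorted(candidates):
            neighbors = adjacency[node]
            expand(current | {node}, candidates & neighbors, excluded & neighbors)
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return cliques


def select_disjoint(cliques):
    """Greedy choice: largest cliques first, skipping any that overlap."""
    chosen, used = [], set()
    for clique in sorted(cliques, key=lambda c: (-len(c), sorted(c))):
        if len(clique) >= 2 and not (clique & used):
            chosen.append(clique)
            used |= clique
    return chosen


# Trips 1, 2, 3 are mutually similar; trip 4 only matches trip 3.
adjacency = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
found = maximal_cliques(adjacency)
joint_groups = select_disjoint(found)

# A chain 5-6-7 with no 5-7 edge is not a single clique.
chain_groups = select_disjoint(maximal_cliques(build_adjacency([(5, 6), (6, 7)])))
```

In the first graph the triangle wins and trip 4 stays unmatched. In the chain, only one of the two overlapping pairs is kept, because a clique needs an edge between every pair of its members.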

Phase 5: Joint Trip Aggregation

  1. Assign unique joint_trip_id to each clique
  2. Create joint_trips table with:
    • Representative location/time (mean of participants)
    • person_list: Array of participating person IDs
    • trip_list: Array of individual linked_trip_ids
    • num_participants: Count of travelers
  3. Validate against reported num_travelers field if available
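
A rough sketch of the aggregation step, again with plain dictionaries rather than Polars. The output field names person_list, trip_list, num_participants, and joint_trip_id follow the description above; depart_min and the function name aggregate_joint_trips are assumptions, and only the departure time is averaged here to keep the example short.

```python
from statistics import mean


def aggregate_joint_trips(trips, cliques):
    """Assign joint_trip_id per clique and build one summary row per clique."""
    by_id = {t["linked_trip_id"]: t for t in trips}
    joint_id_by_trip = {}
    joint_trips = []
    for joint_trip_id, clique in enumerate(cliques, start=1):
        members = [by_id[trip_id] for trip_id in sorted(clique)]
        joint_trips.append(
            {
                "joint_trip_id": joint_trip_id,
                "depart_min": mean(m["depart_min"] for m in members),
                "person_list": [m["person_id"] for m in members],
                "trip_list": [m["linked_trip_id"] for m in members],
                "num_participants": len(members),
            }
        )
        for m in members:
            joint_id_by_trip[m["linked_trip_id"]] = joint_trip_id
    # Trips outside every clique get None, mirroring a NULL joint_trip_id.
    linked = [{**t, "joint_trip_id": joint_id_by_trip.get(t["linked_trip_id"])} for t in trips]
    return linked, joint_trips


trips = [
    {"linked_trip_id": 1, "person_id": "A1", "depart_min": 480},
    {"linked_trip_id": 2, "person_id": "A2", "depart_min": 484},
    {"linked_trip_id": 3, "person_id": "A1", "depart_min": 900},
]
linked, joint_trips = aggregate_joint_trips(trips, [frozenset({1, 2})])
```

The returned pair corresponds to the two tables in the result dictionary: the trips with their joint_trip_id, and one row per detected joint trip.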

Notes

  • Only compares trips within the same household (joint trips across households are not detected)
  • Mahalanobis method requires a calibrated covariance matrix (see scripts/calibrate_joint_trip_covariance.py)
  • Clique detection prevents false positives from partial or coincidental matches
  • Handles survey reporting errors where respondents over- or under-report the number of travelers
  • Non-joint trips retain joint_trip_id = NULL