Detect Joint Trips
processing.joint_trips.detect_joint_trips
Joint trip detection step for identifying shared household trips.
This module implements the main pipeline step for detecting joint trips where multiple household members travel together.
detect_joint_trips
```python
detect_joint_trips(
    linked_trips: pl.DataFrame,
    households: pl.DataFrame,
    method: str = "buffer",
    time_threshold_minutes: float = 15.0,
    space_threshold_meters: float = 100.0,
    covariance: list[float] | list[list[float]] | None = None,
    confidence_level: float = 0.9,
    log_discrepancies: bool = False,
) -> dict[str, pl.DataFrame]
```
Detect joint trips among household members using similarity matching.
Identifies trips where multiple household members traveled together by comparing origin-destination-time similarity using either strict buffer thresholds or Mahalanobis distance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `linked_trips` | `pl.DataFrame` | Journey records with coordinates and timing. Required columns: `linked_trip_id`, `hh_id`, `person_id`, origin/destination coordinates, departure/arrival times. | required |
| `households` | `pl.DataFrame` | Household table used for pre-filtering. | required |
| `method` | `str` | Detection method: `"buffer"` or `"mahalanobis"`. | `'buffer'` |
| `time_threshold_minutes` | `float` | Maximum time difference for the buffer method, in minutes. | `15.0` |
| `space_threshold_meters` | `float` | Maximum spatial distance for the buffer method, in meters. | `100.0` |
| `covariance` | `list[float] \| list[list[float]] \| None` | Covariance matrix for the Mahalanobis method: diagonal (4 values) or full (4x4 matrix). If `None`, defaults are used (~84 m spatial, ~4.5 min temporal). | `None` |
| `confidence_level` | `float` | Statistical confidence level for the Mahalanobis method. Higher = stricter: 0.90 is strict, 0.75 is moderate. | `0.9` |
| `log_discrepancies` | `bool` | Whether to log trips whose reported traveler count does not match the detected count. | `False` |
Returns:
| Type | Description |
|---|---|
| `dict[str, pl.DataFrame]` | Dictionary containing `linked_trips` (the original trips with an added `joint_trip_id` column) and `joint_trips` (an aggregated table of shared trips with participant lists). |
Algorithm
Phase 1: Household Pre-filtering
- Filter to households with 2+ members who took trips
- Reduces search space to only households where joint trips are possible
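A minimal sketch of the pre-filter in plain Python, assuming each trip is a dict with `hh_id` and `person_id` keys. The helper name `multi_person_households` is hypothetical, the real step operates on polars DataFrames, and this sketch derives the household set from trip records alone; how the `households` table is consulted is not shown.

```python
from collections import defaultdict


def multi_person_households(trips: list[dict]) -> set:
    """Return hh_ids where at least two distinct persons made a trip.

    Hypothetical helper illustrating Phase 1 only, not the package's code.
    """
    travelers = defaultdict(set)
    for trip in trips:
        # Collect the distinct persons with at least one trip per household
        travelers[trip["hh_id"]].add(trip["person_id"])
    return {hh for hh, persons in travelers.items() if len(persons) >= 2}


trips = [
    {"hh_id": 1, "person_id": 10},
    {"hh_id": 1, "person_id": 11},
    {"hh_id": 2, "person_id": 20},
    {"hh_id": 2, "person_id": 20},  # same person twice: still one traveler
]
print(multi_person_households(trips))  # {1}
```

Household 2 is dropped because only one person traveled, so none of its trips can be joint.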
Phase 2: Pairwise Distance Calculation
- Within each multi-person household, compute pairwise distances between all trip combinations in a 4D space:
    - Origin coordinates (o_lon, o_lat)
    - Destination coordinates (d_lon, d_lat)
    - Departure time
    - Arrival time
- Store distances in condensed matrix format for efficiency
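The condensed format keeps only the upper triangle (i < j) of the symmetric n x n distance matrix as a flat list of n(n-1)/2 entries, the same layout that `scipy.spatial.distance.pdist` returns. The sketch below shows the index arithmetic; `condensed_index` and `condensed_pairwise` are hypothetical names, and the one-dimensional departure-time distance is a placeholder for the package's actual per-dimension distances.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def condensed_index(i: int, j: int, n: int) -> int:
    """Position of pair (i, j) in a condensed upper-triangle vector of n items."""
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if i > j:
        i, j = j, i  # the matrix is symmetric, so pair order does not matter
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def condensed_pairwise(items: Sequence[T], dist: Callable[[T, T], float]) -> list[float]:
    """Compute dist for every pair i < j, in row-major upper-triangle order."""
    return [dist(a, b) for a, b in combinations(items, 2)]


# Departure times (minutes after midnight) for three trips in one household
departures = [480.0, 485.0, 600.0]
d = condensed_pairwise(departures, lambda a, b: abs(a - b))
print(d)  # [5.0, 120.0, 115.0]
print(d[condensed_index(2, 1, len(departures))])  # 115.0
```

Storing n(n-1)/2 values instead of n² halves memory and skips the redundant diagonal, which matters little for one household but adds up across a whole survey.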
Phase 3: Similarity Filtering
Buffer Method (default):
- Keep trip pairs where:
    - Spatial (haversine) distance ≤ space_threshold_meters for both origin AND destination
    - Absolute time difference ≤ time_threshold_minutes for both departure AND arrival
- Simple, interpretable thresholds
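A sketch of the buffer test for a single trip pair, assuming each trip is a dict with `o_lon`, `o_lat`, `d_lon`, `d_lat` in decimal degrees and `depart`/`arrive` as `datetime` objects. The helper names `haversine_m` and `is_buffer_match` are hypothetical, and the mean Earth radius of 6,371,008.8 m is an assumed constant that may differ from the one the package uses. The coordinates in the usage lines are illustrative.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; assumed constant


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_buffer_match(a: dict, b: dict,
                    time_threshold_minutes: float = 15.0,
                    space_threshold_meters: float = 100.0) -> bool:
    """True if origins AND destinations fall within the spatial buffer
    and departures AND arrivals fall within the time buffer."""
    o_dist = haversine_m(a["o_lon"], a["o_lat"], b["o_lon"], b["o_lat"])
    d_dist = haversine_m(a["d_lon"], a["d_lat"], b["d_lon"], b["d_lat"])
    dep_min = abs((a["depart"] - b["depart"]).total_seconds()) / 60
    arr_min = abs((a["arrive"] - b["arrive"]).total_seconds()) / 60
    return (o_dist <= space_threshold_meters and d_dist <= space_threshold_meters
            and dep_min <= time_threshold_minutes and arr_min <= time_threshold_minutes)


parent = {"o_lon": -122.4194, "o_lat": 37.7749, "d_lon": -122.4094, "d_lat": 37.7849,
          "depart": datetime(2024, 5, 1, 8, 0), "arrive": datetime(2024, 5, 1, 8, 20)}
child = {**parent, "o_lat": 37.7752,  # origin about 33 m north of the parent's
         "depart": datetime(2024, 5, 1, 8, 5)}
print(is_buffer_match(parent, child))  # True
```

All four conditions must hold at once, so a pair that shares an origin but diverges at the destination is rejected.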
Mahalanobis Method:
- Calculate a statistical distance using the covariance matrix:
    - Accounts for correlated variations in space and time
    - Compares the distance to a chi-squared distribution at confidence_level
- More sophisticated: when the covariance is calibrated to observed joint trip patterns, the tolerance in each dimension follows the observed spread instead of a fixed cutoff
- Can capture joint trips more flexibly than fixed thresholds
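A sketch of the Mahalanobis test for the diagonal-covariance case only. It assumes the four components are origin distance (m), destination distance (m), departure difference (min), and arrival difference (min), each expected to be zero for a true joint trip, and that the diagonal `covariance` values are variances in those units. With 4 degrees of freedom the chi-squared CDF has the closed form F(x) = 1 - e^(-x/2)(1 + x/2), so no statistics library is needed. The acceptance rule is the textbook confidence-region test (accept when F(d²) ≤ `confidence_level`). That rule is an assumption: under it a higher `confidence_level` widens the acceptance region, so the package may define its cutoff differently. The helper names and the variances in the usage lines are hypothetical.

```python
import math


def chi2_cdf_4dof(x: float) -> float:
    """CDF of the chi-squared distribution with 4 degrees of freedom."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x / 2) * (1 + x / 2)


def mahalanobis_sq_diag(diff: list[float], variances: list[float]) -> float:
    """Squared Mahalanobis distance of a difference vector from the zero vector,
    for a diagonal covariance matrix given as one variance per component."""
    if len(diff) != len(variances):
        raise ValueError("diff and variances must have the same length")
    return sum(d * d / v for d, v in zip(diff, variances))


def is_mahalanobis_match(diff: list[float], variances: list[float],
                         confidence_level: float = 0.9) -> bool:
    """Assumed rule: accept the pair if it lies inside the confidence_level
    ellipsoid, i.e. chi2_cdf(d^2) <= confidence_level."""
    return chi2_cdf_4dof(mahalanobis_sq_diag(diff, variances)) <= confidence_level


# Illustrative variances: (84 m)^2 for origin/destination, (4.5 min)^2 for times
variances = [84.0**2, 84.0**2, 4.5**2, 4.5**2]
close_pair = [40.0, 60.0, 3.0, 5.0]  # origin m, destination m, depart min, arrive min
far_pair = [300.0, 300.0, 20.0, 20.0]
print(is_mahalanobis_match(close_pair, variances))  # True
print(is_mahalanobis_match(far_pair, variances))    # False
```

Dividing each squared difference by its variance puts meters and minutes on a common, unitless scale, which is what lets a single cutoff apply to all four dimensions.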
Phase 4: Clique Detection
- Build graph where nodes = trips, edges = similar trip pairs
- Detect maximal cliques (groups of mutually-similar trips)
- Handle overlapping cliques by selecting a disjoint set with maximum coverage
- Each clique represents one joint trip event
- Requires mutual similarity: A, B, and C form one joint trip only if every pair (A-B, B-C, A-C) is similar; a chain where A matches B and B matches C, but A does not match C, is not merged
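A sketch of this phase that enumerates maximal cliques with the Bron-Kerbosch algorithm (with pivoting) and then picks non-overlapping cliques greedily, largest first. The greedy pass is a swapped-in heuristic that approximates, but does not guarantee, maximum coverage; the package's actual selection strategy is not specified here. The names `build_adjacency`, `maximal_cliques`, and `select_disjoint_cliques` are hypothetical, and the trip IDs in the usage lines are made up.

```python
def build_adjacency(nodes, edges) -> dict[int, set[int]]:
    """Undirected graph: nodes are trips, edges are similar trip pairs."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def maximal_cliques(adj: dict[int, set[int]]) -> list[frozenset[int]]:
    """Enumerate every maximal clique once, using Bron-Kerbosch with pivoting."""
    cliques: list[frozenset[int]] = []

    def expand(r: set[int], p: set[int], x: set[int]) -> None:
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex adjacent to most candidates to prune branches
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


def select_disjoint_cliques(cliques, min_size: int = 2) -> list[frozenset[int]]:
    """Greedily keep the largest cliques that share no trip (a heuristic)."""
    chosen, used = [], set()
    for c in sorted(cliques, key=lambda c: (-len(c), sorted(c))):
        if len(c) >= min_size and not (c & used):
            chosen.append(c)
            used |= c
    return chosen


# Trips 1-3 all match; 3 also matches 4; 6-7 and 7-8 match but 6-8 do not
adj = build_adjacency(range(1, 9), [(1, 2), (1, 3), (2, 3), (3, 4), (6, 7), (7, 8)])
chosen = select_disjoint_cliques(maximal_cliques(adj))
print([sorted(c) for c in chosen])  # [[1, 2, 3], [6, 7]]
```

The chain 6-7-8 is not merged into one event because trips 6 and 8 never match directly, and trip 4 stays unassigned because its only partner already belongs to the larger clique.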
Phase 5: Joint Trip Aggregation
- Assign unique joint_trip_id to each clique
- Create joint_trips table with:
    - Representative location/time (mean of participants)
    - person_list: Array of participating person IDs
    - trip_list: Array of individual linked_trip_ids
    - num_participants: Count of travelers
- Validate against reported num_travelers field if available
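A sketch of the aggregation step for already-selected cliques, assuming trips are dicts keyed by `linked_trip_id` with `person_id`, numeric coordinate fields, departure/arrival times in minutes, and an optional `num_travelers`. The name `aggregate_joint_trips`, the sequential IDs starting at 1, and the `mismatched_trips` field are all hypothetical; trips outside any clique keep `joint_trip_id = None`.

```python
from statistics import mean

COORD_TIME_FIELDS = ("o_lon", "o_lat", "d_lon", "d_lat", "depart", "arrive")


def aggregate_joint_trips(trips: dict, cliques: list) -> tuple[dict, list]:
    """Label trips with joint_trip_id and build one summary row per clique."""
    # Copy every trip so the input is not mutated; non-joint trips stay None
    labeled = {tid: {**t, "joint_trip_id": None} for tid, t in trips.items()}
    joint_rows = []
    for joint_id, clique in enumerate(cliques, start=1):
        members = sorted(clique)
        for tid in members:
            labeled[tid]["joint_trip_id"] = joint_id
        # Representative location/time: mean over participants
        row = {f: mean(trips[tid][f] for tid in members) for f in COORD_TIME_FIELDS}
        row.update(
            joint_trip_id=joint_id,
            person_list=[trips[tid]["person_id"] for tid in members],
            trip_list=members,
            num_participants=len(members),
        )
        # Flag members whose reported party size differs from the detected one
        row["mismatched_trips"] = [
            tid for tid in members
            if trips[tid].get("num_travelers") not in (None, len(members))
        ]
        joint_rows.append(row)
    return labeled, joint_rows


base = {"o_lon": -122.0, "o_lat": 37.0, "d_lon": -122.1, "d_lat": 37.1,
        "depart": 480, "arrive": 500}
trips = {
    101: {**base, "person_id": 1, "num_travelers": 2},
    102: {**base, "person_id": 2, "depart": 484, "arrive": 502, "num_travelers": 3},
    103: {**base, "person_id": 3, "depart": 900, "arrive": 930},
}
labeled, rows = aggregate_joint_trips(trips, [frozenset({101, 102})])
print(rows[0]["person_list"], rows[0]["num_participants"], rows[0]["mismatched_trips"])
# [1, 2] 2 [102]
print(labeled[103]["joint_trip_id"])  # None
```

Trip 102 reported three travelers but only two were detected, the kind of discrepancy `log_discrepancies` is described as reporting.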
Notes
- Only compares trips within the same household (joint trips across households are not detected)
- The Mahalanobis method requires a calibrated covariance matrix (see scripts/calibrate_joint_trip_covariance.py)
- Clique detection reduces false positives from partial or coincidental matches
- Handles survey reporting errors where respondents over- or under-report the number of travelers
- Non-joint trips retain joint_trip_id = NULL