Data Models
Pydantic data models provide validation and type checking for survey data processing.
Overview
Data models represent individual records (rows) and define:
- Required and optional fields
- Field validation rules and constraints
- Foreign key relationships between tables
- Pipeline step requirements
Models use Pydantic's BaseModel with custom field validators to ensure data quality throughout the processing pipeline.
Key Features
Field Validation
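Each field is declared with step_field, which carries its validation constraints (for example, latitude bounds of -90 to 90):
```python
age: AgeCategory = step_field(required_in_steps=["extract_tours"])
home_lat: float = step_field(ge=-90, le=90, required_in_steps=["extract_tours"])
```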
Foreign Key Relationships
Models enforce referential integrity:
```python
hh_id: int = step_field(
    ge=1,
    fk_to="households.hh_id",
    required_child=True,
)
```
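The `fk_to="households.hh_id"` declaration ties each row's `hh_id` to an existing household. The sketch below shows the orphan-row check this implies, run over plain lists of row dicts. `find_orphans` and the sample rows are hypothetical. The sketch does not model `required_child=True`, because this page does not spell out its exact semantics.

```python
def find_orphans(child_rows: list[dict], fk: str, parent_rows: list[dict], pk: str) -> list[dict]:
    """Return child rows whose foreign key has no matching parent primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows if row[fk] not in parent_keys]


households = [{"hh_id": 100}, {"hh_id": 101}]
persons = [
    {"person_id": 1, "hh_id": 100},
    {"person_id": 2, "hh_id": 101},
    {"person_id": 3, "hh_id": 999},  # no household 999, so this row violates fk_to
]

# Split the "table.column" spec string the same way it is written in the field declaration.
fk_to = "households.hh_id"
parent_table, parent_key = fk_to.split(".")
tables = {"households": households}

orphans = find_orphans(persons, "hh_id", tables[parent_table], parent_key)
print(orphans)  # [{'person_id': 3, 'hh_id': 999}]
```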
Pipeline Step Requirements
Fields specify which processing steps require them:
```python
person_num: int = step_field(ge=1, required_in_steps=["format_ctramp", "format_daysim"])
```
Usage Example
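```python
from data_canon.models.survey import PersonModel

# AgeCategory, Gender, Employment, and Student are enum types defined in
# data_canon; import them from their module (the path is not shown on this page).

person = PersonModel(
    person_id=1,
    hh_id=100,
    person_num=1,
    age=AgeCategory.AGE_35_64,
    gender=Gender.FEMALE,
    employment=Employment.FULL_TIME,
    student=Student.NOT_STUDENT,
    # ... other fields
)
```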
Survey Data Models
Core data models used in the processing pipeline for households, persons, days, trips, and tours.
data_canon.models.survey.HouseholdModel (Pydantic model)
Household attributes (minimal for tour building).
Fields:
- hh_id (int)
- home_lat (float)
- home_lon (float)
- residence_rent_own (ResidenceRentOwn)
- residence_type (ResidenceType)
- income (int | None, default None)
- income_bin (IncomeBroad)
- hh_weight (float | None)
- num_vehicles (int)
- complete (bool)
data_canon.models.survey.PersonModel (Pydantic model)
Person attributes for tour building.
Fields:
- person_id (int)
- hh_id (int)
- person_num (int)
- age (AgeCategory)
- gender (Gender)
- work_lat (float | None)
- work_lon (float | None)
- school_lat (float | None)
- school_lon (float | None)
- job_type (JobType | None)
- employment (Employment)
- student (Student)
- school_type (SchoolType | None)
- work_park (WorkParking | None)
- work_mode (Mode | None)
- race (Race | None)
- ethnicity (Ethnicity | None)
- telework_freq (CommuteFreq | None)
- commute_freq (CommuteFreq | None)
- commute_subsidy_use_3 (BooleanYesNo | None, default None)
- commute_subsidy_use_4 (BooleanYesNo | None, default None)
- is_proxy (bool | None, default None)
- num_days_complete (int, default 0)
- complete (bool | None, default None)
- person_weight (float | None, default None)
data_canon.models.survey.PersonDayModel (Pydantic model)
Daily activity pattern summary for one person on one travel day.
Fields:
- person_id (int)
- day_id (int)
- hh_id (int)
- travel_date (datetime)
- travel_dow (TravelDow)
- complete (bool | None, default False)
- day_weight (float | None, default None)
data_canon.models.survey.UnlinkedTripModel (Pydantic model)
Trip data model for validation.
Fields:
- unlinked_trip_id (int)
- day_id (int)
- person_id (int)
- hh_id (int)
- linked_trip_id (int)
- tour_id (int | None)
- o_lon (float)
- o_lat (float)
- d_lon (float)
- d_lat (float)
- o_purpose (Purpose)
- d_purpose (Purpose)
- o_purpose_category (PurposeCategory)
- d_purpose_category (PurposeCategory)
- mode_type (ModeType)
- mode_1 (Mode | None)
- mode_2 (Mode | None)
- mode_3 (Mode | None)
- mode_4 (Mode | None)
- duration_minutes (float)
- distance_meters (float)
- depart_time (datetime | None)
- arrive_time (datetime | None)
- num_travelers (int)
- complete (bool | None, default None)
- unlinked_trip_weight (float | None, default None)
Validators:
- validate_arrival_after_departure
validate_arrival_after_departure (Pydantic validator)
validate_arrival_after_departure() -> UnlinkedTripModel
Ensure arrive_time is after depart_time.
Raises:
| Type | Description |
|---|---|
| ValueError | If arrival time is before or equal to departure time |
data_canon.models.survey.LinkedTripModel (Pydantic model)
Linked Trip data model for validation.
Fields:
- day_id (int)
- person_id (int)
- hh_id (int)
- linked_trip_id (int)
- joint_trip_id (int | None, default None)
- tour_id (int)
- travel_dow (TravelDow)
- o_purpose (Purpose)
- o_purpose_category (PurposeCategory)
- o_lat (float)
- o_lon (float)
- d_purpose (Purpose)
- d_purpose_category (PurposeCategory)
- d_lat (float)
- d_lon (float)
- mode_type (ModeType)
- driver (Driver)
- num_travelers (int)
- access_mode (AccessEgressMode | None)
- egress_mode (AccessEgressMode | None)
- duration_minutes (float)
- distance_meters (float)
- depart_time (datetime)
- arrive_time (datetime)
- tour_direction (TourDirection)
- complete (bool | None, default None)
- linked_trip_weight (float | None, default None)
data_canon.models.survey.TourModel (Pydantic model)
Tour-level records with clear, descriptive step_field names.
Fields:
- tour_id (int)
- hh_id (int)
- person_id (int)
- day_id (int)
- tour_num (int)
- subtour_num (int)
- parent_tour_id (int)
- joint_tour_id (int | None, default None)
- tour_purpose (PurposeCategory | None)
- tour_category (TourCategory)
- single_trip_tour (bool, default False)
- origin_depart_time (datetime)
- origin_arrive_time (datetime)
- dest_arrive_time (datetime | None, default None)
- dest_depart_time (datetime | None, default None)
- origin_linked_trip_id (int)
- dest_linked_trip_id (int | None, default None)
- o_lat (float)
- o_lon (float)
- d_lat (float)
- d_lon (float)
- o_location_type (LocationType)
- d_location_type (LocationType)
- tour_mode (ModeType)
- outbound_mode (ModeType | None)
- inbound_mode (ModeType | None)
- num_travelers (int, default 1)
- complete (bool | None, default None)
- tour_weight (float | None, default None)
Validators:
- validate_complete_tours
validate_complete_tours (Pydantic validator)
validate_complete_tours() -> TourModel
Validate that complete tours have all required fields.
Single-trip tours (where the person made one trip and did not return home) may have a null tour_purpose, null destination times (dest_arrive_time, dest_depart_time), and a null dest_linked_trip_id. Complete tours must have these fields populated.
data_canon.models.survey.JointTripModel (Pydantic model)
Joint trip group containing multiple linked trips from the same household.
Represents a detected shared trip where multiple household members traveled together. Each joint trip has a unique ID and aggregated spatiotemporal attributes from its member trips.
Fields:
- joint_trip_id (int)
- hh_id (int)
- day_id (int)
- num_joint_travelers (int): Number of travelers in this joint trip
- o_lat_mean (float): Mean origin latitude across member trips
- o_lon_mean (float): Mean origin longitude across member trips
- d_lat_mean (float): Mean destination latitude across member trips
- d_lon_mean (float): Mean destination longitude across member trips
- depart_time_mean (datetime): Mean departure time across member trips
- depart_arrive_mean (datetime): Mean arrival time across member trips
- complete (bool | None, default None)
- joint_trip_weight (float | None, default None)
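The `*_mean` fields of JointTripModel aggregate over member trips. This page does not say whether the means are weighted or how members are grouped. The sketch below therefore assumes an unweighted arithmetic mean and one traveler per member trip. `aggregate_joint_trip`, `mean_datetime`, and the input dict keys (borrowed from LinkedTripModel field names) are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import fmean


def mean_datetime(values: list[datetime]) -> datetime:
    """Mean of datetimes, computed as the mean offset from the first value."""
    base = values[0]
    return base + timedelta(seconds=fmean((v - base).total_seconds() for v in values))


def aggregate_joint_trip(member_trips: list[dict]) -> dict:
    """Hypothetical aggregation of member linked trips into JointTripModel-style fields."""
    return {
        # Assumption: each member trip represents exactly one traveler.
        "num_joint_travelers": len(member_trips),
        "o_lat_mean": fmean(t["o_lat"] for t in member_trips),
        "o_lon_mean": fmean(t["o_lon"] for t in member_trips),
        "d_lat_mean": fmean(t["d_lat"] for t in member_trips),
        "d_lon_mean": fmean(t["d_lon"] for t in member_trips),
        "depart_time_mean": mean_datetime([t["depart_time"] for t in member_trips]),
        "depart_arrive_mean": mean_datetime([t["arrive_time"] for t in member_trips]),
    }


trips = [
    {"o_lat": 37.0, "o_lon": -122.0, "d_lat": 37.5, "d_lon": -122.5,
     "depart_time": datetime(2024, 5, 1, 8, 0), "arrive_time": datetime(2024, 5, 1, 8, 30)},
    {"o_lat": 37.2, "o_lon": -122.2, "d_lat": 37.5, "d_lon": -122.5,
     "depart_time": datetime(2024, 5, 1, 8, 4), "arrive_time": datetime(2024, 5, 1, 8, 32)},
]
joint = aggregate_joint_trip(trips)
print(joint["depart_time_mean"])  # 2024-05-01 08:02:00
```

Averaging offsets from the first value, rather than POSIX timestamps, keeps naive local datetimes free of any time-zone conversion.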
Travel Model-formatted Data Models
DaySim Models
Output file format models for the DaySim activity-based travel demand model.
CTRAMP Models
Output file format models for the CT-RAMP travel demand model.