Transit

Tables

Data models for various GTFS tables using the pandera library.

The module includes the following classes:

  • AgenciesTable: Optional. Represents the Agency table in the GTFS dataset.
  • WranglerStopsTable: Represents the Stops table in the GTFS dataset.
  • RoutesTable: Represents the Routes table in the GTFS dataset.
  • WranglerShapesTable: Represents the Shapes table in the GTFS dataset.
  • WranglerStopTimesTable: Represents the Stop Times table in the GTFS dataset.
  • WranglerTripsTable: Represents the Trips table in the GTFS dataset.

Each table model builds on the Pydantic data models defined in the records module to define the schema for the corresponding table. The classes also include additional configuration, such as uniqueness constraints.

Validating a table against the WranglerStopsTable

from network_wrangler.models.gtfs.tables import WranglerStopsTable
from network_wrangler.utils.models import validate_df_to_model

# stops_df is assumed to be a pandas DataFrame of stops loaded elsewhere.
validated_stops_df = validate_df_to_model(stops_df, WranglerStopsTable)

network_wrangler.models.gtfs.tables.AgenciesTable

Bases: DataFrameModel

Represents the Agency table in the GTFS dataset.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#agencytxt

Attributes:

  • agency_id (str) –

    The agency_id. Primary key. Required to be unique.

  • agency_name (str) –

    The agency name.

  • agency_url (str) –

    The agency URL.

  • agency_timezone (str) –

    The agency timezone.

  • agency_lang (str) –

    The agency language.

  • agency_phone (str) –

    The agency phone number.

  • agency_fare_url (str) –

    The agency fare URL.

  • agency_email (str) –

    The agency email.

Source code in network_wrangler/models/gtfs/tables.py
class AgenciesTable(DataFrameModel):
    """Represents the Agency table in the GTFS dataset.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#agencytxt>

    Attributes:
        agency_id (str): The agency_id. Primary key. Required to be unique.
        agency_name (str): The agency name.
        agency_url (str): The agency URL.
        agency_timezone (str): The agency timezone.
        agency_lang (str): The agency language.
        agency_phone (str): The agency phone number.
        agency_fare_url (str): The agency fare URL.
        agency_email (str): The agency email.
    """

    agency_id: Series[str] = Field(coerce=True, nullable=False, unique=True)
    agency_name: Series[str] = Field(coerce=True, nullable=True)
    agency_url: Series[HttpURL] = Field(coerce=True, nullable=True)
    agency_timezone: Series[str] = Field(coerce=True, nullable=True)
    agency_lang: Series[str] = Field(coerce=True, nullable=True)
    agency_phone: Series[str] = Field(coerce=True, nullable=True)
    agency_fare_url: Series[str] = Field(coerce=True, nullable=True)
    agency_email: Series[str] = Field(coerce=True, nullable=True)

    class Config:
        """Config for the AgenciesTable data model."""

        coerce = True
        add_missing_columns = True
        _pk: ClassVar[TablePrimaryKeys] = ["agency_id"]
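With add_missing_columns = True, pandera is documented to insert schema columns that are absent from the input, filling them with the field's default or with nulls when the field is nullable, and to fail when a missing column is neither nullable nor defaulted. The stdlib sketch below roughly mimics that behavior for agency rows stored as dicts; AGENCY_FIELDS, NON_NULLABLE, and add_missing_fields are hypothetical and are not part of pandera or network_wrangler.

```python
from typing import Any

# Hypothetical stand-in for the AgenciesTable columns. Only agency_id is
# non-nullable, and no field declares a default, so gaps become None.
AGENCY_FIELDS = [
    "agency_id", "agency_name", "agency_url", "agency_timezone",
    "agency_lang", "agency_phone", "agency_fare_url", "agency_email",
]
NON_NULLABLE = {"agency_id"}


def add_missing_fields(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of rows with every schema field present (hypothetical helper)."""
    filled_rows = []
    for row in rows:
        # A required field with no default cannot be filled in.
        for field in NON_NULLABLE:
            if row.get(field) is None:
                raise ValueError(f"{field} is required and has no default")
        # Schema fields first (missing ones set to None), then any extra input keys.
        filled = {name: row.get(name) for name in AGENCY_FIELDS}
        filled.update({k: v for k, v in row.items() if k not in filled})
        filled_rows.append(filled)
    return filled_rows
```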

network_wrangler.models.gtfs.tables.FrequenciesTable

Bases: DataFrameModel

Represents the Frequency table in the GTFS dataset.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#frequenciestxt

The primary key of this table is a composite key of trip_id and start_time.

Attributes:

  • trip_id (str) –

    Foreign key to trip_id in the trips table.

  • start_time (TimeString) –

    The start time in HH:MM:SS format.

  • end_time (TimeString) –

    The end time in HH:MM:SS format.

  • headway_secs (int) –

    The headway in seconds.
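Each frequencies row describes repeated trips: one departure every headway_secs seconds, starting at start_time and running until end_time. The sketch below expands a single row into explicit start times, assuming end_time is exclusive and that HH:MM:SS values may exceed 24:00:00 for service past midnight. The helpers parse_gtfs_time, format_gtfs_time, and departures are hypothetical and not part of network_wrangler.

```python
def parse_gtfs_time(value: str) -> int:
    """Convert an HH:MM:SS string to seconds after midnight.

    Hours may exceed 23 because GTFS allows service past midnight.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_gtfs_time(total_seconds: int) -> str:
    """Convert seconds after midnight back to an HH:MM:SS string."""
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def departures(start_time: str, end_time: str, headway_secs: int) -> list[str]:
    """List trip start times for one frequencies row (assumes end_time is exclusive)."""
    if headway_secs < 1:
        # Mirrors the ge=1 constraint on headway_secs.
        raise ValueError("headway_secs must be >= 1")
    start, end = parse_gtfs_time(start_time), parse_gtfs_time(end_time)
    return [format_gtfs_time(t) for t in range(start, end, headway_secs)]
```

For example, departures("06:00:00", "07:00:00", 900) yields four start times, from 06:00:00 to 06:45:00.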

Source code in network_wrangler/models/gtfs/tables.py
class FrequenciesTable(DataFrameModel):
    """Represents the Frequency table in the GTFS dataset.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#frequenciestxt>

    The primary key of this table is a composite key of `trip_id` and `start_time`.

    Attributes:
        trip_id (str): Foreign key to `trip_id` in the trips table.
        start_time (TimeString): The start time in HH:MM:SS format.
        end_time (TimeString): The end time in HH:MM:SS format.
        headway_secs (int): The headway in seconds.
    """

    trip_id: Series[str] = Field(nullable=False, coerce=True)
    start_time: Series[TimeString] = Field(
        nullable=False, coerce=True, default=DEFAULT_TIMESPAN[0]
    )
    end_time: Series[TimeString] = Field(nullable=False, coerce=True, default=DEFAULT_TIMESPAN[1])
    headway_secs: Series[int] = Field(
        coerce=True,
        ge=1,
        nullable=False,
    )

    class Config:
        """Config for the FrequenciesTable data model."""

        coerce = True
        add_missing_columns = True
        unique: ClassVar[list[str]] = ["trip_id", "start_time"]
        _pk: ClassVar[TablePrimaryKeys] = ["trip_id", "start_time"]
        _fk: ClassVar[TableForeignKeys] = {"trip_id": ("trips", "trip_id")}
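Setting unique = ["trip_id", "start_time"] asks pandera to reject rows that repeat the same combination of both columns, while either column on its own may repeat. A minimal stdlib sketch of that composite-key check over row dicts; duplicate_keys is a hypothetical helper, not a pandera or network_wrangler function.

```python
from collections import Counter
from typing import Any, Iterable


def duplicate_keys(rows: Iterable[dict[str, Any]], key_cols: list[str]) -> list[tuple]:
    """Return composite key values that appear more than once (hypothetical helper)."""
    # Build one tuple per row from the key columns and count repeats.
    counts = Counter(tuple(row[col] for col in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]
```

A trip may have several frequency periods, so ("t1", "06:00:00") and ("t1", "09:00:00") are both valid keys; only an exact repeat of the pair is reported.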

network_wrangler.models.gtfs.tables.RoutesTable

Bases: DataFrameModel

Represents the Routes table in the GTFS dataset.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#routestxt

Attributes:

  • route_id (str) –

    The route_id. Primary key. Required to be unique.

  • route_short_name (Optional[str]) –

    The route short name.

  • route_long_name (Optional[str]) –

    The route long name.

  • route_type (RouteType) –

    The route type. Required. Values can be:
    - 0: Tram, Streetcar, Light rail
    - 1: Subway, Metro
    - 2: Rail
    - 3: Bus

  • agency_id (Optional[str]) –

    The agency_id. Foreign key to agency_id in the agencies table.

  • route_desc (Optional[str]) –

    The route description.

  • route_url (Optional[str]) –

    The route URL.

  • route_color (Optional[str]) –

    The route color.

  • route_text_color (Optional[str]) –

    The route text color.

Source code in network_wrangler/models/gtfs/tables.py
class RoutesTable(DataFrameModel):
    """Represents the Routes table in the GTFS dataset.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#routestxt>

    Attributes:
        route_id (str): The route_id. Primary key. Required to be unique.
        route_short_name (Optional[str]): The route short name.
        route_long_name (Optional[str]): The route long name.
        route_type (RouteType): The route type. Required. Values can be:
            - 0: Tram, Streetcar, Light rail
            - 1: Subway, Metro
            - 2: Rail
            - 3: Bus
        agency_id (Optional[str]): The agency_id. Foreign key to agency_id in the agencies table.
        route_desc (Optional[str]): The route description.
        route_url (Optional[str]): The route URL.
        route_color (Optional[str]): The route color.
        route_text_color (Optional[str]): The route text color.
    """

    route_id: Series[str] = Field(nullable=False, unique=True, coerce=True)
    route_short_name: Series[str] = Field(nullable=True, coerce=True)
    route_long_name: Series[str] = Field(nullable=True, coerce=True)
    route_type: Series[Category] = Field(
        dtype_kwargs={"categories": RouteType}, coerce=True, nullable=False
    )

    # Optional Fields
    agency_id: Optional[Series[str]] = Field(nullable=True, coerce=True)
    route_desc: Optional[Series[str]] = Field(nullable=True, coerce=True)
    route_url: Optional[Series[str]] = Field(nullable=True, coerce=True)
    route_color: Optional[Series[str]] = Field(nullable=True, coerce=True)
    route_text_color: Optional[Series[str]] = Field(nullable=True, coerce=True)

    class Config:
        """Config for the RoutesTable data model."""

        coerce = True
        add_missing_columns = True
        _pk: ClassVar[TablePrimaryKeys] = ["route_id"]
        _fk: ClassVar[TableForeignKeys] = {"agency_id": ("agencies", "agency_id")}
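route_type is stored as a pandas Category whose allowed categories come from RouteType, so, presumably, values outside that set fail validation for this non-nullable field. A stdlib sketch of the same membership check, using a hypothetical RouteTypeSketch enum limited to the four values documented above (the real RouteType may define more members):

```python
from enum import IntEnum


class RouteTypeSketch(IntEnum):
    """Hypothetical enum holding only the route types listed above."""

    TRAM = 0
    SUBWAY = 1
    RAIL = 2
    BUS = 3


def invalid_route_types(values: list[int]) -> list[int]:
    """Return values that are not members of RouteTypeSketch."""
    allowed = {member.value for member in RouteTypeSketch}
    return [v for v in values if v not in allowed]
```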

network_wrangler.models.gtfs.tables.ShapesTable

Bases: DataFrameModel

Represents the Shapes table in the GTFS dataset.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#shapestxt

Attributes:

  • shape_id (str) –

    The shape_id. Primary key. Required to be unique.

  • shape_pt_lat (float) –

    The shape point latitude.

  • shape_pt_lon (float) –

    The shape point longitude.

  • shape_pt_sequence (int) –

    The shape point sequence.

  • shape_dist_traveled (Optional[float]) –

    The shape distance traveled.

Source code in network_wrangler/models/gtfs/tables.py
class ShapesTable(DataFrameModel):
    """Represents the Shapes table in the GTFS dataset.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#shapestxt>

    Attributes:
        shape_id (str): The shape_id. Primary key. Required to be unique.
        shape_pt_lat (float): The shape point latitude.
        shape_pt_lon (float): The shape point longitude.
        shape_pt_sequence (int): The shape point sequence.
        shape_dist_traveled (Optional[float]): The shape distance traveled.
    """

    shape_id: Series[str] = Field(nullable=False, coerce=True)
    shape_pt_lat: Series[float] = Field(coerce=True, nullable=False, ge=-90, le=90)
    shape_pt_lon: Series[float] = Field(coerce=True, nullable=False, ge=-180, le=180)
    shape_pt_sequence: Series[int] = Field(coerce=True, nullable=False, ge=0)

    # Optional
    shape_dist_traveled: Optional[Series[float]] = Field(coerce=True, nullable=True, ge=0)

    class Config:
        """Config for the ShapesTable data model."""

        coerce = True
        add_missing_columns = True
        _pk: ClassVar[TablePrimaryKeys] = ["shape_id", "shape_pt_sequence"]
        _fk: ClassVar[TableForeignKeys] = {}
        unique: ClassVar[list[str]] = ["shape_id", "shape_pt_sequence"]
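Because the primary key is the pair (shape_id, shape_pt_sequence), a shape is a set of points that only becomes a line once sorted by shape_pt_sequence; the sequence values need to increase along the shape but need not be consecutive. A stdlib sketch that groups and orders points; shape_polylines is hypothetical.

```python
from collections import defaultdict
from typing import Any


def shape_polylines(rows: list[dict[str, Any]]) -> dict[str, list[tuple[float, float]]]:
    """Group shape points by shape_id, ordered by shape_pt_sequence.

    Returns (lat, lon) tuples and raises if a (shape_id, shape_pt_sequence)
    pair repeats, mirroring the composite primary key.
    """
    points: dict[str, dict[int, tuple[float, float]]] = defaultdict(dict)
    for row in rows:
        seq = int(row["shape_pt_sequence"])
        by_seq = points[row["shape_id"]]
        if seq in by_seq:
            raise ValueError(f"duplicate key ({row['shape_id']}, {seq})")
        by_seq[seq] = (float(row["shape_pt_lat"]), float(row["shape_pt_lon"]))
    # Sort each shape's points by sequence number.
    return {sid: [pts[s] for s in sorted(pts)] for sid, pts in points.items()}
```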

network_wrangler.models.gtfs.tables.StopTimesTable

Bases: DataFrameModel

Represents the Stop Times table in the GTFS dataset.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#stop_timestxt

The primary key of this table is a composite key of trip_id and stop_sequence.

Attributes:

  • trip_id (str) –

    Foreign key to trip_id in the trips table.

  • stop_id (str) –

    Foreign key to stop_id in the stops table.

  • stop_sequence (int) –

    The stop sequence.

  • pickup_type (PickupDropoffType) –

    The pickup type. Values can be:
    - 0: Regularly scheduled pickup
    - 1: No pickup available
    - 2: Must phone agency to arrange pickup
    - 3: Must coordinate with driver to arrange pickup

  • drop_off_type (PickupDropoffType) –

    The drop off type. Values can be:
    - 0: Regularly scheduled drop off
    - 1: No drop off available
    - 2: Must phone agency to arrange drop off
    - 3: Must coordinate with driver to arrange drop off

  • arrival_time (TimeString) –

    The arrival time in HH:MM:SS format.

  • departure_time (TimeString) –

    The departure time in HH:MM:SS format.

  • shape_dist_traveled (Optional[float]) –

    The shape distance traveled.

  • timepoint (Optional[TimepointType]) –

    The timepoint type. Values can be:
    - 0: The stop is not a timepoint
    - 1: The stop is a timepoint

Source code in network_wrangler/models/gtfs/tables.py
class StopTimesTable(DataFrameModel):
    """Represents the Stop Times table in the GTFS dataset.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#stop_timestxt>

    The primary key of this table is a composite key of `trip_id` and `stop_sequence`.

    Attributes:
        trip_id (str): Foreign key to `trip_id` in the trips table.
        stop_id (str): Foreign key to `stop_id` in the stops table.
        stop_sequence (int): The stop sequence.
        pickup_type (PickupDropoffType): The pickup type. Values can be:
            - 0: Regularly scheduled pickup
            - 1: No pickup available
            - 2: Must phone agency to arrange pickup
            - 3: Must coordinate with driver to arrange pickup
        drop_off_type (PickupDropoffType): The drop off type. Values can be:
            - 0: Regularly scheduled drop off
            - 1: No drop off available
            - 2: Must phone agency to arrange drop off
            - 3: Must coordinate with driver to arrange drop off
        arrival_time (TimeString): The arrival time in HH:MM:SS format.
        departure_time (TimeString): The departure time in HH:MM:SS format.
        shape_dist_traveled (Optional[float]): The shape distance traveled.
        timepoint (Optional[TimepointType]): The timepoint type. Values can be:
            - 0: The stop is not a timepoint
            - 1: The stop is a timepoint
    """

    trip_id: Series[str] = Field(nullable=False, coerce=True)
    stop_id: Series[str] = Field(nullable=False, coerce=True)
    stop_sequence: Series[int] = Field(nullable=False, coerce=True, ge=0)
    pickup_type: Series[Category] = Field(
        dtype_kwargs={"categories": PickupDropoffType},
        nullable=True,
        coerce=True,
    )
    drop_off_type: Series[Category] = Field(
        dtype_kwargs={"categories": PickupDropoffType},
        nullable=True,
        coerce=True,
    )
    arrival_time: Series[pa.Timestamp] = Field(nullable=True, default=pd.NaT, coerce=True)
    departure_time: Series[pa.Timestamp] = Field(nullable=True, default=pd.NaT, coerce=True)

    # Optional
    shape_dist_traveled: Optional[Series[float]] = Field(coerce=True, nullable=True, ge=0)
    timepoint: Optional[Series[Category]] = Field(
        dtype_kwargs={"categories": TimepointType}, coerce=True, default=0
    )

    class Config:
        """Config for the StopTimesTable data model."""

        coerce = True
        add_missing_columns = True
        _pk: ClassVar[TablePrimaryKeys] = ["trip_id", "stop_sequence"]
        _fk: ClassVar[TableForeignKeys] = {
            "trip_id": ("trips", "trip_id"),
            "stop_id": ("stops", "stop_id"),
        }

        unique: ClassVar[list[str]] = ["trip_id", "stop_sequence"]

    @pa.dataframe_parser
    def parse_times(cls, df):
        """Parse time strings to timestamps."""
        # Convert string times to timestamps
        if "arrival_time" in df.columns and "departure_time" in df.columns:
            # Convert string times to timestamps using str_to_time_series
            df["arrival_time"] = str_to_time_series(df["arrival_time"])
            df["departure_time"] = str_to_time_series(df["departure_time"])

        return df
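The _fk mapping in Config records which columns reference other tables, for example stop_times.stop_id referencing stops.stop_id. _fk is not a standard pandera option, so it is presumably consumed elsewhere in network_wrangler, which this section does not show. A stdlib sketch of such a referential-integrity check over row dicts; missing_foreign_keys is hypothetical.

```python
from typing import Any


def missing_foreign_keys(
    child_rows: list[dict[str, Any]],
    child_col: str,
    parent_rows: list[dict[str, Any]],
    parent_col: str,
) -> set:
    """Return child values with no matching parent row (hypothetical helper).

    Null (None) child values are skipped, as for an optional reference.
    """
    parent_keys = {row[parent_col] for row in parent_rows}
    return {
        row[child_col]
        for row in child_rows
        if row.get(child_col) is not None and row[child_col] not in parent_keys
    }
```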

network_wrangler.models.gtfs.tables.StopTimesTable.parse_times

parse_times(df)

Parse time strings to timestamps.
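GTFS arrival and departure times are HH:MM:SS strings relative to the service day, and the hour may exceed 23 for trips that run past midnight, so they cannot always be parsed as a plain clock time. One way to turn them into timestamps is to add them as an offset to a service date. This is a hypothetical sketch; whether str_to_time_series handles values past 24:00:00 this way is not stated in this section.

```python
from datetime import datetime, timedelta


def gtfs_time_to_datetime(value: str, service_date: datetime) -> datetime:
    """Add an HH:MM:SS offset (hours may exceed 23) to a service date."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return service_date + timedelta(hours=hours, minutes=minutes, seconds=seconds)
```

With a service date of 2024-01-01, "25:10:00" becomes 01:10 on 2024-01-02.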

network_wrangler.models.gtfs.tables.StopsTable

Bases: DataFrameModel

Represents the Stops table in the GTFS dataset.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#stopstxt

Attributes:

  • stop_id (str) –

    The stop_id. Primary key. Required to be unique.

  • stop_lat (float) –

    The stop latitude.

  • stop_lon (float) –

    The stop longitude.

  • wheelchair_boarding (Optional[int]) –

    Whether wheelchair boarding is possible at the stop.

  • stop_code (Optional[str]) –

    The stop code.

  • stop_name (Optional[str]) –

    The stop name.

  • tts_stop_name (Optional[str]) –

    The text-to-speech stop name.

  • stop_desc (Optional[str]) –

    The stop description.

  • zone_id (Optional[str]) –

    The zone id.

  • stop_url (Optional[str]) –

    The stop URL.

  • location_type (Optional[LocationType]) –

    The location type. Values can be:
    - 0: stop platform
    - 1: station
    - 2: entrance/exit
    - 3: generic node
    - 4: boarding area
    A blank value is treated as a stop platform.

  • parent_station (Optional[str]) –

    The stop_id of the parent station.

  • stop_timezone (Optional[str]) –

    The stop timezone.

Source code in network_wrangler/models/gtfs/tables.py
class StopsTable(DataFrameModel):
    """Represents the Stops table in the GTFS dataset.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#stopstxt>

    Attributes:
        stop_id (str): The stop_id. Primary key. Required to be unique.
        stop_lat (float): The stop latitude.
        stop_lon (float): The stop longitude.
        wheelchair_boarding (Optional[int]): The wheelchair boarding.
        stop_code (Optional[str]): The stop code.
        stop_name (Optional[str]): The stop name.
        tts_stop_name (Optional[str]): The text-to-speech stop name.
        stop_desc (Optional[str]): The stop description.
        zone_id (Optional[str]): The zone id.
        stop_url (Optional[str]): The stop URL.
        location_type (Optional[LocationType]): The location type. Values can be:
            - 0: stop platform
            - 1: station
            - 2: entrance/exit
            - 3: generic node
            - 4: boarding area
            Default of blank assumes a stop platform.
        parent_station (Optional[str]): The `stop_id` of the parent station.
        stop_timezone (Optional[str]): The stop timezone.
    """

    stop_id: Series[str] = Field(coerce=True, nullable=False, unique=True)
    stop_lat: Series[float] = Field(coerce=True, nullable=False, ge=-90, le=90)
    stop_lon: Series[float] = Field(coerce=True, nullable=False, ge=-180, le=180)

    # Optional Fields
    wheelchair_boarding: Optional[Series[Category]] = Field(
        dtype_kwargs={"categories": WheelchairAccessible}, coerce=True, default=0
    )
    stop_code: Optional[Series[str]] = Field(nullable=True, coerce=True)
    stop_name: Optional[Series[str]] = Field(nullable=True, coerce=True)
    tts_stop_name: Optional[Series[str]] = Field(nullable=True, coerce=True)
    stop_desc: Optional[Series[str]] = Field(nullable=True, coerce=True)
    zone_id: Optional[Series[str]] = Field(nullable=True, coerce=True)
    stop_url: Optional[Series[str]] = Field(nullable=True, coerce=True)
    location_type: Optional[Series[Category]] = Field(
        dtype_kwargs={"categories": LocationType},
        nullable=True,
        coerce=True,
        default=0,
    )
    parent_station: Optional[Series[str]] = Field(nullable=True, coerce=True)
    stop_timezone: Optional[Series[str]] = Field(nullable=True, coerce=True)

    class Config:
        """Config for the StopsTable data model."""

        coerce = True
        add_missing_columns = True
        _pk: ClassVar[TablePrimaryKeys] = ["stop_id"]
        _fk: ClassVar[TableForeignKeys] = {"parent_station": ("stops", "stop_id")}

network_wrangler.models.gtfs.tables.TripsTable

Bases: DataFrameModel

Represents the Trips table in the GTFS dataset.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#tripstxt

Attributes:

  • trip_id (str) –

    Primary key. Required to be unique.

  • shape_id (str) –

    Foreign key to shape_id in the shapes table.

  • direction_id (DirectionID) –

    The direction id. Required. Values can be:
    - 0: Outbound
    - 1: Inbound

  • service_id (str) –

    The service id.

  • route_id (str) –

    The route id. Foreign key to route_id in the routes table.

  • trip_short_name (Optional[str]) –

    The trip short name.

  • trip_headsign (Optional[str]) –

    The trip headsign.

  • block_id (Optional[str]) –

    The block id.

  • wheelchair_accessible (Optional[int]) –

    Whether the trip is wheelchair accessible. Values can be:
    - 0: No information
    - 1: Allowed
    - 2: Not allowed

  • bikes_allowed (Optional[int]) –

    Whether bikes are allowed on the trip. Values can be:
    - 0: No information
    - 1: Allowed
    - 2: Not allowed

Source code in network_wrangler/models/gtfs/tables.py
class TripsTable(DataFrameModel):
    """Represents the Trips table in the GTFS dataset.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#tripstxt>

    Attributes:
        trip_id (str): Primary key. Required to be unique.
        shape_id (str): Foreign key to `shape_id` in the shapes table.
        direction_id (DirectionID): The direction id. Required. Values can be:
            - 0: Outbound
            - 1: Inbound
        service_id (str): The service id.
        route_id (str): The route id. Foreign key to `route_id` in the routes table.
        trip_short_name (Optional[str]): The trip short name.
        trip_headsign (Optional[str]): The trip headsign.
        block_id (Optional[str]): The block id.
        wheelchair_accessible (Optional[int]): The wheelchair accessible. Values can be:
            - 0: No information
            - 1: Allowed
            - 2: Not allowed
        bikes_allowed (Optional[int]): The bikes allowed. Values can be:
            - 0: No information
            - 1: Allowed
            - 2: Not allowed
    """

    trip_id: Series[str] = Field(nullable=False, unique=True, coerce=True)
    shape_id: Series[str] = Field(nullable=False, coerce=True)
    direction_id: Series[Category] = Field(
        dtype_kwargs={"categories": DirectionID}, coerce=True, nullable=False, default=0
    )
    service_id: Series[str] = Field(nullable=False, coerce=True, default="1")
    route_id: Series[str] = Field(nullable=False, coerce=True)

    # Optional Fields
    trip_short_name: Optional[Series[str]] = Field(nullable=True, coerce=True)
    trip_headsign: Optional[Series[str]] = Field(nullable=True, coerce=True)
    block_id: Optional[Series[str]] = Field(nullable=True, coerce=True)
    wheelchair_accessible: Optional[Series[Category]] = Field(
        dtype_kwargs={"categories": WheelchairAccessible}, coerce=True, default=0
    )
    bikes_allowed: Optional[Series[Category]] = Field(
        dtype_kwargs={"categories": BikesAllowed},
        coerce=True,
        default=0,
    )

    class Config:
        """Config for the TripsTable data model."""

        coerce = True
        add_missing_columns = True
        _pk: ClassVar[TablePrimaryKeys] = ["trip_id"]
        _fk: ClassVar[TableForeignKeys] = {"route_id": ("routes", "route_id")}
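direction_id splits a route's trips into two opposing directions, and the model fills a missing value with default=0. A common first step when working with this table is grouping trips by route and direction; the stdlib sketch below does that, applying the same default. trips_by_route_direction is hypothetical.

```python
from collections import defaultdict
from typing import Any


def trips_by_route_direction(trips: list[dict[str, Any]]) -> dict[tuple[str, int], list[str]]:
    """Group trip_ids by (route_id, direction_id); a missing direction_id counts as 0."""
    groups: dict[tuple[str, int], list[str]] = defaultdict(list)
    for trip in trips:
        direction = trip.get("direction_id")
        # Coerce to int, mirroring coerce=True, and apply the default of 0.
        key = (trip["route_id"], 0 if direction is None else int(direction))
        groups[key].append(trip["trip_id"])
    return dict(groups)
```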

network_wrangler.models.gtfs.tables.WranglerFrequenciesTable

Bases: FrequenciesTable

Wrangler flavor of GTFS FrequenciesTable.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#frequenciestxt

The primary key of this table is a composite key of trip_id and start_time.

Attributes:

  • trip_id (str) –

    Foreign key to trip_id in the trips table.

  • start_time (datetime) –

    The start time in datetime format.

  • end_time (datetime) –

    The end time in datetime format.

  • headway_secs (int) –

    The headway in seconds.

Source code in network_wrangler/models/gtfs/tables.py
class WranglerFrequenciesTable(FrequenciesTable):
    """Wrangler flavor of GTFS FrequenciesTable.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#frequenciestxt>

    The primary key of this table is a composite key of `trip_id` and `start_time`.

    Attributes:
        trip_id (str): Foreign key to `trip_id` in the trips table.
        start_time (datetime.datetime): The start time in datetime format.
        end_time (datetime.datetime): The end time in datetime format.
        headway_secs (int): The headway in seconds.
    """

    projects: Series[str] = Field(coerce=True, default="")
    start_time: Series = Field(
        nullable=False, coerce=True, default=str_to_time(DEFAULT_TIMESPAN[0])
    )
    end_time: Series = Field(nullable=False, coerce=True, default=str_to_time(DEFAULT_TIMESPAN[1]))

    class Config:
        """Config for the WranglerFrequenciesTable data model."""

        coerce = True
        add_missing_columns = True
        unique: ClassVar[list[str]] = ["trip_id", "start_time"]
        _pk: ClassVar[TablePrimaryKeys] = ["trip_id", "start_time"]
        _fk: ClassVar[TableForeignKeys] = {"trip_id": ("trips", "trip_id")}

    @pa.parser("start_time")
    def st_to_timestamp(cls, series: Series) -> Series[Timestamp]:
        """Check that start time is timestamp."""
        series = series.fillna(str_to_time(DEFAULT_TIMESPAN[0]))
        if series.dtype == "datetime64[ns]":
            return series
        series = str_to_time_series(series)
        return series.astype("datetime64[ns]")

    @pa.parser("end_time")
    def et_to_timestamp(cls, series: Series) -> Series[Timestamp]:
        """Check that end time is timestamp."""
        series = series.fillna(str_to_time(DEFAULT_TIMESPAN[1]))
        if series.dtype == "datetime64[ns]":
            return series
        return str_to_time_series(series)
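Both parsers follow the same pattern: fill nulls with the matching DEFAULT_TIMESPAN bound, return early if the column is already datetime64[ns], and otherwise convert strings with str_to_time_series. The stdlib sketch below mirrors that flow element by element rather than per column. DEFAULT_START, REFERENCE_DATE, and parse_start_times are hypothetical; the real default and the reference date used by str_to_time_series are not given in this section.

```python
from datetime import datetime, timedelta
from typing import Optional, Union

# Hypothetical values: the real DEFAULT_TIMESPAN and the reference date used
# by str_to_time_series are defined elsewhere in network_wrangler.
DEFAULT_START = "00:00:00"
REFERENCE_DATE = datetime(2000, 1, 1)


def parse_start_times(
    values: list[Optional[Union[str, datetime]]], default: str = DEFAULT_START
) -> list[datetime]:
    """Fill nulls with a default, keep datetimes as-is, and parse HH:MM:SS strings."""
    parsed = []
    for value in values:
        if value is None:
            value = default  # like series.fillna(...)
        if isinstance(value, datetime):
            parsed.append(value)  # already a timestamp, like the dtype short-circuit
            continue
        hours, minutes, seconds = (int(part) for part in value.split(":"))
        parsed.append(REFERENCE_DATE + timedelta(hours=hours, minutes=minutes, seconds=seconds))
    return parsed
```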

network_wrangler.models.gtfs.tables.WranglerFrequenciesTable.et_to_timestamp

et_to_timestamp(series)

Check that end time is timestamp.

network_wrangler.models.gtfs.tables.WranglerFrequenciesTable.st_to_timestamp

st_to_timestamp(series)

Check that start time is timestamp.

network_wrangler.models.gtfs.tables.WranglerShapesTable

Bases: ShapesTable

Wrangler flavor of GTFS ShapesTable.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#shapestxt

Attributes:

  • shape_id (str) –

    The shape_id. Primary key. Required to be unique.

  • shape_pt_lat (float) –

    The shape point latitude.

  • shape_pt_lon (float) –

    The shape point longitude.

  • shape_pt_sequence (int) –

    The shape point sequence.

  • shape_dist_traveled (Optional[float]) –

    The shape distance traveled.

  • shape_model_node_id (int) –

    The model_node_id of the shape point. Foreign key to the model_node_id in the nodes table.

  • projects (str) –

    A comma-separated string value for projects that have been applied to this shape.

Source code in network_wrangler/models/gtfs/tables.py
class WranglerShapesTable(ShapesTable):
    """Wrangler flavor of GTFS ShapesTable.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#shapestxt>

    Attributes:
        shape_id (str): The shape_id. Primary key. Required to be unique.
        shape_pt_lat (float): The shape point latitude.
        shape_pt_lon (float): The shape point longitude.
        shape_pt_sequence (int): The shape point sequence.
        shape_dist_traveled (Optional[float]): The shape distance traveled.
        shape_model_node_id (int): The `model_node_id` of the shape point. Foreign key to the `model_node_id` in the nodes table.
        projects (str): A comma-separated string value for projects that have been applied to this shape.
    """

    shape_model_node_id: Series[int] = Field(coerce=True, nullable=False)
    projects: Series[str] = Field(coerce=True, default="")
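The projects column defaults to an empty string and records, as a comma-separated string, the projects that have been applied to this shape. A hypothetical helper showing one way to maintain such a field; the exact format network_wrangler writes (for example, whether it keeps a trailing comma) is not specified here.

```python
def add_project(projects: str, project_name: str) -> str:
    """Append project_name to a comma-separated projects string, skipping repeats."""
    # "".split(",") gives [""], so empty entries are filtered out.
    applied = [p for p in projects.split(",") if p]
    if project_name not in applied:
        applied.append(project_name)
    return ",".join(applied)
```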

network_wrangler.models.gtfs.tables.WranglerStopTimesTable

Bases: StopTimesTable

Wrangler flavor of GTFS StopTimesTable.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#stop_timestxt

The primary key of this table is a composite key of trip_id and stop_sequence.

Attributes:

  • trip_id (str) –

    Foreign key to trip_id in the trips table.

  • stop_id (int) –

    Foreign key to stop_id in the stops table.

  • stop_sequence (int) –

    The stop sequence.

  • pickup_type (PickupDropoffType) –

    The pickup type. Values can be:
    - 0: Regularly scheduled pickup
    - 1: No pickup available
    - 2: Must phone agency to arrange pickup
    - 3: Must coordinate with driver to arrange pickup

  • drop_off_type (PickupDropoffType) –

    The drop off type. Values can be:
    - 0: Regularly scheduled drop off
    - 1: No drop off available
    - 2: Must phone agency to arrange drop off
    - 3: Must coordinate with driver to arrange drop off

  • shape_dist_traveled (Optional[float]) –

    The shape distance traveled.

  • timepoint (Optional[TimepointType]) –

    The timepoint type. Values can be:
    - 0: The stop is not a timepoint
    - 1: The stop is a timepoint

  • projects (str) –

    A comma-separated string value for projects that have been applied to this stop.

Source code in network_wrangler/models/gtfs/tables.py
class WranglerStopTimesTable(StopTimesTable):
    """Wrangler flavor of GTFS StopTimesTable.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#stop_timestxt>

    The primary key of this table is a composite key of `trip_id` and `stop_sequence`.

    Attributes:
        trip_id (str): Foreign key to `trip_id` in the trips table.
        stop_id (int): Foreign key to `stop_id` in the stops table.
        stop_sequence (int): The stop sequence.
        pickup_type (PickupDropoffType): The pickup type. Values can be:
            - 0: Regularly scheduled pickup
            - 1: No pickup available
            - 2: Must phone agency to arrange pickup
            - 3: Must coordinate with driver to arrange pickup
        drop_off_type (PickupDropoffType): The drop off type. Values can be:
            - 0: Regularly scheduled drop off
            - 1: No drop off available
            - 2: Must phone agency to arrange drop off
            - 3: Must coordinate with driver to arrange drop off
        shape_dist_traveled (Optional[float]): The shape distance traveled.
        timepoint (Optional[TimepointType]): The timepoint type. Values can be:
            - 0: The stop is not a timepoint
            - 1: The stop is a timepoint
        projects (str): A comma-separated string value for projects that have been applied to this stop.
    """

    stop_id: Series[int] = Field(nullable=False, coerce=True, description="The model_node_id.")
    projects: Series[str] = Field(coerce=True, default="")
    arrival_time: Series[pa.Timestamp] = Field(nullable=True, default=pd.NaT, coerce=True)
    departure_time: Series[pa.Timestamp] = Field(nullable=True, default=pd.NaT, coerce=True)

    class Config:
        """Config for the WranglerStopTimesTable data model."""

        coerce = True
        add_missing_columns = True
        _pk: ClassVar[TablePrimaryKeys] = ["trip_id", "stop_sequence"]
        _fk: ClassVar[TableForeignKeys] = {
            "trip_id": ("trips", "trip_id"),
            "stop_id": ("stops", "stop_id"),
        }

        unique: ClassVar[list[str]] = ["trip_id", "stop_sequence"]

    @pa.dataframe_parser
    def parse_times(cls, df):
        """Parse time strings to timestamps."""
        # Convert string times to timestamps
        if "arrival_time" in df.columns and "departure_time" in df.columns:
            # Convert string times to timestamps using str_to_time_series
            df["arrival_time"] = str_to_time_series(df["arrival_time"])
            df["departure_time"] = str_to_time_series(df["departure_time"])

        return df
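In the Wrangler flavor, stop_id holds an integer model_node_id instead of the GTFS string identifier. The sketch below shows one way stop_times rows could be translated given a lookup from GTFS stop ids to node ids; the lookup and remap_stop_time_stops are hypothetical, as this section does not describe how network_wrangler builds that mapping.

```python
from typing import Any


def remap_stop_time_stops(
    stop_times: list[dict[str, Any]], gtfs_to_node: dict[str, int]
) -> list[dict[str, Any]]:
    """Return copies of stop_times rows with stop_id replaced by a model_node_id.

    Raises KeyError listing any GTFS stop_id that has no node in the lookup.
    """
    missing = {row["stop_id"] for row in stop_times} - set(gtfs_to_node)
    if missing:
        raise KeyError(f"no model_node_id for stop_id(s): {sorted(missing)}")
    # Copy each row so the input list is not mutated.
    return [{**row, "stop_id": gtfs_to_node[row["stop_id"]]} for row in stop_times]
```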

network_wrangler.models.gtfs.tables.WranglerStopTimesTable.parse_times

parse_times(df)

Parse time strings to timestamps.
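GTFS stores arrival_time and departure_time as HH:MM:SS strings, and hours may exceed 23 for trips that run past midnight of the service day (for example 25:30:00). The parser delegates to str_to_time_series, whose implementation is not shown on this page. The sketch below is a hypothetical stand-in, not that function: it assumes an arbitrary base date of 1970-01-01 so that such times roll over into the next calendar day, and it turns missing values into NaT.

```python
import pandas as pd


def gtfs_times_to_timestamps(times: pd.Series, base_date: str = "1970-01-01") -> pd.Series:
    """Hypothetical stand-in: convert GTFS 'HH:MM:SS' strings to timestamps.

    Hours may exceed 23 (e.g. '25:30:00'), which rolls over into the next day.
    Missing or unparseable values become NaT.
    """
    # Split into hour/minute/second columns; missing or bad parts become NaN
    parts = times.str.split(":", expand=True).apply(pd.to_numeric, errors="coerce")
    seconds = parts[0] * 3600 + parts[1] * 60 + parts[2]
    # Offset from the (assumed) base date; NaN seconds become NaT
    return pd.Timestamp(base_date) + pd.to_timedelta(seconds, unit="s")


parsed = gtfs_times_to_timestamps(pd.Series(["08:00:00", "25:30:00", None]))
```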


network_wrangler.models.gtfs.tables.WranglerStopsTable

Bases: StopsTable

Wrangler flavor of GTFS StopsTable.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#stopstxt

Attributes:

  • stop_id (int) –

    The stop_id. Primary key. Required to be unique. Wrangler assumes that this is a reference to a roadway node and, as such, it must be an integer.

  • stop_lat (float) –

    The stop latitude.

  • stop_lon (float) –

    The stop longitude.

  • wheelchair_boarding (Optional[int]) –

    Whether wheelchair boarding is possible at the stop.

  • stop_code (Optional[str]) –

    The stop code.

  • stop_name (Optional[str]) –

    The stop name.

  • tts_stop_name (Optional[str]) –

    The text-to-speech stop name.

  • stop_desc (Optional[str]) –

    The stop description.

  • zone_id (Optional[str]) –

    The zone id.

  • stop_url (Optional[str]) –

    The stop URL.

  • location_type (Optional[LocationType]) –

    The location type. Values can be:

      - 0: stop platform
      - 1: station
      - 2: entrance/exit
      - 3: generic node
      - 4: boarding area

    A blank value is treated as a stop platform.

  • parent_station (Optional[int]) –

    The stop_id of the parent station. Since stop_id is an integer in Wrangler, this field is also an integer.

  • stop_timezone (Optional[str]) –

    The stop timezone.

  • stop_id_GTFS (Optional[str]) –

    The stop_id from the GTFS data.

  • projects (str) –

    A comma-separated string value for projects that have been applied to this stop.

Source code in network_wrangler/models/gtfs/tables.py
class WranglerStopsTable(StopsTable):
    """Wrangler flavor of GTFS StopsTable.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#stopstxt>

    Attributes:
        stop_id (int): The stop_id. Primary key. Required to be unique. **Wrangler assumes that this is a reference to a roadway node and as such must be an integer**
        stop_lat (float): The stop latitude.
        stop_lon (float): The stop longitude.
        wheelchair_boarding (Optional[int]): The wheelchair boarding.
        stop_code (Optional[str]): The stop code.
        stop_name (Optional[str]): The stop name.
        tts_stop_name (Optional[str]): The text-to-speech stop name.
        stop_desc (Optional[str]): The stop description.
        zone_id (Optional[str]): The zone id.
        stop_url (Optional[str]): The stop URL.
        location_type (Optional[LocationType]): The location type. Values can be:
            - 0: stop platform
            - 1: station
            - 2: entrance/exit
            - 3: generic node
            - 4: boarding area
            Default of blank assumes a stop platform.
        parent_station (Optional[int]): The `stop_id` of the parent station. **Since stop_id is an integer in Wrangler, this field is also an integer**
        stop_timezone (Optional[str]): The stop timezone.
        stop_id_GTFS (Optional[str]): The stop_id from the GTFS data.
        projects (str): A comma-separated string value for projects that have been applied to this stop.
    """

    stop_id: Series[int] = Field(
        coerce=True, nullable=False, unique=True, description="The model_node_id."
    )
    stop_id_GTFS: Series[str] = Field(
        coerce=True,
        nullable=True,
        description="The stop_id from the GTFS data",
    )
    stop_lat: Series[float] = Field(coerce=True, nullable=True, ge=-90, le=90)
    stop_lon: Series[float] = Field(coerce=True, nullable=True, ge=-180, le=180)
    projects: Series[str] = Field(coerce=True, default="")
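WranglerStopsTable tightens the GTFS stops schema: stop_id is coerced to a unique, non-null integer, and stop_lat/stop_lon are bounded to valid coordinate ranges. The sketch below, a hypothetical `check_wrangler_stops` helper written in plain pandas, reproduces only those constraints so you can see which rows would fail before handing a DataFrame to pandera. It is an illustration, not how the library validates.

```python
import pandas as pd


def check_wrangler_stops(stops: pd.DataFrame) -> list[str]:
    """Hypothetical pre-check mirroring a few WranglerStopsTable constraints."""
    problems = []
    # stop_id must be an integer model_node_id: non-null, numeric, no fractional part
    stop_ids = pd.to_numeric(stops["stop_id"], errors="coerce")
    if stop_ids.isna().any() or (stop_ids % 1 != 0).any():
        problems.append("stop_id must be an integer model_node_id")
    if not stops["stop_id"].is_unique:
        problems.append("stop_id must be unique")
    # Coordinates are nullable, but non-null values must be in range
    if not stops["stop_lat"].dropna().between(-90, 90).all():
        problems.append("stop_lat must be within [-90, 90]")
    if not stops["stop_lon"].dropna().between(-180, 180).all():
        problems.append("stop_lon must be within [-180, 180]")
    return problems


good = pd.DataFrame({"stop_id": [1001, 1002], "stop_lat": [37.8, None], "stop_lon": [-122.3, -122.4]})
bad = pd.DataFrame({"stop_id": ["A1", "A1"], "stop_lat": [95.0, 37.0], "stop_lon": [0.0, 0.0]})
```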

network_wrangler.models.gtfs.tables.WranglerTripsTable

Bases: TripsTable

Represents the Trips table in the Wrangler feed, adding projects list.

For field definitions, see the GTFS reference: https://gtfs.org/documentation/schedule/reference/#tripstxt

Attributes:

  • trip_id (str) –

    Primary key. Required to be unique.

  • shape_id (str) –

    Foreign key to shape_id in the shapes table.

  • direction_id (DirectionID) –

    The direction id. Required. Values can be:

      - 0: Outbound
      - 1: Inbound

  • service_id (str) –

    The service id.

  • route_id (str) –

    The route id. Foreign key to route_id in the routes table.

  • trip_short_name (Optional[str]) –

    The trip short name.

  • trip_headsign (Optional[str]) –

    The trip headsign.

  • block_id (Optional[str]) –

    The block id.

  • wheelchair_accessible (Optional[int]) –

    Whether the trip is wheelchair accessible. Values can be:

      - 0: No information
      - 1: Allowed
      - 2: Not allowed

  • bikes_allowed (Optional[int]) –

    Whether bikes are allowed on the trip. Values can be:

      - 0: No information
      - 1: Allowed
      - 2: Not allowed

  • projects (str) –

    A comma-separated string value for projects that have been applied to this trip.

Source code in network_wrangler/models/gtfs/tables.py
class WranglerTripsTable(TripsTable):
    """Represents the Trips table in the Wrangler feed, adding projects list.

    For field definitions, see the GTFS reference: <https://gtfs.org/documentation/schedule/reference/#tripstxt>

    Attributes:
        trip_id (str): Primary key. Required to be unique.
        shape_id (str): Foreign key to `shape_id` in the shapes table.
        direction_id (DirectionID): The direction id. Required. Values can be:
            - 0: Outbound
            - 1: Inbound
        service_id (str): The service id.
        route_id (str): The route id. Foreign key to `route_id` in the routes table.
        trip_short_name (Optional[str]): The trip short name.
        trip_headsign (Optional[str]): The trip headsign.
        block_id (Optional[str]): The block id.
        wheelchair_accessible (Optional[int]): The wheelchair accessible. Values can be:
            - 0: No information
            - 1: Allowed
            - 2: Not allowed
        bikes_allowed (Optional[int]): The bikes allowed. Values can be:
            - 0: No information
            - 1: Allowed
            - 2: Not allowed
        projects (str): A comma-separated string value for projects that have been applied to this trip.
    """

    projects: Series[str] = Field(coerce=True, default="")

    class Config:
        """Config for the WranglerTripsTable data model."""

        coerce = True
        add_missing_columns = True
        _pk: ClassVar[TablePrimaryKeys] = ["trip_id"]
        _fk: ClassVar[TableForeignKeys] = {"route_id": ("routes", "route_id")}
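The `_fk` entry above records that every trips.route_id must exist in routes.route_id, and the DBModelMixin description further down says foreign keys are validated when a table is assigned. The helper below is a hypothetical illustration of what such a check amounts to; it assumes null values are skipped, which is an assumption since the mixin's null handling is not shown here.

```python
import pandas as pd


def missing_foreign_keys(
    child: pd.DataFrame, child_col: str, parent: pd.DataFrame, parent_col: str
) -> set:
    """Hypothetical helper: values in child[child_col] that are absent from parent[parent_col]."""
    present = child[child_col].dropna()  # assumption: nulls are not treated as violations
    return set(present[~present.isin(parent[parent_col])])


trips = pd.DataFrame({"trip_id": ["t1", "t2", "t3"], "route_id": ["r1", "r2", "r9"]})
routes = pd.DataFrame({"route_id": ["r1", "r2"]})
orphans = missing_foreign_keys(trips, "route_id", routes, "route_id")
```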

Data Model for Pure GTFS Feed (not wrangler-flavored).

network_wrangler.models.gtfs.gtfs.GtfsModel

Bases: DBModelMixin

Wrapper class around GTFS feed.

This is the pure GTFS model version of Feed.

Most functionality derives from mixin class DBModelMixin which provides:

  • validation of tables to schemas when setting a table attribute (e.g. self.trips = trips_df)
  • validation of foreign keys when setting a table attribute (e.g. self.trips = trips_df)
  • hashing and deep copy functionality
  • overload of __eq__ to apply only to tables in table_names.
  • convenience methods for accessing tables

Attributes:

  • table_names (list[str]) –

    list of table names in GTFS feed.

  • tables (list[DataFrame]) –

    list of tables as dataframes.

  • stop_times (DataFrame[StopTimesTable]) –

    stop_times dataframe

  • stops (DataFrame[StopsTable]) –

    stops dataframe

  • shapes (DataFrame[ShapesTable]) –

    shapes dataframe

  • trips (DataFrame[TripsTable]) –

    trips dataframe

  • frequencies (Optional[DataFrame[FrequenciesTable]]) –

    frequencies dataframe

  • routes (DataFrame[RoutesTable]) –

    routes dataframe

  • net (Optional[TransitNetwork]) –

    TransitNetwork object

Source code in network_wrangler/models/gtfs/gtfs.py
class GtfsModel(DBModelMixin):
    """Wrapper class around GTFS feed.

    This is the pure GTFS model version of [Feed][network_wrangler.transit.feed.feed.Feed]

    Most functionality derives from mixin class
    [`DBModelMixin`][network_wrangler.models._base.db.DBModelMixin] which provides:

    - validation of tables to schemas when setting a table attribute (e.g. self.trips = trips_df)
    - validation of fks when setting a table attribute (e.g. self.trips = trips_df)
    - hashing and deep copy functionality
    - overload of __eq__ to apply only to tables in table_names.
    - convenience methods for accessing tables

    Attributes:
        table_names (list[str]): list of table names in GTFS feed.
        tables (list[DataFrame]): list of tables as dataframes.
        stop_times (DataFrame[StopTimesTable]): stop_times dataframe
        stops (DataFrame[StopsTable]): stops dataframe
        shapes (DataFrame[ShapesTable]): shapes dataframe
        trips (DataFrame[TripsTable]): trips dataframe
        frequencies (Optional[DataFrame[FrequenciesTable]]): frequencies dataframe
        routes (DataFrame[RoutesTable]): route dataframe
        net (Optional[TransitNetwork]): TransitNetwork object
    """

    # the ordering here matters because the stops need to be added before stop_times if
    # stop times needs to be converted
    _table_models: ClassVar[dict] = {
        "agencies": AgenciesTable,
        "frequencies": FrequenciesTable,
        "routes": RoutesTable,
        "shapes": ShapesTable,
        "stops": StopsTable,
        "trips": TripsTable,
        "stop_times": StopTimesTable,
    }

    table_names: ClassVar[list[str]] = [
        "routes",
        "shapes",
        "stops",
        "trips",
        "stop_times",
    ]

    optional_table_names: ClassVar[list[str]] = ["agencies", "frequencies"]

    def __init__(self, **kwargs):
        """Initialize GTFS model."""
        self.initialize_tables(**kwargs)

        # Set extra provided attributes.
        extra_attr = {k: v for k, v in kwargs.items() if k not in self.table_names}
        for k, v in extra_attr.items():
            setattr(self, k, v)

network_wrangler.models.gtfs.gtfs.GtfsModel.__init__

__init__(**kwargs)

Initialize GTFS model.
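The constructor separates table keyword arguments, which initialize_tables handles, from any other keyword arguments, which become plain attributes. The stripped-down class below is a hypothetical sketch of that pattern only: it performs no validation and does not reproduce initialize_tables. Iterating the extras dict requires `.items()` to unpack key/value pairs. As a design choice, the sketch also excludes optional tables from the extras so they are not assigned twice; whether that matters in the real class depends on initialize_tables, which is not shown here.

```python
from typing import Any, ClassVar


class TinyModel:
    """Hypothetical stand-in for the GtfsModel.__init__ pattern (no validation)."""

    table_names: ClassVar[list[str]] = ["routes", "trips"]
    optional_table_names: ClassVar[list[str]] = ["agencies"]

    def __init__(self, **kwargs: Any):
        known = self.table_names + self.optional_table_names
        # In GtfsModel, initialize_tables() handles (and validates) these tables
        for name in known:
            setattr(self, name, kwargs.get(name))
        # Everything else becomes a plain attribute; .items() yields (key, value) pairs
        extra_attr = {k: v for k, v in kwargs.items() if k not in known}
        for k, v in extra_attr.items():
            setattr(self, k, v)


model = TinyModel(routes="routes_df", trips="trips_df", feed_name="demo")
```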


network_wrangler.models.gtfs.gtfs.GtfsValidationError

Bases: Exception

Exception raised for errors in the GTFS feed.

Source code in network_wrangler/models/gtfs/gtfs.py
class GtfsValidationError(Exception):
    """Exception raised for errors in the GTFS feed."""

Feed

Main functionality for GTFS tables including Feed object.

network_wrangler.transit.feed.feed.Feed

Bases: DBModelMixin

Wrapper class around Wrangler flavored GTFS feed.

Most functionality derives from mixin class DBModelMixin which provides:

  • validation of tables to schemas when setting a table attribute (e.g. self.trips = trips_df)
  • validation of foreign keys when setting a table attribute (e.g. self.trips = trips_df)
  • hashing and deep copy functionality
  • overload of __eq__ to apply only to tables in table_names.
  • convenience methods for accessing tables

What is Wrangler-flavored GTFS?

A Wrangler-flavored GTFS feed differs from a GTFS feed in the following ways:

  • frequencies.txt is required
  • shapes.txt requires an additional field, shape_model_node_id, corresponding to model_node_id in the RoadwayNetwork
  • stops.txt - stop_id is required to be an int
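The three differences listed above can be checked mechanically. The following sketch is a hypothetical helper (`wrangler_flavor_gaps` is not a network_wrangler function) that takes a dict of table DataFrames keyed by table name, an assumed input shape, and reports which Wrangler-flavored requirements a feed does not yet meet.

```python
import pandas as pd


def wrangler_flavor_gaps(tables: dict[str, pd.DataFrame]) -> list[str]:
    """Hypothetical check of the Wrangler-flavored GTFS requirements."""
    gaps = []
    # 1. frequencies.txt is required
    if "frequencies" not in tables:
        gaps.append("frequencies.txt is missing")
    # 2. shapes.txt needs shape_model_node_id (roadway model_node_id)
    shapes = tables.get("shapes")
    if shapes is None or "shape_model_node_id" not in shapes.columns:
        gaps.append("shapes.txt lacks shape_model_node_id")
    # 3. stops.txt stop_id must be an integer column
    stops = tables.get("stops")
    if stops is None or "stop_id" not in stops.columns or not pd.api.types.is_integer_dtype(stops["stop_id"]):
        gaps.append("stops.txt stop_id is not an integer column")
    return gaps


plain_gtfs = {
    "shapes": pd.DataFrame({"shape_id": ["s1"], "shape_pt_sequence": [1]}),
    "stops": pd.DataFrame({"stop_id": ["S1"]}),
}
wrangler_gtfs = {
    "frequencies": pd.DataFrame({"trip_id": ["t1"], "headway_secs": [600]}),
    "shapes": pd.DataFrame({"shape_id": ["s1"], "shape_model_node_id": [10]}),
    "stops": pd.DataFrame({"stop_id": [10]}),
}
```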

Attributes:

  • table_names (list[str]) –

    list of table names in GTFS feed.

  • tables (list[DataFrame]) –

    list of tables as dataframes.

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    stop_times dataframe with roadway node_ids

  • stops (DataFrame[WranglerStopsTable]) –

    stops dataframe

  • shapes (DataFrame[WranglerShapesTable]) –

    shapes dataframe

  • trips (DataFrame[WranglerTripsTable]) –

    trips dataframe

  • frequencies (DataFrame[WranglerFrequenciesTable]) –

    frequencies dataframe

  • routes (DataFrame[RoutesTable]) –

    routes dataframe

  • agencies (Optional[DataFrame[AgenciesTable]]) –

    agencies dataframe

  • net (Optional[TransitNetwork]) –

    TransitNetwork object

Source code in network_wrangler/transit/feed/feed.py
class Feed(DBModelMixin):
    """Wrapper class around Wrangler flavored GTFS feed.

    Most functionality derives from mixin class
    [`DBModelMixin`][network_wrangler.models._base.db.DBModelMixin] which provides:

    - validation of tables to schemas when setting a table attribute (e.g. self.trips = trips_df)
    - validation of fks when setting a table attribute (e.g. self.trips = trips_df)
    - hashing and deep copy functionality
    - overload of __eq__ to apply only to tables in table_names.
    - convenience methods for accessing tables

    !!! note "What is Wrangler-flavored GTFS?"

        A Wrangler-flavored GTFS feed differs from a GTFS feed in the following ways:

        * `frequencies.txt` is required
        * `shapes.txt` requires an additional field, `shape_model_node_id`, corresponding to `model_node_id` in the `RoadwayNetwork`
        * `stops.txt` - `stop_id` is required to be an int

    Attributes:
        table_names (list[str]): list of table names in GTFS feed.
        tables (list[DataFrame]): list of tables as dataframes.
        stop_times (DataFrame[WranglerStopTimesTable]): stop_times dataframe with roadway node_ids
        stops (DataFrame[WranglerStopsTable]): stops dataframe
        shapes (DataFrame[WranglerShapesTable]): shapes dataframe
        trips (DataFrame[WranglerTripsTable]): trips dataframe
        frequencies (DataFrame[WranglerFrequenciesTable]): frequencies dataframe
        routes (DataFrame[RoutesTable]): route dataframe
        agencies (Optional[DataFrame[AgenciesTable]]): agencies dataframe
        net (Optional[TransitNetwork]): TransitNetwork object
    """

    # the ordering here matters because the stops need to be added before stop_times if
    # stop times needs to be converted
    _table_models: ClassVar[dict] = {
        "agencies": AgenciesTable,
        "frequencies": WranglerFrequenciesTable,
        "routes": RoutesTable,
        "shapes": WranglerShapesTable,
        "stops": WranglerStopsTable,
        "trips": WranglerTripsTable,
        "stop_times": WranglerStopTimesTable,
    }

    # Define the converters if the table needs to be converted to a Wrangler table.
    # Format: "table_name": converter_function
    _converters: ClassVar[dict[str, Callable]] = {}

    table_names: ClassVar[list[str]] = [
        "frequencies",
        "routes",
        "shapes",
        "stops",
        "trips",
        "stop_times",
    ]

    optional_table_names: ClassVar[list[str]] = ["agencies"]

    def __init__(self, **kwargs):
        """Create a Feed object from a dictionary of DataFrames representing a GTFS feed.

        Args:
            kwargs: A dictionary containing DataFrames representing the tables of a GTFS feed.
        """
        self._net = None
        self.feed_path: Optional[Path] = None
        self.initialize_tables(**kwargs)

        # Set extra provided attributes but just FYI in logger.
        extra_attr = {k: v for k, v in kwargs.items() if k not in self.table_names}
        if extra_attr:
            WranglerLogger.info(f"Adding additional attributes to Feed: {extra_attr.keys()}")
        for k, v in extra_attr.items():
            setattr(self, k, v)

    def set_by_id(
        self,
        table_name: str,
        set_df: pd.DataFrame,
        id_property: str = "index",
        properties: Optional[list[str]] = None,
    ):
        """Set one or more property values based on an ID property for a given table.

        Args:
            table_name (str): Name of the table to modify.
            set_df (pd.DataFrame): DataFrame with columns `<id_property>` and `value` containing
                values to set for the specified property where `<id_property>` is unique.
            id_property: Property to use as ID to set by. Defaults to "index".
            properties: List of properties to set which are in set_df. If not specified, will set
                all properties.
        """
        if not set_df[id_property].is_unique:
            msg = f"{id_property} must be unique in set_df."
            _dupes = set_df[id_property][set_df[id_property].duplicated()]
            WranglerLogger.error(f"{msg} Found duplicates: {_dupes.unique().tolist()}")

            raise ValueError(msg)
        table_df = self.get_table(table_name)
        updated_df = update_df_by_col_value(table_df, set_df, id_property, properties=properties)
        self.__dict__[table_name] = updated_df

network_wrangler.transit.feed.feed.Feed.__init__

__init__(**kwargs)

Create a Feed object from a dictionary of DataFrames representing a GTFS feed.

Parameters:

  • kwargs

    A dictionary containing DataFrames representing the tables of a GTFS feed.


network_wrangler.transit.feed.feed.Feed.set_by_id

set_by_id(table_name, set_df, id_property='index', properties=None)

Set one or more property values based on an ID property for a given table.

Parameters:

  • table_name (str) –

    Name of the table to modify.

  • set_df (DataFrame) –

    DataFrame with columns <id_property> and value containing values to set for the specified property where <id_property> is unique.

  • id_property (str, default: 'index' ) –

    Property to use as ID to set by. Defaults to “index”.

  • properties (Optional[list[str]], default: None ) –

    List of properties to set which are in set_df. If not specified, will set all properties.
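To make the parameters concrete, here is a plain pandas sketch of the same idea. It assumes set_df carries the id_property column plus one column per property to set; update_df_by_col_value, which the real method calls, is not shown on this page, so `set_by_id_sketch` is a hypothetical approximation rather than its implementation. Like the real method, it refuses a set_df whose ids are not unique.

```python
from typing import Optional

import pandas as pd


def set_by_id_sketch(
    table_df: pd.DataFrame,
    set_df: pd.DataFrame,
    id_property: str,
    properties: Optional[list[str]] = None,
) -> pd.DataFrame:
    """Hypothetical approximation of Feed.set_by_id using plain pandas."""
    if not set_df[id_property].is_unique:
        raise ValueError(f"{id_property} must be unique in set_df.")
    if properties is None:
        # Assumption: every non-id column in set_df is a property to set
        properties = [c for c in set_df.columns if c != id_property]
    lookup = set_df.set_index(id_property)
    updated = table_df.copy()
    mask = updated[id_property].isin(lookup.index)
    for prop in properties:
        # Map each matching row's id to its new value; other rows are untouched
        updated.loc[mask, prop] = updated.loc[mask, id_property].map(lookup[prop])
    return updated


frequencies = pd.DataFrame({"trip_id": ["t1", "t2", "t3"], "headway_secs": [600, 600, 900]})
new_headways = pd.DataFrame({"trip_id": ["t2"], "headway_secs": [300]})
updated = set_by_id_sketch(frequencies, new_headways, "trip_id")
```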


network_wrangler.transit.feed.feed.merge_shapes_to_stop_times

merge_shapes_to_stop_times(stop_times, shapes, trips)

Add shape_id and shape_pt_sequence to stop_times dataframe.

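Parameters:

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    stop_times dataframe to add shape_id and shape_pt_sequence to.

  • shapes (DataFrame[WranglerShapesTable]) –

    shapes dataframe to add to stop_times.

  • trips (DataFrame[WranglerTripsTable]) –

    trips dataframe used to link stop_times to shapes.

Returns:

  • DataFrame[WranglerStopTimesTable] –

    stop_times dataframe with shape_id and shape_pt_sequence added.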

Source code in network_wrangler/transit/feed/feed.py
def merge_shapes_to_stop_times(
    stop_times: DataFrame[WranglerStopTimesTable],
    shapes: DataFrame[WranglerShapesTable],
    trips: DataFrame[WranglerTripsTable],
) -> DataFrame[WranglerStopTimesTable]:
    """Add shape_id and shape_pt_sequence to stop_times dataframe.

    Args:
        stop_times: stop_times dataframe to add shape_id and shape_pt_sequence to.
        shapes: shapes dataframe to add to stop_times.
        trips: trips dataframe to link stop_times to shapes

    Returns:
        stop_times dataframe with shape_id and shape_pt_sequence added.
    """
    stop_times_w_shape_id = stop_times.merge(
        trips[["trip_id", "shape_id"]], on="trip_id", how="left"
    )

    stop_times_w_shapes = stop_times_w_shape_id.merge(
        shapes,
        how="left",
        left_on=["shape_id", "stop_id"],
        right_on=["shape_id", "shape_model_node_id"],
    )
    stop_times_w_shapes = stop_times_w_shapes.drop(columns=["shape_model_node_id"])
    return stop_times_w_shapes
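A small worked example shows the two left merges in action. `merge_shapes_to_stop_times_sketch` below is a standalone copy of the function body above with the pandera type hints removed, so it runs without network_wrangler installed; the tiny tables are made up for illustration.

```python
import pandas as pd


def merge_shapes_to_stop_times_sketch(stop_times, shapes, trips):
    """Standalone copy of the merge logic above, without pandera type hints."""
    # Attach each trip's shape_id to its stop_times rows
    with_shape_id = stop_times.merge(trips[["trip_id", "shape_id"]], on="trip_id", how="left")
    # Match each stop to the shape point whose node is the stop's node
    merged = with_shape_id.merge(
        shapes,
        how="left",
        left_on=["shape_id", "stop_id"],
        right_on=["shape_id", "shape_model_node_id"],
    )
    return merged.drop(columns=["shape_model_node_id"])


stop_times = pd.DataFrame({"trip_id": ["t1", "t1"], "stop_sequence": [1, 2], "stop_id": [10, 30]})
trips = pd.DataFrame({"trip_id": ["t1"], "shape_id": ["s1"]})
shapes = pd.DataFrame(
    {"shape_id": ["s1"] * 3, "shape_pt_sequence": [1, 2, 3], "shape_model_node_id": [10, 20, 30]}
)
result = merge_shapes_to_stop_times_sketch(stop_times, shapes, trips)
```

Because the second merge matches on stop_id == shape_model_node_id, a shape that passes through the same node more than once yields one output row per matching shape point, so such stops can appear more than once in the result.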

network_wrangler.transit.feed.feed.stop_count_by_trip

stop_count_by_trip(stop_times)

Returns dataframe with trip_id and stop_count from stop_times.

Source code in network_wrangler/transit/feed/feed.py
def stop_count_by_trip(
    stop_times: DataFrame[WranglerStopTimesTable],
) -> pd.DataFrame:
    """Returns dataframe with trip_id and stop_count from stop_times."""
    stops_count = stop_times.groupby("trip_id").size()
    return stops_count.reset_index(name="stop_count")

Filters and queries of a gtfs frequencies table.

network_wrangler.transit.feed.frequencies.frequencies_for_trips

frequencies_for_trips(frequencies, trips)

Filter frequencies dataframe to records associated with trips table.

Source code in network_wrangler/transit/feed/frequencies.py
def frequencies_for_trips(
    frequencies: DataFrame[WranglerFrequenciesTable], trips: DataFrame[WranglerTripsTable]
) -> DataFrame[WranglerFrequenciesTable]:
    """Filter frequencies dataframe to records associated with trips table."""
    _sel_trips = trips.trip_id.unique().tolist()
    filtered_frequencies = frequencies[frequencies.trip_id.isin(_sel_trips)]
    WranglerLogger.debug(
        f"Filtered frequencies to {len(filtered_frequencies)}/{len(frequencies)} "
        f"records that referenced one of {len(trips)} trips."
    )
    return filtered_frequencies

Filters and queries of a gtfs routes table and route_ids.

network_wrangler.transit.feed.routes.route_ids_for_trip_ids

route_ids_for_trip_ids(trips, trip_ids)

Returns route ids for given list of trip_ids.

Source code in network_wrangler/transit/feed/routes.py
def route_ids_for_trip_ids(trips: DataFrame[WranglerTripsTable], trip_ids: list[str]) -> list[str]:
    """Returns route ids for given list of trip_ids."""
    return trips[trips["trip_id"].isin(trip_ids)].route_id.unique().tolist()

network_wrangler.transit.feed.routes.routes_for_trip_ids

routes_for_trip_ids(routes, trips, trip_ids)

Returns route records for given list of trip_ids.

Source code in network_wrangler/transit/feed/routes.py
def routes_for_trip_ids(
    routes: DataFrame[RoutesTable], trips: DataFrame[WranglerTripsTable], trip_ids: list[str]
) -> DataFrame[RoutesTable]:
    """Returns route records for given list of trip_ids."""
    route_ids = route_ids_for_trip_ids(trips, trip_ids)
    return routes.loc[routes.route_id.isin(route_ids)]
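The two route helpers compose: route_ids_for_trip_ids maps trip_ids to unique route_ids, and routes_for_trip_ids then selects the matching route records. The snippet below inlines both steps on made-up tables so the intermediate result is visible.

```python
import pandas as pd

trips = pd.DataFrame({"trip_id": ["t1", "t2", "t3"], "route_id": ["r1", "r1", "r2"]})
routes = pd.DataFrame({"route_id": ["r1", "r2", "r3"], "route_short_name": ["1", "2", "3"]})

trip_ids = ["t1", "t2"]
# Step 1: trip_ids -> unique route_ids (the body of route_ids_for_trip_ids)
route_ids = trips[trips["trip_id"].isin(trip_ids)].route_id.unique().tolist()
# Step 2: route_ids -> matching route records (the body of routes_for_trip_ids)
selected_routes = routes.loc[routes.route_id.isin(route_ids)]
```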

network_wrangler.transit.feed.routes.routes_for_trips

routes_for_trips(routes, trips)

Filter routes dataframe to records associated with trip records.

Source code in network_wrangler/transit/feed/routes.py
def routes_for_trips(
    routes: DataFrame[RoutesTable], trips: DataFrame[WranglerTripsTable]
) -> DataFrame[RoutesTable]:
    """Filter routes dataframe to records associated with trip records."""
    _sel_routes = trips.route_id.unique().tolist()
    filtered_routes = routes[routes.route_id.isin(_sel_routes)]
    WranglerLogger.debug(
        f"Filtered routes to {len(filtered_routes)}/{len(routes)} "
        f"records that referenced one of {len(trips)} trips."
    )
    return filtered_routes

Filters, queries of a gtfs shapes table and node patterns.

network_wrangler.transit.feed.shapes.find_nearest_stops

find_nearest_stops(shapes, trips, stop_times, trip_id, node_id, pickup_dropoff='either')

Returns node_ids (before and after) of nearest node_ids that are stops for a given trip_id.

Parameters:

  • shapes (WranglerShapesTable) –

    WranglerShapesTable

  • trips (WranglerTripsTable) –

    WranglerTripsTable

  • stop_times (WranglerStopTimesTable) –

    WranglerStopTimesTable

  • trip_id (str) –

    trip id to find nearest stops for

  • node_id (int) –

    node_id to find nearest stops for

  • pickup_dropoff (PickupDropoffAvailability, default: 'either' ) –

    String indicating the logic for selecting stops based on pickup and dropoff availability at the stop. Defaults to "either".

      - "either": either pickup_type or dropoff_type > 0
      - "both": both pickup_type and dropoff_type > 0
      - "pickup_only": only pickup > 0
      - "dropoff_only": only dropoff > 0

Returns:

  • tuple ( tuple[int, int] ) –

    node_ids of the nearest stop before and the nearest stop after node_id; 0 is returned in place of a stop that does not exist.

Source code in network_wrangler/transit/feed/shapes.py
def find_nearest_stops(
    shapes: WranglerShapesTable,
    trips: WranglerTripsTable,
    stop_times: WranglerStopTimesTable,
    trip_id: str,
    node_id: int,
    pickup_dropoff: PickupDropoffAvailability = "either",
) -> tuple[int, int]:
    """Returns node_ids (before and after) of nearest node_ids that are stops for a given trip_id.

    Args:
        shapes: WranglerShapesTable
        trips: WranglerTripsTable
        stop_times: WranglerStopTimesTable
        trip_id: trip id to find nearest stops for
        node_id: node_id to find nearest stops for
        pickup_dropoff: str indicating logic for selecting stops based on pickup and dropoff
            availability at stop. Defaults to "either".
            "either": either pickup_type or dropoff_type > 0
            "both": both pickup_type and dropoff_type > 0
            "pickup_only": only pickup > 0
            "dropoff_only": only dropoff > 0

    Returns:
        tuple: node_ids for stop before and stop after
    """
    shapes = shapes_with_stop_id_for_trip_id(
        shapes, trips, stop_times, trip_id, pickup_dropoff=pickup_dropoff
    )
    WranglerLogger.debug(f"Looking for stops near node_id: {node_id}")
    if node_id not in shapes["shape_model_node_id"].values:
        msg = f"Node ID {node_id} not in shapes for trip {trip_id}"
        raise ValueError(msg)
    # Find index of node_id in shapes
    node_idx = shapes[shapes["shape_model_node_id"] == node_id].index[0]

    # Find stops before and after new stop in shapes sequence
    nodes_before = shapes.loc[: node_idx - 1]
    stops_before = nodes_before.loc[nodes_before["stop_id"].notna()]
    stop_node_before = 0 if stops_before.empty else stops_before.iloc[-1]["shape_model_node_id"]

    nodes_after = shapes.loc[node_idx + 1 :]
    stops_after = nodes_after.loc[nodes_after["stop_id"].notna()]
    stop_node_after = 0 if stops_after.empty else stops_after.iloc[0]["shape_model_node_id"]

    return stop_node_before, stop_node_after
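Stripped of the DataFrame plumbing, find_nearest_stops scans a shape's node sequence outward from node_id and returns the first stop node found in each direction, or 0 when there is none. The sketch below (`nearest_stop_nodes` is hypothetical) works on a plain list of node ids and a set of stop nodes; it ignores the pickup_dropoff filtering that shapes_with_stop_id_for_trip_id applies in the real function.

```python
def nearest_stop_nodes(node_pattern: list[int], stop_nodes: set[int], node_id: int) -> tuple[int, int]:
    """Hypothetical sketch: nearest stop node before and after node_id (0 if none)."""
    if node_id not in node_pattern:
        raise ValueError(f"Node ID {node_id} not in node pattern")
    idx = node_pattern.index(node_id)  # first occurrence, like .index[0] above
    # Walk backward from the node for the stop before, forward for the stop after
    before = next((n for n in reversed(node_pattern[:idx]) if n in stop_nodes), 0)
    after = next((n for n in node_pattern[idx + 1 :] if n in stop_nodes), 0)
    return before, after


pattern = [1, 2, 3, 4, 5]
stops = {1, 4, 5}
```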

network_wrangler.transit.feed.shapes.node_pattern_for_shape_id

node_pattern_for_shape_id(shapes, shape_id)

Returns node pattern of a shape.

Source code in network_wrangler/transit/feed/shapes.py
def node_pattern_for_shape_id(shapes: DataFrame[WranglerShapesTable], shape_id: str) -> list[int]:
    """Returns node pattern of a shape."""
    shape_df = shapes.loc[shapes["shape_id"] == shape_id]
    shape_df = shape_df.sort_values(by=["shape_pt_sequence"])
    return shape_df["shape_model_node_id"].to_list()

network_wrangler.transit.feed.shapes.shape_id_for_trip_id

shape_id_for_trip_id(trips, trip_id)

Returns a shape_id for a given trip_id.

Source code in network_wrangler/transit/feed/shapes.py
def shape_id_for_trip_id(trips: WranglerTripsTable, trip_id: str) -> str:
    """Returns a shape_id for a given trip_id."""
    return trips.loc[trips.trip_id == trip_id, "shape_id"].values[0]
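Chaining shape_id_for_trip_id and node_pattern_for_shape_id yields the roadway node sequence a trip follows. Inlined on a made-up table, the composition looks like this. Shape rows are sorted by shape_pt_sequence before the node ids are read off, so the result does not depend on row order; note also that `.values[0]` raises IndexError for a trip_id that is not in the trips table.

```python
import pandas as pd

trips = pd.DataFrame({"trip_id": ["t1"], "shape_id": ["s1"]})
# Shape rows deliberately stored out of order
shapes = pd.DataFrame(
    {"shape_id": ["s1"] * 3, "shape_pt_sequence": [3, 1, 2], "shape_model_node_id": [30, 10, 20]}
)

# shape_id_for_trip_id: look up the trip's shape
shape_id = trips.loc[trips.trip_id == "t1", "shape_id"].values[0]
# node_pattern_for_shape_id: order the shape's nodes by sequence
shape_df = shapes.loc[shapes["shape_id"] == shape_id].sort_values(by=["shape_pt_sequence"])
node_pattern = shape_df["shape_model_node_id"].to_list()
```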

network_wrangler.transit.feed.shapes.shape_ids_for_trip_ids

shape_ids_for_trip_ids(trips, trip_ids)

Returns a list of shape_ids for a given list of trip_ids.

Source code in network_wrangler/transit/feed/shapes.py
def shape_ids_for_trip_ids(trips: DataFrame[WranglerTripsTable], trip_ids: list[str]) -> list[str]:
    """Returns a list of shape_ids for a given list of trip_ids."""
    return trips[trips["trip_id"].isin(trip_ids)].shape_id.unique().tolist()

network_wrangler.transit.feed.shapes.shapes_for_road_links

shapes_for_road_links(shapes, links_df)

Filter shapes dataframe to records associated with links dataframe.

EX:

shapes = pd.DataFrame({
    "shape_id": ["1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "2"],
    "shape_pt_sequence": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 8],
    "shape_model_node_id": [1, 2, 3, 4, 5, 1, 2, 3, 1, 5, 4, 1, 2],
})

links_df = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})

shapes

shape_id  shape_pt_sequence  shape_model_node_id  should retain
1         1                  1                    TRUE
1         2                  2                    TRUE
1         3                  3                    TRUE
1         4                  4                    TRUE
1         5                  5                    FALSE
2         1                  1                    TRUE
2         2                  2                    TRUE
2         3                  3                    TRUE
2         4                  1                    FALSE
2         5                  5                    FALSE
2         6                  4                    FALSE
2         7                  1                    FALSE - not largest segment
2         8                  2                    FALSE - not largest segment

links_df

A  B
1  2
2  3
3  4

Source code in network_wrangler/transit/feed/shapes.py
def shapes_for_road_links(
    shapes: DataFrame[WranglerShapesTable], links_df: pd.DataFrame
) -> DataFrame[WranglerShapesTable]:
    """Filter shapes dataframe to records associated with links dataframe.

    EX:

    > shapes = pd.DataFrame({
        "shape_id": ["1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "2"],
        "shape_pt_sequence": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 8],
        "shape_model_node_id": [1, 2, 3, 4, 5, 1, 2, 3, 1, 5, 4, 1, 2]
    })

    > links_df = pd.DataFrame({
        "A": [1, 2, 3],
        "B": [2, 3, 4]
    })

    > shapes

    shape_id   shape_pt_sequence   shape_model_node_id *should retain*
    1          1                  1                        TRUE
    1          2                  2                        TRUE
    1          3                  3                        TRUE
    1          4                  4                        TRUE
    1          5                  5                       FALSE
    2          1                  1                        TRUE
    2          2                  2                        TRUE
    2          3                  3                        TRUE
    2          4                  1                       FALSE
    2          5                  5                       FALSE
    2          6                  4                       FALSE
    2          7                  1                       FALSE - not largest segment
    2          8                  2                       FALSE - not largest segment

    > links_df

    A   B
    1   2
    2   3
    3   4
    """
    """
    > shape_links

    shape_id  shape_pt_sequence_A  shape_model_node_id_A shape_pt_sequence_B shape_model_node_id_B
    1          1                        1                       2                        2
    1          2                        2                       3                        3
    1          3                        3                       4                        4
    1          4                        4                       5                        5
    2          1                        1                       2                        2
    2          2                        2                       3                        3
    2          3                        3                       4                        1
    2          4                        1                       5                        5
    2          5                        5                       6                        4
    2          6                        4                       7                        1
    2          7                        1                       8                        2
    """
    shape_links = shapes_to_shape_links(shapes)

    """
    > shape_links_w_links

    shape_id  shape_pt_sequence_A shape_pt_sequence_B  A  B
    1          1                         2             1  2
    1          2                         3             2  3
    1          3                         4             3  4
    2          1                         2             1  2
    2          2                         3             2  3
    2          7                         8             1  2
    """

    shape_links_w_links = shape_links.merge(
        links_df[["A", "B"]],
        how="inner",
        on=["A", "B"],
    )

    """
    Find largest segment of each shape_id that is in the links

    > longest_shape_segments
    shape_id, segment_id, segment_start_shape_pt_seq, segment_end_shape_pt_seq
    1          1                        1                       4
    2          1                        1                       3
    """
    longest_shape_segments = shape_links_to_longest_shape_segments(shape_links_w_links)

    """
    > shapes

    shape_id   shape_pt_sequence   shape_model_node_id
    1          1                  1
    1          2                  2
    1          3                  3
    1          4                  4
    2          1                  1
    2          2                  2
    2          3                  3
    """
    filtered_shapes = filter_shapes_to_segments(shapes, longest_shape_segments)
    filtered_shapes = filtered_shapes.reset_index(drop=True)
    return filtered_shapes

network_wrangler.transit.feed.shapes.shapes_for_shape_id

shapes_for_shape_id(shapes, shape_id)

Returns shape records for a given shape_id.

Source code in network_wrangler/transit/feed/shapes.py
def shapes_for_shape_id(
    shapes: DataFrame[WranglerShapesTable], shape_id: str
) -> DataFrame[WranglerShapesTable]:
    """Returns shape records for a given shape_id."""
    shapes = shapes.loc[shapes.shape_id == shape_id]
    return shapes.sort_values(by=["shape_pt_sequence"])

network_wrangler.transit.feed.shapes.shapes_for_trip_id

shapes_for_trip_id(shapes, trips, trip_id)

Returns shape records for a single given trip_id.

Source code in network_wrangler/transit/feed/shapes.py
def shapes_for_trip_id(
    shapes: DataFrame[WranglerShapesTable], trips: DataFrame[WranglerTripsTable], trip_id: str
) -> DataFrame[WranglerShapesTable]:
    """Returns shape records for a single given trip_id."""
    shape_id = shape_id_for_trip_id(trips, trip_id)
    return shapes.loc[shapes.shape_id == shape_id]

network_wrangler.transit.feed.shapes.shapes_for_trip_ids

shapes_for_trip_ids(shapes, trips, trip_ids)

Returns shape records for a list of trip_ids.

Source code in network_wrangler/transit/feed/shapes.py
def shapes_for_trip_ids(
    shapes: DataFrame[WranglerShapesTable],
    trips: DataFrame[WranglerTripsTable],
    trip_ids: list[str],
) -> DataFrame[WranglerShapesTable]:
    """Returns shape records for a list of trip_ids."""
    shape_ids = shape_ids_for_trip_ids(trips, trip_ids)
    return shapes.loc[shapes.shape_id.isin(shape_ids)]
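
The trip-to-shape lookup above is a two-step filter: first collect the shape_ids used by the requested trips, then keep the shape points with those shape_ids. The sketch below assumes pandas and that `shape_ids_for_trip_ids` (not shown in this section) returns the unique shape_ids of the given trips; `shapes_for_trip_ids_sketch` and the sample tables are hypothetical.

```python
import pandas as pd

# Hypothetical sample tables; column names follow the GTFS trips/shapes schema.
trips = pd.DataFrame({"trip_id": ["t1", "t2", "t3"], "shape_id": ["s1", "s1", "s2"]})
shapes = pd.DataFrame(
    {
        "shape_id": ["s1", "s1", "s2", "s2", "s3"],
        "shape_pt_sequence": [1, 2, 1, 2, 1],
        "shape_model_node_id": [10, 11, 20, 21, 30],
    }
)


def shapes_for_trip_ids_sketch(
    shapes: pd.DataFrame, trips: pd.DataFrame, trip_ids: list
) -> pd.DataFrame:
    """Look up the shape_ids used by trip_ids, then keep the matching shape points."""
    shape_ids = trips.loc[trips.trip_id.isin(trip_ids), "shape_id"].unique()
    return shapes.loc[shapes.shape_id.isin(shape_ids)]


result = shapes_for_trip_ids_sketch(shapes, trips, ["t1", "t2"])
print(result.shape_model_node_id.tolist())  # [10, 11]
```

Because t1 and t2 share shape s1, its points are returned once rather than once per trip.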

network_wrangler.transit.feed.shapes.shapes_for_trips

shapes_for_trips(shapes, trips)

Filter shapes dataframe to records associated with trips table.

Source code in network_wrangler/transit/feed/shapes.py
def shapes_for_trips(
    shapes: DataFrame[WranglerShapesTable], trips: DataFrame[WranglerTripsTable]
) -> DataFrame[WranglerShapesTable]:
    """Filter shapes dataframe to records associated with trips table."""
    _sel_shapes = trips.shape_id.unique().tolist()
    filtered_shapes = shapes[shapes.shape_id.isin(_sel_shapes)]
    WranglerLogger.debug(
        f"Filtered shapes to {len(filtered_shapes)}/{len(shapes)} "
        f"records that referenced one of {len(trips)} trips."
    )
    return filtered_shapes

network_wrangler.transit.feed.shapes.shapes_with_stop_id_for_trip_id

shapes_with_stop_id_for_trip_id(shapes, trips, stop_times, trip_id, pickup_dropoff='either')

Returns shapes.txt records for a given trip_id, with the stop_id added based on pickup and dropoff availability.

Parameters:

  • shapes (DataFrame[WranglerShapesTable]) –

    WranglerShapesTable

  • trips (DataFrame[WranglerTripsTable]) –

    WranglerTripsTable

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    WranglerStopTimesTable

  • trip_id (str) –

    trip id to select

  • pickup_dropoff (PickupDropoffAvailability, default: 'either' ) –

    str indicating logic for selecting stops based on pickup and dropoff availability at stop; passed through to stop_times_for_pickup_dropoff_trip_id. Defaults to “either”. “any”: all stop_time records; “either”: pickup_type or drop_off_type != 1; “both”: both pickup_type and drop_off_type != 1; “pickup_only”: pickup_type != 1 and drop_off_type = 1; “dropoff_only”: drop_off_type != 1 and pickup_type = 1.

Source code in network_wrangler/transit/feed/shapes.py
def shapes_with_stop_id_for_trip_id(
    shapes: DataFrame[WranglerShapesTable],
    trips: DataFrame[WranglerTripsTable],
    stop_times: DataFrame[WranglerStopTimesTable],
    trip_id: str,
    pickup_dropoff: PickupDropoffAvailability = "either",
) -> DataFrame[WranglerShapesTable]:
    """Returns shapes.txt records for a given trip_id with stop_id added based on pickup_dropoff.

    Args:
        shapes: WranglerShapesTable
        trips: WranglerTripsTable
        stop_times: WranglerStopTimesTable
        trip_id: trip id to select
        pickup_dropoff: str indicating logic for selecting stops based on pickup and dropoff
            availability at stop. Defaults to "either".
            "any": all stop_time records
            "either": either pickup_type or drop_off_type != 1
            "both": both pickup_type and drop_off_type != 1
            "pickup_only": pickup_type != 1; drop_off_type = 1
            "dropoff_only": drop_off_type != 1; pickup_type = 1
    """
    from .stop_times import stop_times_for_pickup_dropoff_trip_id  # noqa: PLC0415

    shapes = shapes_for_trip_id(shapes, trips, trip_id)
    trip_stop_times = stop_times_for_pickup_dropoff_trip_id(
        stop_times, trip_id, pickup_dropoff=pickup_dropoff
    )

    stop_times_cols = [
        "stop_id",
        "trip_id",
        "pickup_type",
        "drop_off_type",
    ]

    shape_with_trip_stops = shapes.merge(
        trip_stop_times[stop_times_cols],
        how="left",
        right_on="stop_id",
        left_on="shape_model_node_id",
    )
    shape_with_trip_stops = shape_with_trip_stops.sort_values(by=["shape_pt_sequence"])
    return shape_with_trip_stops
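
The key step above is a left merge from shape points to the trip's stop_times on `shape_model_node_id == stop_id`: every shape point is kept, and the stop columns are filled only where the node is also a stop. A minimal pandas sketch with hypothetical sample data; the `notna()` test on `stop_id` is the same one `shapes_with_stops_for_shape_id` applies to keep only stop points.

```python
import pandas as pd

# Hypothetical shape points for one trip's shape, and that trip's stop_times.
shapes = pd.DataFrame(
    {
        "shape_id": ["s1"] * 4,
        "shape_pt_sequence": [1, 2, 3, 4],
        "shape_model_node_id": [10, 11, 12, 13],
    }
)
trip_stop_times = pd.DataFrame(
    {
        "trip_id": ["t1", "t1"],
        "stop_id": [10, 12],
        "pickup_type": [0, 0],
        "drop_off_type": [0, 0],
    }
)

# Left merge keeps every shape point; stop columns are NaN where the node is not a stop.
shape_with_trip_stops = shapes.merge(
    trip_stop_times[["stop_id", "trip_id", "pickup_type", "drop_off_type"]],
    how="left",
    left_on="shape_model_node_id",
    right_on="stop_id",
).sort_values(by=["shape_pt_sequence"])

is_stop = shape_with_trip_stops["stop_id"].notna().tolist()
print(is_stop)  # [True, False, True, False]
```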

network_wrangler.transit.feed.shapes.shapes_with_stops_for_shape_id

shapes_with_stops_for_shape_id(shapes, trips, stop_times, shape_id)

Returns a DataFrame containing shapes with associated stops for a given shape_id.

Parameters:

  • shapes (DataFrame[WranglerShapesTable]) –

    DataFrame containing shape data.

  • trips (DataFrame[WranglerTripsTable]) –

    DataFrame containing trip data.

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    DataFrame containing stop times data.

  • shape_id (str) –

    The shape_id for which to retrieve shapes with stops.

Returns:

  • DataFrame[WranglerShapesTable]

    DataFrame[WranglerShapesTable]: DataFrame containing shapes with associated stops.

Source code in network_wrangler/transit/feed/shapes.py
def shapes_with_stops_for_shape_id(
    shapes: DataFrame[WranglerShapesTable],
    trips: DataFrame[WranglerTripsTable],
    stop_times: DataFrame[WranglerStopTimesTable],
    shape_id: str,
) -> DataFrame[WranglerShapesTable]:
    """Returns a DataFrame containing shapes with associated stops for a given shape_id.

    Parameters:
        shapes (DataFrame[WranglerShapesTable]): DataFrame containing shape data.
        trips (DataFrame[WranglerTripsTable]): DataFrame containing trip data.
        stop_times (DataFrame[WranglerStopTimesTable]): DataFrame containing stop times data.
        shape_id (str): The shape_id for which to retrieve shapes with stops.

    Returns:
        DataFrame[WranglerShapesTable]: DataFrame containing shapes with associated stops.
    """
    from .trips import trip_ids_for_shape_id  # noqa: PLC0415

    trip_ids = trip_ids_for_shape_id(trips, shape_id)
    all_shape_stop_times = concat_with_attr(
        [shapes_with_stop_id_for_trip_id(shapes, trips, stop_times, t) for t in trip_ids]
    )
    shapes_with_stops = all_shape_stop_times[all_shape_stop_times["stop_id"].notna()]
    shapes_with_stops = shapes_with_stops.sort_values(by=["shape_pt_sequence"])
    return shapes_with_stops

Filters and queries of a gtfs stop_times table.

network_wrangler.transit.feed.stop_times.stop_times_for_longest_segments

stop_times_for_longest_segments(stop_times)

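Find the longest segment of each trip_id that is in the stop_times.

A new segment starts wherever stop_sequence does not increase by exactly 1 from the trip's previous record. If two segments tie for longest, the first one is kept.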

Source code in network_wrangler/transit/feed/stop_times.py
def stop_times_for_longest_segments(
    stop_times: DataFrame[WranglerStopTimesTable],
) -> pd.DataFrame:
    """Find the longest segment of each trip_id that is in the stop_times.

    Segment ends defined based on interruptions in `stop_sequence`.
    """
    stop_times = stop_times.sort_values(by=["trip_id", "stop_sequence"])

    stop_times["prev_stop_sequence"] = stop_times.groupby("trip_id")["stop_sequence"].shift(1)
    stop_times["gap"] = (stop_times["stop_sequence"] - stop_times["prev_stop_sequence"]).ne(
        1
    ) | stop_times["prev_stop_sequence"].isna()

    stop_times["segment_id"] = stop_times["gap"].cumsum()
    # WranglerLogger.debug(f"stop_times with segment_id:\n{stop_times}")

    # Calculate the length of each segment
    segment_lengths = (
        stop_times.groupby(["trip_id", "segment_id"]).size().reset_index(name="segment_length")
    )

    # Identify the longest segment for each trip
    idx = segment_lengths.groupby("trip_id")["segment_length"].idxmax()
    longest_segments = segment_lengths.loc[idx]

    # Merge longest segment info back to stop_times
    stop_times = stop_times.merge(
        longest_segments[["trip_id", "segment_id"]],
        on=["trip_id", "segment_id"],
        how="inner",
    )

    # Drop temporary columns used for calculations
    stop_times.drop(columns=["prev_stop_sequence", "gap", "segment_id"], inplace=True)
    # WranglerLogger.debug(f"stop_timesw/longest segments:\n{stop_times}")
    return stop_times
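
The segmentation above relies on a shift/cumsum pattern: mark a break where the previous `stop_sequence` is missing or not exactly one less, turn the breaks into segment ids with a cumulative sum, then keep the largest segment per trip via `idxmax`. The sketch below reproduces that pattern in pandas on hypothetical data.

```python
import pandas as pd

# Hypothetical stop_times: trip t1 skips stop_sequence 3, splitting it into [1, 2] and [4, 5, 6].
stop_times = pd.DataFrame(
    {
        "trip_id": ["t1"] * 5 + ["t2"] * 3,
        "stop_sequence": [1, 2, 4, 5, 6, 1, 2, 3],
        "stop_id": [101, 102, 104, 105, 106, 201, 202, 203],
    }
)

st = stop_times.sort_values(by=["trip_id", "stop_sequence"])
prev = st.groupby("trip_id")["stop_sequence"].shift(1)
# A new segment starts at each trip's first row or wherever the sequence jumps by != 1.
gap = (st["stop_sequence"] - prev).ne(1) | prev.isna()
st = st.assign(segment_id=gap.cumsum())

# Length of each (trip, segment); idxmax keeps the first segment on ties.
lengths = st.groupby(["trip_id", "segment_id"]).size().reset_index(name="segment_length")
longest = lengths.loc[lengths.groupby("trip_id")["segment_length"].idxmax()]

result = st.merge(longest[["trip_id", "segment_id"]], on=["trip_id", "segment_id"], how="inner")
result = result.drop(columns=["segment_id"])
print(result.stop_id.tolist())  # [104, 105, 106, 201, 202, 203]
```

Since `cumsum` runs over the whole frame, segment ids are unique across trips, which is why grouping by both `trip_id` and `segment_id` is safe.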

network_wrangler.transit.feed.stop_times.stop_times_for_min_stops

stop_times_for_min_stops(stop_times, min_stops)

Filter stop_times dataframe to only the records which have >= min_stops for the trip.

Parameters:

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    stop_times table to filter

  • min_stops (int) –

    minimum number of stops a trip must have to be kept. Raises ValueError if no trip meets this threshold.

Source code in network_wrangler/transit/feed/stop_times.py
def stop_times_for_min_stops(
    stop_times: DataFrame[WranglerStopTimesTable], min_stops: int
) -> DataFrame[WranglerStopTimesTable]:
    """Filter stop_times dataframe to only the records which have >= min_stops for the trip.

    Args:
        stop_times: stop_times table to filter
        min_stops: minimum number of stops a trip must have to be kept
    """
    stop_ct_by_trip_df = stop_count_by_trip(stop_times)

    # Filter to obtain DataFrame of trips with stop counts >= min_stops
    min_stop_ct_trip_df = stop_ct_by_trip_df[stop_ct_by_trip_df.stop_count >= min_stops]
    if len(min_stop_ct_trip_df) == 0:
        msg = f"No trips meet threshold of minimum stops: {min_stops}"
        raise ValueError(msg)
    WranglerLogger.debug(
        f"Found {len(min_stop_ct_trip_df)} trips with a minimum of {min_stops} stops."
    )

    # Filter the original stop_times DataFrame to only include trips with >= min_stops
    filtered_stop_times = stop_times.merge(
        min_stop_ct_trip_df["trip_id"], on="trip_id", how="inner"
    )
    WranglerLogger.debug(
        f"Filter stop times to {len(filtered_stop_times)}/{len(stop_times)} "
        f"w/a minimum of {min_stops} stops."
    )

    return filtered_stop_times
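
A minimal sketch of the same threshold filter in pandas. It assumes `stop_count_by_trip` (not shown in this section) counts stop_time records per trip; `stop_times_for_min_stops_sketch` and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical stop_times with trips of 3, 2 and 1 stops.
stop_times = pd.DataFrame(
    {
        "trip_id": ["t1", "t1", "t1", "t2", "t2", "t3"],
        "stop_sequence": [1, 2, 3, 1, 2, 1],
        "stop_id": [1, 2, 3, 1, 2, 9],
    }
)


def stop_times_for_min_stops_sketch(stop_times: pd.DataFrame, min_stops: int) -> pd.DataFrame:
    # Count stop_time records per trip (assumed equivalent to stop_count_by_trip).
    counts = stop_times.groupby("trip_id").size().reset_index(name="stop_count")
    keep = counts.loc[counts.stop_count >= min_stops, "trip_id"]
    if keep.empty:
        raise ValueError(f"No trips meet threshold of minimum stops: {min_stops}")
    # Inner merge on the named Series keeps only stop_times of qualifying trips.
    return stop_times.merge(keep, on="trip_id", how="inner")


print(sorted(stop_times_for_min_stops_sketch(stop_times, 2).trip_id.unique()))  # ['t1', 't2']
```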

network_wrangler.transit.feed.stop_times.stop_times_for_pickup_dropoff_trip_id

stop_times_for_pickup_dropoff_trip_id(stop_times, trip_id, pickup_dropoff='either')

Filters stop_times for a given trip_id based on pickup type.

GTFS values for pickup_type and drop_off_type:

  • 0 or empty - Regularly scheduled pickup/dropoff.
  • 1 - No pickup/dropoff available.
  • 2 - Must phone agency to arrange pickup/dropoff.
  • 3 - Must coordinate with driver to arrange pickup/dropoff.

Parameters:

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    A WranglerStopTimesTable to query.

  • trip_id (str) –

    trip_id to get stop pattern for

  • pickup_dropoff (PickupDropoffAvailability, default: 'either' ) –

    str indicating logic for selecting stops based on pickup and dropoff availability at stop. Defaults to “either”. “any”: all stop_time records; “either”: either pickup_type or drop_off_type != 1; “both”: both pickup_type and drop_off_type != 1; “pickup_only”: drop_off_type = 1 and pickup_type != 1; “dropoff_only”: pickup_type = 1 and drop_off_type != 1.

Source code in network_wrangler/transit/feed/stop_times.py
@validate_call_pyd
def stop_times_for_pickup_dropoff_trip_id(
    stop_times: DataFrame[WranglerStopTimesTable],
    trip_id: str,
    pickup_dropoff: PickupDropoffAvailability = "either",
) -> DataFrame[WranglerStopTimesTable]:
    """Filters stop_times for a given trip_id based on pickup type.

    GTFS values for pickup_type and drop_off_type:
        0 or empty - Regularly scheduled pickup/dropoff.
        1 - No pickup/dropoff available.
        2 - Must phone agency to arrange pickup/dropoff.
        3 - Must coordinate with driver to arrange pickup/dropoff.

    Args:
        stop_times: A WranglerStopTimesTable to query.
        trip_id: trip_id to get stop pattern for
        pickup_dropoff: str indicating logic for selecting stops based on pickup and dropoff
            availability at stop. Defaults to "either".
            "any": all stop_time records
            "either": either pickup_type or dropoff_type != 1
            "both": both pickup_type and dropoff_type != 1
            "pickup_only": dropoff = 1; pickup != 1
            "dropoff_only":  pickup = 1; dropoff != 1
    """
    trip_stop_pattern = stop_times_for_trip_id(stop_times, trip_id)

    if pickup_dropoff == "any":
        return trip_stop_pattern

    pickup_type_selection = {
        "either": (trip_stop_pattern.pickup_type != 1) | (trip_stop_pattern.drop_off_type != 1),
        "both": (trip_stop_pattern.pickup_type != 1) & (trip_stop_pattern.drop_off_type != 1),
        "pickup_only": (trip_stop_pattern.pickup_type != 1)
        & (trip_stop_pattern.drop_off_type == 1),
        "dropoff_only": (trip_stop_pattern.drop_off_type != 1)
        & (trip_stop_pattern.pickup_type == 1),
    }

    selection = pickup_type_selection[pickup_dropoff]
    trip_stops = trip_stop_pattern[selection]

    return trip_stops
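
The selection table above reduces to two boolean masks, "pickup allowed" (`pickup_type != 1`) and "drop-off allowed" (`drop_off_type != 1`), combined with `|`, `&` and `~`. This sketch applies the same four combinations to a hypothetical four-stop pattern covering every allowed/not-allowed case.

```python
import pandas as pd

# Hypothetical stop pattern for one trip. GTFS: 1 means "no pickup" / "no drop off".
trip_stop_pattern = pd.DataFrame(
    {
        "stop_id": [1, 2, 3, 4],
        "pickup_type": [0, 1, 0, 1],
        "drop_off_type": [0, 0, 1, 1],
    }
)

pickup_ok = trip_stop_pattern.pickup_type != 1
dropoff_ok = trip_stop_pattern.drop_off_type != 1
selections = {
    "either": pickup_ok | dropoff_ok,
    "both": pickup_ok & dropoff_ok,
    "pickup_only": pickup_ok & ~dropoff_ok,
    "dropoff_only": dropoff_ok & ~pickup_ok,
}
stops_by_option = {
    name: trip_stop_pattern.loc[mask, "stop_id"].tolist() for name, mask in selections.items()
}
print(stops_by_option)
# {'either': [1, 2, 3], 'both': [1], 'pickup_only': [3], 'dropoff_only': [2]}
```

Stop 4, which allows neither pickup nor drop-off, is only returned by “any”, which skips the masks entirely.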

network_wrangler.transit.feed.stop_times.stop_times_for_route_ids

stop_times_for_route_ids(stop_times, trips, route_ids)

Returns stop_time records for a list of route_ids.

Source code in network_wrangler/transit/feed/stop_times.py
def stop_times_for_route_ids(
    stop_times: DataFrame[WranglerStopTimesTable],
    trips: DataFrame[WranglerTripsTable],
    route_ids: list[str],
) -> DataFrame[WranglerStopTimesTable]:
    """Returns stop_time records for a list of route_ids."""
    trip_ids = trips.loc[trips.route_id.isin(route_ids)].trip_id.unique()
    return stop_times_for_trip_ids(stop_times, trip_ids)

network_wrangler.transit.feed.stop_times.stop_times_for_shapes

stop_times_for_shapes(stop_times, shapes, trips)

Filter stop_times dataframe to records associated with shapes dataframe.

Where multiple segments of stop_times are found to match shapes, retain only the longest.

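Parameters:

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    stop_times dataframe to filter

  • shapes (DataFrame[WranglerShapesTable]) –

    shapes dataframe to filter stop_times to

  • trips (DataFrame[WranglerTripsTable]) –

    trips to link stop_times to shapes

Returns:

  • DataFrame[WranglerStopTimesTable]

    filtered stop_times dataframe

Example (rows marked * should be retained):

stop_times

trip_id   stop_sequence   stop_id
*t1       1               1
*t1       2               2
*t1       3               3
t1        4               5
*t2       1               1
*t2       2               3
t2        3               7

shapes

shape_id   shape_pt_sequence   shape_model_node_id
s1         1                   1
s1         2                   2
s1         3                   3
s1         4                   4
s2         1                   1
s2         2                   2
s2         3                   3

trips

trip_id   shape_id
t1        s1
t2        s2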

Source code in network_wrangler/transit/feed/stop_times.py
def stop_times_for_shapes(
    stop_times: DataFrame[WranglerStopTimesTable],
    shapes: DataFrame[WranglerShapesTable],
    trips: DataFrame[WranglerTripsTable],
) -> DataFrame[WranglerStopTimesTable]:
    """Filter stop_times dataframe to records associated with shapes dataframe.

    Where multiple segments of stop_times are found to match shapes, retain only the longest.

    Args:
        stop_times: stop_times dataframe to filter
        shapes: shapes dataframe to filter stop_times to.
        trips: trips to link stop_times to shapes

    Returns:
        filtered stop_times dataframe

    EX:
    * should be retained
    > stop_times

    trip_id   stop_sequence   stop_id
    *t1          1                  1
    *t1          2                  2
    *t1          3                  3
    t1           4                  5
    *t2          1                  1
    *t2          2                  3
    t2           3                  7

    > shapes

    shape_id   shape_pt_sequence   shape_model_node_id
    s1          1                  1
    s1          2                  2
    s1          3                  3
    s1          4                  4
    s2          1                  1
    s2          2                  2
    s2          3                  3

    > trips

    trip_id   shape_id
    t1          s1
    t2          s2
    """
    """
    > stop_times_w_shapes

    trip_id   stop_sequence   stop_id    shape_id   shape_pt_sequence
    *t1          1                  1        s1          1
    *t1          2                  2        s1          2
    *t1          3                  3        s1          3
    t1           4                  5        NA          NA
    *t2          1                  1        s2          1
    *t2          2                  3        s2          2
    t2           3                  7        NA          NA

    """
    stop_times_w_shapes = merge_shapes_to_stop_times(stop_times, shapes, trips)
    # WranglerLogger.debug(f"stop_times_w_shapes :\n{stop_times_w_shapes}")
    """
    > stop_times_w_shapes

    trip_id   stop_sequence   stop_id   shape_id   shape_pt_sequence
    *t1          1               1        s1          1
    *t1          2               2        s1          2
    *t1          3               3        s1          3
    *t2          1               1        s2          1
    *t2          2               3        s2          2

    """
    filtered_stop_times = stop_times_w_shapes[stop_times_w_shapes["shape_pt_sequence"].notna()]
    # WranglerLogger.debug(f"filtered_stop_times:\n{filtered_stop_times}")

    # Filter out any stop_times the shape_pt_sequence is not ascending
    valid_stop_times = filtered_stop_times.groupby("trip_id").filter(
        lambda x: x["shape_pt_sequence"].is_monotonic_increasing
    )
    # WranglerLogger.debug(f"valid_stop_times:\n{valid_stop_times}")

    valid_stop_times = valid_stop_times.drop(columns=["shape_id", "shape_pt_sequence"])

    longest_valid_stop_times = stop_times_for_longest_segments(valid_stop_times)
    longest_valid_stop_times = longest_valid_stop_times.reset_index(drop=True)

    return longest_valid_stop_times
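
The validity check above works per trip: after dropping stops that could not be located on the shape, a trip is kept only if its stops visit the shape in non-decreasing `shape_pt_sequence` order. A minimal pandas sketch with hypothetical, already-merged data; note that `groupby(...).filter` drops the whole trip, not just the out-of-order stop.

```python
import pandas as pd

# Hypothetical stop_times already merged with shape point positions (NaN = stop not on shape).
stop_times_w_shapes = pd.DataFrame(
    {
        "trip_id": ["t1", "t1", "t1", "t2", "t2", "t2"],
        "stop_sequence": [1, 2, 3, 1, 2, 3],
        "stop_id": [1, 2, 5, 3, 1, 2],
        "shape_pt_sequence": [1, 2, None, 3, 1, 2],
    }
)

# Drop stops that could not be located on the trip's shape.
on_shape = stop_times_w_shapes[stop_times_w_shapes["shape_pt_sequence"].notna()]

# Keep only trips whose stops are visited in the same order as the shape points.
valid = on_shape.groupby("trip_id").filter(
    lambda g: g["shape_pt_sequence"].is_monotonic_increasing
)
print(valid.trip_id.unique().tolist())  # ['t1']
```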

network_wrangler.transit.feed.stop_times.stop_times_for_stops

stop_times_for_stops(stop_times, stops)

Filter stop_times dataframe to only have stop_times associated with stops records.

Source code in network_wrangler/transit/feed/stop_times.py
def stop_times_for_stops(
    stop_times: DataFrame[WranglerStopTimesTable], stops: DataFrame[WranglerStopsTable]
) -> DataFrame[WranglerStopTimesTable]:
    """Filter stop_times dataframe to only have stop_times associated with stops records."""
    _sel_stops = stops.stop_id.unique().tolist()
    filtered_stop_times = stop_times[stop_times.stop_id.isin(_sel_stops)]
    WranglerLogger.debug(
        f"Filtered stop_times to {len(filtered_stop_times)}/{len(stop_times)} "
        f"records that referenced one of {len(stops)} stops."
    )
    return filtered_stop_times

network_wrangler.transit.feed.stop_times.stop_times_for_trip_id

stop_times_for_trip_id(stop_times, trip_id)

Returns stop_time records for a given trip_id.

Source code in network_wrangler/transit/feed/stop_times.py
def stop_times_for_trip_id(
    stop_times: DataFrame[WranglerStopTimesTable], trip_id: str
) -> DataFrame[WranglerStopTimesTable]:
    """Returns stop_time records for a given trip_id."""
    stop_times = stop_times.loc[stop_times.trip_id == trip_id]
    return stop_times.sort_values(by=["stop_sequence"])

network_wrangler.transit.feed.stop_times.stop_times_for_trip_ids

stop_times_for_trip_ids(stop_times, trip_ids)

Returns stop_time records for a given list of trip_ids.

Source code in network_wrangler/transit/feed/stop_times.py
def stop_times_for_trip_ids(
    stop_times: DataFrame[WranglerStopTimesTable], trip_ids: list[str]
) -> DataFrame[WranglerStopTimesTable]:
    """Returns stop_time records for a given list of trip_ids."""
    stop_times = stop_times.loc[stop_times.trip_id.isin(trip_ids)]
    # Sort by trip first so records of different trips are not interleaved.
    return stop_times.sort_values(by=["trip_id", "stop_sequence"])

network_wrangler.transit.feed.stop_times.stop_times_for_trip_node_segment

stop_times_for_trip_node_segment(stop_times, trip_id, node_id_start, node_id_end, include_start=True, include_end=True)

Returns stop_times for a given trip_id between two nodes or with those nodes included.

Parameters:

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    WranglerStopTimesTable

  • trip_id (str) –

    trip id to select

  • node_id_start (int) –

    ID of the starting node

  • node_id_end (int) –

    ID of the ending node

  • include_start (bool, default: True ) –

    bool indicating if the start node should be included in the segment. Defaults to True.

  • include_end (bool, default: True ) –

    bool indicating if the end node should be included in the segment. Defaults to True.

Source code in network_wrangler/transit/feed/stop_times.py
def stop_times_for_trip_node_segment(
    stop_times: DataFrame[WranglerStopTimesTable],
    trip_id: str,
    node_id_start: int,
    node_id_end: int,
    include_start: bool = True,
    include_end: bool = True,
) -> DataFrame[WranglerStopTimesTable]:
    """Returns stop_times for a given trip_id between two nodes or with those nodes included.

    Args:
        stop_times: WranglerStopTimesTable
        trip_id: trip id to select
        node_id_start: ID of the starting node
        node_id_end: ID of the ending node
        include_start: bool indicating if the start node should be included in the segment.
            Defaults to True.
        include_end: bool indicating if the end node should be included in the segment.
            Defaults to True.
    """
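    stop_times = stop_times_for_trip_id(stop_times, trip_id)
    # Use positions, not index labels: labels are inherited from the full table and may not
    # be consecutive, and .loc label slices always include their end label.
    stop_ids = stop_times["stop_id"].to_list()
    start_pos = stop_ids.index(node_id_start)
    end_pos = stop_ids.index(node_id_end)
    if not include_start:
        start_pos += 1
    if include_end:
        end_pos += 1  # .iloc slices exclude their stop position
    return stop_times.iloc[start_pos:end_pos]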
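
The intended segment semantics (start and end node both included by default, either one optionally excluded) can be illustrated with positional slicing. This is a sketch using a hypothetical `node_segment` helper and deliberately non-consecutive index labels, as happens after filtering a larger table, to show why positions rather than labels are used.

```python
import pandas as pd

# Hypothetical stop pattern for a single trip, already sorted by stop_sequence.
stop_times = pd.DataFrame(
    {"trip_id": ["t1"] * 5, "stop_sequence": [1, 2, 3, 4, 5], "stop_id": [10, 20, 30, 40, 50]},
    index=[7, 3, 9, 1, 4],  # non-consecutive labels, as after filtering a larger table
)


def node_segment(stop_times, node_id_start, node_id_end, include_start=True, include_end=True):
    """Slice the stop pattern between two stop_ids by position (hypothetical helper)."""
    stop_ids = stop_times["stop_id"].to_list()
    start_pos = stop_ids.index(node_id_start)
    end_pos = stop_ids.index(node_id_end)
    if not include_start:
        start_pos += 1
    if include_end:
        end_pos += 1  # iloc excludes its stop position
    return stop_times.iloc[start_pos:end_pos]


print(node_segment(stop_times, 20, 40).stop_id.tolist())  # [20, 30, 40]
print(node_segment(stop_times, 20, 40, include_start=False, include_end=False).stop_id.tolist())  # [30]
```

The original index labels (3, 9, 1 for the first call) are preserved, so the result can still be joined back to the full table.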

Filters and queries of a gtfs stops table and stop_ids.

network_wrangler.transit.feed.stops.node_is_stop

node_is_stop(stops, stop_times, node_id, trip_id, pickup_dropoff='either')

Returns a boolean (or list of booleans) indicating whether the given node(s) are stops for a given trip_id.

Parameters:

  • stops (DataFrame[WranglerStopsTable]) –

    WranglerStopsTable

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    WranglerStopTimesTable

  • node_id (Union[int, list[int]]) –

    roadway node ID, or a list of roadway node IDs

  • trip_id (str) –

    trip_id to get stop pattern for

  • pickup_dropoff (PickupDropoffAvailability, default: 'either' ) –

    str indicating logic for selecting stops based on pickup and dropoff availability at stop. Defaults to “either”. “any”: all stop_time records; “either”: pickup_type or drop_off_type != 1; “both”: both pickup_type and drop_off_type != 1; “pickup_only”: pickup_type != 1 and drop_off_type = 1; “dropoff_only”: drop_off_type != 1 and pickup_type = 1.

Source code in network_wrangler/transit/feed/stops.py
def node_is_stop(
    stops: DataFrame[WranglerStopsTable],
    stop_times: DataFrame[WranglerStopTimesTable],
    node_id: Union[int, list[int]],
    trip_id: str,
    pickup_dropoff: PickupDropoffAvailability = "either",
) -> Union[bool, list[bool]]:
    """Returns a boolean (or list of booleans) indicating whether node(s) are stops for a trip_id.

    Args:
        stops: WranglerStopsTable
        stop_times: WranglerStopTimesTable
        node_id: roadway node ID, or a list of roadway node IDs
        trip_id: trip_id to get stop pattern for
        pickup_dropoff: str indicating logic for selecting stops based on pickup and dropoff
            availability at stop. Defaults to "either".
            "any": all stop_time records
            "either": either pickup_type or drop_off_type != 1
            "both": both pickup_type and drop_off_type != 1
            "pickup_only": pickup_type != 1; drop_off_type = 1
            "dropoff_only": drop_off_type != 1; pickup_type = 1
    """
    trip_stop_nodes = stops_for_trip_id(stops, stop_times, trip_id, pickup_dropoff=pickup_dropoff)[
        "stop_id"
    ]
    if isinstance(node_id, list):
        return [n in trip_stop_nodes.values for n in node_id]
    return node_id in trip_stop_nodes.values

network_wrangler.transit.feed.stops.stop_id_pattern_for_trip

stop_id_pattern_for_trip(stop_times, trip_id, pickup_dropoff='either')

Returns the stop pattern for a given trip_id as a list of stop_ids ordered by stop_sequence.

Parameters:

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    WranglerStopTimesTable

  • trip_id (str) –

    trip_id to get stop pattern for

  • pickup_dropoff (PickupDropoffAvailability, default: 'either' ) –

    str indicating logic for selecting stops based on pickup and dropoff availability at stop. Defaults to “either”. “any”: all stop_time records; “either”: pickup_type or drop_off_type != 1; “both”: both pickup_type and drop_off_type != 1; “pickup_only”: pickup_type != 1 and drop_off_type = 1; “dropoff_only”: drop_off_type != 1 and pickup_type = 1.

Source code in network_wrangler/transit/feed/stops.py
@validate_call_pyd
def stop_id_pattern_for_trip(
    stop_times: DataFrame[WranglerStopTimesTable],
    trip_id: str,
    pickup_dropoff: PickupDropoffAvailability = "either",
) -> list[str]:
    """Returns the stop pattern for a given trip_id as a list of stop_ids.

    Args:
        stop_times: WranglerStopTimesTable
        trip_id: trip_id to get stop pattern for
        pickup_dropoff: str indicating logic for selecting stops based on pickup and dropoff
            availability at stop. Defaults to "either".
            "any": all stop_time records
            "either": either pickup_type or drop_off_type != 1
            "both": both pickup_type and drop_off_type != 1
            "pickup_only": pickup_type != 1; drop_off_type = 1
            "dropoff_only": drop_off_type != 1; pickup_type = 1
    """
    from .stop_times import stop_times_for_pickup_dropoff_trip_id  # noqa: PLC0415

    trip_stops = stop_times_for_pickup_dropoff_trip_id(
        stop_times, trip_id, pickup_dropoff=pickup_dropoff
    )
    return trip_stops.stop_id.to_list()

network_wrangler.transit.feed.stops.stops_for_stop_times

stops_for_stop_times(stops, stop_times)

Filter stops dataframe to only have stops associated with stop_times records.

Source code in network_wrangler/transit/feed/stops.py
def stops_for_stop_times(
    stops: DataFrame[WranglerStopsTable], stop_times: DataFrame[WranglerStopTimesTable]
) -> DataFrame[WranglerStopsTable]:
    """Filter stops dataframe to only have stops associated with stop_times records."""
    _sel_stops_ge_min = stop_times.stop_id.unique().tolist()
    filtered_stops = stops[stops.stop_id.isin(_sel_stops_ge_min)]
    WranglerLogger.debug(
        f"Filtered stops to {len(filtered_stops)}/{len(stops)} "
        f"records that referenced one of {len(stop_times)} stop_times."
    )
    return filtered_stops

network_wrangler.transit.feed.stops.stops_for_trip_id

stops_for_trip_id(stops, stop_times, trip_id, pickup_dropoff='any')

Returns the stops.txt records used by a given trip_id.

Source code in network_wrangler/transit/feed/stops.py
def stops_for_trip_id(
    stops: DataFrame[WranglerStopsTable],
    stop_times: DataFrame[WranglerStopTimesTable],
    trip_id: str,
    pickup_dropoff: PickupDropoffAvailability = "any",
) -> DataFrame[WranglerStopsTable]:
    """Returns the stops.txt records used by a given trip_id."""
    stop_ids = stop_id_pattern_for_trip(stop_times, trip_id, pickup_dropoff=pickup_dropoff)
    return stops.loc[stops.stop_id.isin(stop_ids)]

Filters and queries of a gtfs trips table and trip_ids.

network_wrangler.transit.feed.trips.trip_ids_for_shape_id

trip_ids_for_shape_id(trips, shape_id)

Returns a list of trip_ids for a given shape_id.

Source code in network_wrangler/transit/feed/trips.py
def trip_ids_for_shape_id(trips: DataFrame[WranglerTripsTable], shape_id: str) -> list[str]:
    """Returns a list of trip_ids for a given shape_id."""
    return trips_for_shape_id(trips, shape_id)["trip_id"].unique().tolist()

network_wrangler.transit.feed.trips.trips_for_shape_id

trips_for_shape_id(trips, shape_id)

Returns trip records for a given shape_id.

Source code in network_wrangler/transit/feed/trips.py
def trips_for_shape_id(
    trips: DataFrame[WranglerTripsTable], shape_id: str
) -> DataFrame[WranglerTripsTable]:
    """Returns trip records for a given shape_id."""
    return trips.loc[trips.shape_id == shape_id]

network_wrangler.transit.feed.trips.trips_for_stop_times

trips_for_stop_times(trips, stop_times)

Filter trips dataframe to records associated with stop_time records.

Source code in network_wrangler/transit/feed/trips.py
def trips_for_stop_times(
    trips: DataFrame[WranglerTripsTable], stop_times: DataFrame[WranglerStopTimesTable]
) -> DataFrame[WranglerTripsTable]:
    """Filter trips dataframe to records associated with stop_time records."""
    _sel_trips = stop_times.trip_id.unique().tolist()
    filtered_trips = trips[trips.trip_id.isin(_sel_trips)]
    WranglerLogger.debug(
        f"Filtered trips to {len(filtered_trips)}/{len(trips)} "
        f"records that referenced one of {len(stop_times)} stop_times."
    )
    return filtered_trips
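
The stop_times, stops and trips filters in this section are each a single `isin` against another table's ids, so they can be chained to keep a reduced feed internally consistent. The order below (stops → stop_times → trips, then prune unused stops) is an assumption for illustration, not a sequence the library prescribes; the sample tables are hypothetical.

```python
import pandas as pd

# Hypothetical minimal feed tables; stop 9 is referenced by t2 but missing from stops.
stops = pd.DataFrame({"stop_id": [1, 2, 3, 4]})
trips = pd.DataFrame({"trip_id": ["t1", "t2"], "shape_id": ["s1", "s2"]})
stop_times = pd.DataFrame({"trip_id": ["t1", "t1", "t2"], "stop_id": [1, 2, 9]})

# 1. Keep stop_times whose stop_id exists in stops (mirrors stop_times_for_stops).
stop_times = stop_times[stop_times.stop_id.isin(stops.stop_id.unique())]
# 2. Keep trips that still have stop_times (mirrors trips_for_stop_times).
trips = trips[trips.trip_id.isin(stop_times.trip_id.unique())]
# 3. Keep stops referenced by the remaining stop_times (mirrors stops_for_stop_times).
stops = stops[stops.stop_id.isin(stop_times.stop_id.unique())]

print(trips.trip_id.tolist(), stops.stop_id.tolist())  # ['t1'] [1, 2]
```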

Functions for translating transit tables into links that can be visualized and related to the roadway network.

shapes_to_shape_links(shapes)

Converts shapes DataFrame to shape links DataFrame.

Parameters:

  • shapes (DataFrame[WranglerShapesTable]) –

    The input shapes DataFrame.

Returns:

  • DataFrame

    pd.DataFrame: The resulting shape links DataFrame.

Source code in network_wrangler/transit/feed/transit_links.py
def shapes_to_shape_links(shapes: DataFrame[WranglerShapesTable]) -> pd.DataFrame:
    """Converts shapes DataFrame to shape links DataFrame.

    Args:
        shapes (DataFrame[WranglerShapesTable]): The input shapes DataFrame.

    Returns:
        pd.DataFrame: The resulting shape links DataFrame.
    """
    return point_seq_to_links(
        shapes,
        id_field="shape_id",
        seq_field="shape_pt_sequence",
        node_id_field="shape_model_node_id",
    )
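
`point_seq_to_links` is not shown in this section. Judging from the shape-filtering source above, which merges its output with a links table on `A` and `B`, it pairs each point with the next point of the same sequence. The sketch below is a hypothetical stand-in (`point_seq_to_links_sketch`) built on `groupby(...).shift(-1)`; its exact output columns are an assumption and may differ from the library's.

```python
import pandas as pd


def point_seq_to_links_sketch(
    df: pd.DataFrame,
    id_field: str,
    seq_field: str,
    node_id_field: str,
    from_field: str = "A",
    to_field: str = "B",
) -> pd.DataFrame:
    """Pair each point with the next point of the same sequence to form from/to links."""
    df = df.sort_values(by=[id_field, seq_field])
    # shift(-1) within each group gives the following node; NaN for the last point.
    next_node = df.groupby(id_field)[node_id_field].shift(-1)
    links = df.assign(**{from_field: df[node_id_field], to_field: next_node})
    # The last point of each sequence has no successor, so it does not start a link.
    links = links.dropna(subset=[to_field])
    links = links.assign(**{to_field: links[to_field].astype(df[node_id_field].dtype)})
    return links[[id_field, seq_field, from_field, to_field]].reset_index(drop=True)


# Hypothetical shapes table with two shapes.
shapes = pd.DataFrame(
    {
        "shape_id": ["s1", "s1", "s1", "s2", "s2"],
        "shape_pt_sequence": [1, 2, 3, 1, 2],
        "shape_model_node_id": [1, 2, 3, 7, 8],
    }
)
links = point_seq_to_links_sketch(shapes, "shape_id", "shape_pt_sequence", "shape_model_node_id")
print(links[["A", "B"]].values.tolist())  # [[1, 2], [2, 3], [7, 8]]
```

Grouping before shifting is what prevents a spurious link from the last point of one shape to the first point of the next.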

stop_times_to_stop_times_links(stop_times, from_field='A', to_field='B')

Converts stop times to stop times links.

Parameters:

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    The stop times data.

  • from_field (str, default: 'A' ) –

    The name of the field representing the ‘from’ stop. Defaults to “A”.

  • to_field (str, default: 'B' ) –

    The name of the field representing the ‘to’ stop. Defaults to “B”.

Returns:

  • DataFrame

    pd.DataFrame: The resulting stop times links.

Source code in network_wrangler/transit/feed/transit_links.py
def stop_times_to_stop_times_links(
    stop_times: DataFrame[WranglerStopTimesTable],
    from_field: str = "A",
    to_field: str = "B",
) -> pd.DataFrame:
    """Converts stop times to stop times links.

    Args:
        stop_times (DataFrame[WranglerStopTimesTable]): The stop times data.
        from_field (str, optional): The name of the field representing the 'from' stop.
            Defaults to "A".
        to_field (str, optional): The name of the field representing the 'to' stop.
            Defaults to "B".

    Returns:
        pd.DataFrame: The resulting stop times links.
    """
    return point_seq_to_links(
        stop_times,
        id_field="trip_id",
        seq_field="stop_sequence",
        node_id_field="stop_id",
        from_field=from_field,
        to_field=to_field,
    )
unique_shape_links(shapes, from_field='A', to_field='B')

Returns a DataFrame containing unique shape links based on the provided shapes DataFrame.

Parameters:

  • shapes (DataFrame[WranglerShapesTable]) –

    The input DataFrame containing shape information.

  • from_field (str, default: 'A' ) –

    The name of the column representing the ‘from’ field. Defaults to “A”.

  • to_field (str, default: 'B' ) –

    The name of the column representing the ‘to’ field. Defaults to “B”.

Returns:

  • DataFrame

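    pd.DataFrame: One row per unique (from_field, to_field) link, with a shape_id column listing the shapes that use it and, when present, the first shape_pt_lat/shape_pt_lon values for each end.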

Source code in network_wrangler/transit/feed/transit_links.py
def unique_shape_links(
    shapes: DataFrame[WranglerShapesTable], from_field: str = "A", to_field: str = "B"
) -> pd.DataFrame:
    """Returns a DataFrame containing unique shape links based on the provided shapes DataFrame.

    Parameters:
        shapes (DataFrame[WranglerShapesTable]): The input DataFrame containing shape information.
        from_field (str, optional): The name of the column representing the 'from' field.
            Defaults to "A".
        to_field (str, optional): The name of the column representing the 'to' field.
            Defaults to "B".

    Returns:
        pd.DataFrame: DataFrame containing unique shape links based on the provided shapes df.
    """
    shape_links = shapes_to_shape_links(shapes)
    # WranglerLogger.debug(f"Shape links: \n {shape_links[['shape_id', from_field, to_field]]}")

    _agg_dict: dict[str, Union[type, str]] = {"shape_id": list}
    _opt_fields = [f"shape_pt_{v}_{t}" for v in ["lat", "lon"] for t in [from_field, to_field]]
    for f in _opt_fields:
        if f in shape_links:
            _agg_dict[f] = "first"

    unique_shape_links = shape_links.groupby([from_field, to_field]).agg(_agg_dict).reset_index()
    return unique_shape_links
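
The aggregation in unique_shape_links collapses repeated (from_field, to_field) pairs into one row, listing the shape_ids that use each link and keeping the first value of any optional shape_pt_lat_*/shape_pt_lon_* columns. A minimal sketch of that step on hypothetical toy shape links, with only one coordinate column present:

```python
import pandas as pd

# Hypothetical shape links: shapes s1 and s2 both traverse the link 10 -> 11.
shape_links = pd.DataFrame(
    {
        "shape_id": ["s1", "s1", "s2"],
        "A": [10, 11, 10],
        "B": [11, 12, 11],
        "shape_pt_lat_A": [44.95, 44.96, 44.95],
    }
)

# Always list the shape_ids per link; keep the first value of optional coordinate columns.
agg_dict = {"shape_id": list}
for col in [f"shape_pt_{v}_{t}" for v in ["lat", "lon"] for t in ["A", "B"]]:
    if col in shape_links:
        agg_dict[col] = "first"

unique_links = shape_links.groupby(["A", "B"]).agg(agg_dict).reset_index()
print(unique_links)
```

Because the optional columns are only aggregated when they exist, the same code works whether or not the shapes carry coordinates.
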
unique_stop_time_links(stop_times, from_field='A', to_field='B')

Returns a DataFrame containing unique stop time links based on the given stop times DataFrame.

Parameters:

  • stop_times (DataFrame[WranglerStopTimesTable]) –

    The DataFrame containing stop times data.

  • from_field (str, default: 'A' ) –

    The name of the column representing the ‘from’ field in the stop times DataFrame. Defaults to “A”.

  • to_field (str, default: 'B' ) –

    The name of the column representing the ‘to’ field in the stop times DataFrame. Defaults to “B”.

Returns:

  • DataFrame

    pd.DataFrame: One row per unique stop time link, with columns named by from_field and to_field plus a trip_id column listing the trips that use the link.

Source code in network_wrangler/transit/feed/transit_links.py
def unique_stop_time_links(
    stop_times: DataFrame[WranglerStopTimesTable],
    from_field: str = "A",
    to_field: str = "B",
) -> pd.DataFrame:
    """Returns a DataFrame containing unique stop time links based on the given stop times DataFrame.

    Parameters:
        stop_times (DataFrame[WranglerStopTimesTable]): The DataFrame containing stop times data.
        from_field (str, optional): The name of the column representing the 'from' field in the
            stop times DataFrame. Defaults to "A".
        to_field (str, optional): The name of the column representing the 'to' field in the stop
            times DataFrame. Defaults to "B".

    Returns:
        pd.DataFrame: One row per unique stop time link, with columns named by from_field
            and to_field plus a trip_id column listing the trips that use the link.
    """
    links = stop_times_to_stop_times_links(stop_times, from_field=from_field, to_field=to_field)
    unique_links = links.groupby([from_field, to_field])["trip_id"].apply(list).reset_index()
    return unique_links
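
The output columns take their names from the values passed as from_field and to_field, not the literal strings "from_field" and "to_field". A minimal pandas sketch of the same groupby step on hypothetical toy links, using from_field="from_stop" and to_field="to_stop":

```python
import pandas as pd

# Hypothetical stop time links built with custom from/to column names.
links = pd.DataFrame(
    {
        "trip_id": ["t1", "t1", "t2"],
        "from_stop": [1, 2, 1],
        "to_stop": [2, 3, 2],
    }
)

# Collect the trip_ids that traverse each unique (from_stop, to_stop) pair.
unique_links = links.groupby(["from_stop", "to_stop"])["trip_id"].apply(list).reset_index()
print(unique_links)
```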

Functions to create segments from shapes and shape_links.

network_wrangler.transit.feed.transit_segments.filter_shapes_to_segments

filter_shapes_to_segments(shapes, segments)

Filter shapes dataframe to records associated with segments dataframe.

Parameters:

  • shapes (DataFrame[WranglerShapesTable]) –

    shapes dataframe to filter

  • segments (DataFrame) –

    segments dataframe to filter by, with shape_id, segment_start_shape_pt_seq and segment_end_shape_pt_seq columns. Should have one record per shape_id.

Returns:

  • DataFrame[WranglerShapesTable]

    Filtered shapes dataframe.

Source code in network_wrangler/transit/feed/transit_segments.py
def filter_shapes_to_segments(
    shapes: DataFrame[WranglerShapesTable], segments: pd.DataFrame
) -> DataFrame[WranglerShapesTable]:
    """Filter shapes dataframe to records associated with segments dataframe.

    Args:
        shapes: shapes dataframe to filter
        segments: segments dataframe to filter by, with shape_id, segment_start_shape_pt_seq and
            segment_end_shape_pt_seq columns. Should have one record per shape_id.

    Returns:
        filtered shapes dataframe
    """
    shapes_w_segs = shapes.merge(segments, on="shape_id", how="left")

    # Retain only those points within the segment sequences
    filtered_shapes = shapes_w_segs[
        (shapes_w_segs["shape_pt_sequence"] >= shapes_w_segs["segment_start_shape_pt_seq"])
        & (shapes_w_segs["shape_pt_sequence"] <= shapes_w_segs["segment_end_shape_pt_seq"])
    ]

    drop_cols = [
        "segment_id",
        "segment_start_shape_pt_seq",
        "segment_end_shape_pt_seq",
        "segment_length",
    ]
    filtered_shapes = filtered_shapes.drop(columns=drop_cols)

    return filtered_shapes
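
A minimal pandas sketch of the same merge-then-filter step on hypothetical toy data. It also shows a consequence of the left merge: points of a shape_id with no matching segment are dropped, because comparisons against the missing (NaN) segment bounds evaluate to False. Series.between is used here as an equivalent of the pair of >= and <= comparisons in the source.

```python
import pandas as pd

shapes = pd.DataFrame(
    {
        "shape_id": ["s1"] * 5 + ["s2"] * 2,
        "shape_pt_sequence": [1, 2, 3, 4, 5, 1, 2],
    }
)
# One segment per shape_id; s2 deliberately has no segment.
segments = pd.DataFrame(
    {
        "shape_id": ["s1"],
        "segment_id": [0],
        "segment_start_shape_pt_seq": [2],
        "segment_end_shape_pt_seq": [4],
        "segment_length": [3],
    }
)

merged = shapes.merge(segments, on="shape_id", how="left")
# Inclusive range check; NaN bounds (shape s2) compare as False.
in_segment = merged["shape_pt_sequence"].between(
    merged["segment_start_shape_pt_seq"], merged["segment_end_shape_pt_seq"]
)
# Drop the segment bookkeeping columns, keeping only the original shape columns.
filtered = merged[in_segment].drop(columns=segments.columns.drop("shape_id"))
print(filtered)
```
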
shape_links_to_longest_shape_segments(shape_links)

Find the longest segment of each shape_id that is in the links.

Parameters:

  • shape_links

    DataFrame with shape_id, shape_pt_sequence_A, shape_pt_sequence_B

Returns:

  • DataFrame

    DataFrame with shape_id, segment_id, segment_start_shape_pt_seq, segment_end_shape_pt_seq

Source code in network_wrangler/transit/feed/transit_segments.py
def shape_links_to_longest_shape_segments(shape_links) -> pd.DataFrame:
    """Find the longest segment of each shape_id that is in the links.

    Args:
        shape_links: DataFrame with shape_id, shape_pt_sequence_A, shape_pt_sequence_B

    Returns:
        DataFrame with shape_id, segment_id, segment_start_shape_pt_seq, segment_end_shape_pt_seq
    """
    segments = shape_links_to_segments(shape_links)
    idx = segments.groupby("shape_id")["segment_length"].idxmax()
    longest_segments = segments.loc[idx]
    return longest_segments
shape_links_to_segments(shape_links)

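Convert shape_links to segments by shape_id with segments of continuous shape_pt_sequence.

Parameters:

  • shape_links –

    DataFrame with shape_id, shape_pt_sequence_A, shape_pt_sequence_B

Returns:

  • DataFrame

    DataFrame with shape_id, segment_id, segment_start_shape_pt_seq, segment_end_shape_pt_seq, segment_length
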
Source code in network_wrangler/transit/feed/transit_segments.py
def shape_links_to_segments(shape_links) -> pd.DataFrame:
    """Convert shape_links to segments by shape_id with segments of continuous shape_pt_sequence.

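    Args:
        shape_links: DataFrame with shape_id, shape_pt_sequence_A, shape_pt_sequence_B.

    Returns:
        DataFrame with shape_id, segment_id, segment_start_shape_pt_seq,
        segment_end_shape_pt_seq, and segment_length.
    """
    # Work on a copy so the gap/segment_id helper columns are not added to the caller's frame.
    shape_links = shape_links.copy()
    shape_links["gap"] = shape_links.groupby("shape_id")["shape_pt_sequence_A"].diff().gt(1)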
    shape_links["segment_id"] = shape_links.groupby("shape_id")["gap"].cumsum()

    # Define segment starts and ends
    segment_definitions = (
        shape_links.groupby(["shape_id", "segment_id"])
        .agg(
            segment_start_shape_pt_seq=("shape_pt_sequence_A", "min"),
            segment_end_shape_pt_seq=("shape_pt_sequence_B", "max"),
        )
        .reset_index()
    )

    # Optionally calculate segment lengths for further uses
    segment_definitions["segment_length"] = (
        segment_definitions["segment_end_shape_pt_seq"]
        - segment_definitions["segment_start_shape_pt_seq"]
        + 1
    )

    return segment_definitions
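
A worked example of the gap-and-cumsum technique used by shape_links_to_segments, followed by the idxmax step of shape_links_to_longest_shape_segments, on hypothetical toy data. Like the function, the sketch appears to rely on the links being ordered by shape_pt_sequence_A within each shape_id, since diff() compares each row with the previous one.

```python
import pandas as pd

# Toy links for one shape: points 1-3 and 5-8 are covered, so there is a gap after point 3.
shape_links = pd.DataFrame(
    {
        "shape_id": ["s1"] * 5,
        "shape_pt_sequence_A": [1, 2, 5, 6, 7],
        "shape_pt_sequence_B": [2, 3, 6, 7, 8],
    }
)

# A jump of more than 1 in the "from" sequence marks the start of a new segment;
# a running count of those jumps gives each link its segment_id.
shape_links["gap"] = shape_links.groupby("shape_id")["shape_pt_sequence_A"].diff().gt(1)
shape_links["segment_id"] = shape_links.groupby("shape_id")["gap"].cumsum()

segments = (
    shape_links.groupby(["shape_id", "segment_id"])
    .agg(
        segment_start_shape_pt_seq=("shape_pt_sequence_A", "min"),
        segment_end_shape_pt_seq=("shape_pt_sequence_B", "max"),
    )
    .reset_index()
)
# Number of shape points spanned by each segment.
segments["segment_length"] = (
    segments["segment_end_shape_pt_seq"] - segments["segment_start_shape_pt_seq"] + 1
)

# Keep the longest segment of each shape_id.
longest = segments.loc[segments.groupby("shape_id")["segment_length"].idxmax()]
print(segments)
print(longest)
```

Here segment 0 spans points 1 to 3 (length 3) and segment 1 spans points 5 to 8 (length 4), so segment 1 is kept as the longest.
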

Transit Projects

Functions for adding a transit route to a TransitNetwork.

network_wrangler.transit.projects.add_route.apply_transit_route_addition

apply_transit_route_addition(net, transit_route_addition, reference_road_net=None)

Add transit route to TransitNetwork.

Parameters:

  • net (TransitNetwork) –

    Network to modify.

  • transit_route_addition (dict) –

    Project dictionary whose routes entry holds the routes to add to the feed.

  • reference_road_net (Optional[RoadwayNetwork], default: None ) –

    Reference roadway network to use for adding shapes and stops. Defaults to None.

Returns:

  • TransitNetwork ( TransitNetwork ) –

    Modified network.

Source code in network_wrangler/transit/projects/add_route.py
def apply_transit_route_addition(
    net: TransitNetwork,
    transit_route_addition: dict,
    reference_road_net: Optional[RoadwayNetwork] = None,
) -> TransitNetwork:
    """Add transit route to TransitNetwork.

    Args:
        net (TransitNetwork): Network to modify.
        transit_route_addition: route dictionary to add to the feed.
        reference_road_net (RoadwayNetwork, optional): Reference roadway network to use for
            adding shapes and stops. Defaults to None.

    Returns:
        TransitNetwork: Modified network.
    """
    WranglerLogger.debug("Applying add transit route project.")

    add_routes = transit_route_addition["routes"]

    road_net = net.road_net if reference_road_net is None else reference_road_net
    if road_net is None:
        WranglerLogger.error(
            "! Must have a reference road network set in order to update transit "
            "routing. Either provide it as an input to this function or set it for the "
            "transit network: >> transit_net.road_net = ..."
        )
        msg = "Must have a reference road network set in order to update transit routing."
        raise TransitRouteAddError(msg)

    net.feed = _add_route_to_feed(net.feed, add_routes, road_net)

    return net

Module for applying calculated transit projects to a transit network object.

These projects are stored in the pycode property of a project card as Python code strings, which are executed to change the transit network object.

network_wrangler.transit.projects.calculate.apply_calculated_transit

apply_calculated_transit(net, pycode)

Changes transit network object by executing pycode.

Parameters:

  • net (TransitNetwork) –

    transit network to manipulate

  • pycode (str) –

    python code which changes values in the transit network object

Source code in network_wrangler/transit/projects/calculate.py
def apply_calculated_transit(
    net: TransitNetwork,
    pycode: str,
) -> TransitNetwork:
    """Changes transit network object by executing pycode.

    Args:
        net: transit network to manipulate
        pycode: python code which changes values in the transit network object
    """
    WranglerLogger.debug("Applying calculated transit project.")
    exec(pycode)

    return net
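
Because pycode is run by exec() inside the function body, the snippet sees the local name net and can change the network by mutating that object. A minimal, self-contained sketch with a hypothetical DemoNet class (not part of network_wrangler) illustrates that mutation carries through, while rebinding net inside pycode is not reflected in the returned object (CPython behaviour for exec() in a function scope):

```python
class DemoNet:
    """Hypothetical stand-in for a TransitNetwork with one mutable attribute."""

    def __init__(self) -> None:
        self.headway_secs = 600


def apply_calculated_demo(net: DemoNet, pycode: str) -> DemoNet:
    # exec() without explicit namespaces uses this function's globals and a
    # snapshot of its locals, so pycode can refer to `net` by name.
    exec(pycode)
    return net


# Mutating the object is visible to the caller.
net = apply_calculated_demo(DemoNet(), "net.headway_secs = net.headway_secs // 2")
print(net.headway_secs)  # 300

# Rebinding the name only changes the locals snapshot, so the original object is returned.
net2 = apply_calculated_demo(DemoNet(), "net = None")
print(net2.headway_secs)  # 600
```

In practice this means project pycode should modify attributes of net (for example its feed tables) rather than assign a new object to net.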

Functions for deleting transit service from a TransitNetwork.

network_wrangler.transit.projects.delete_service.apply_transit_service_deletion

apply_transit_service_deletion(net, selection, clean_shapes=False, clean_routes=False)

Delete transit service from a TransitNetwork.

Parameters:

  • net (TransitNetwork) –

    Network to modify.

  • selection (TransitSelection) –

    TransitSelection object, created from a selection dictionary.

  • clean_shapes (bool, default: False ) –

    If True, remove shapes not used by any trips. Defaults to False.

  • clean_routes (bool, default: False ) –

    If True, remove routes not used by any trips. Defaults to False.

Returns:

  • TransitNetwork ( TransitNetwork ) –

    Modified network.

Source code in network_wrangler/transit/projects/delete_service.py
def apply_transit_service_deletion(
    net: TransitNetwork,
    selection: TransitSelection,
    clean_shapes: Optional[bool] = False,
    clean_routes: Optional[bool] = False,
) -> TransitNetwork:
    """Delete transit service from a TransitNetwork.

    Args:
        net (TransitNetwork): Network to modify.
        selection: TransitSelection object, created from a selection dictionary.
        clean_shapes (bool, optional): If True, remove shapes not used by any trips.
            Defaults to False.
        clean_routes (bool, optional): If True, remove routes not used by any trips.
            Defaults to False.

    Returns:
        TransitNetwork: Modified network.
    """
    WranglerLogger.debug("Applying delete transit service project.")

    trip_ids = selection.selected_trips
    net.feed = _delete_trips_from_feed(
        net.feed, trip_ids, clean_shapes=clean_shapes, clean_routes=clean_routes
    )

    return net
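
_delete_trips_from_feed is not shown on this page. The documented clean_shapes and clean_routes behaviour can be sketched in pandas as follows, assuming GTFS-style trips, routes and shapes tables keyed by trip_id, route_id and shape_id. delete_trips_sketch and the toy tables are hypothetical, and a full implementation would presumably also remove the stop_times records of deleted trips.

```python
import pandas as pd


def delete_trips_sketch(trips, routes, shapes, trip_ids, clean_shapes=False, clean_routes=False):
    """Hypothetical sketch: drop trips, then optionally drop shapes/routes no trip uses."""
    trips = trips[~trips["trip_id"].isin(trip_ids)]
    if clean_shapes:
        shapes = shapes[shapes["shape_id"].isin(trips["shape_id"])]
    if clean_routes:
        routes = routes[routes["route_id"].isin(trips["route_id"])]
    return trips, routes, shapes


trips = pd.DataFrame(
    {"trip_id": ["t1", "t2", "t3"], "route_id": ["r1", "r1", "r2"], "shape_id": ["s1", "s1", "s2"]}
)
routes = pd.DataFrame({"route_id": ["r1", "r2"]})
shapes = pd.DataFrame({"shape_id": ["s1", "s1", "s2"], "shape_pt_sequence": [1, 2, 1]})

# Deleting t3 leaves route r2 and shape s2 unused, so both are cleaned up.
trips2, routes2, shapes2 = delete_trips_sketch(
    trips, routes, shapes, ["t3"], clean_shapes=True, clean_routes=True
)
```

With both flags left at their default of False, only the trips table would change.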

Functions for editing transit properties in a TransitNetwork.

network_wrangler.transit.projects.edit_property.apply_transit_property_change

apply_transit_property_change(net, selection, property_changes, project_name=None)

Apply changes to transit properties.

Parameters:

  • net (TransitNetwork) –

    Network to modify.

  • selection (TransitSelection) –

    Selection of trips to modify.

  • property_changes (dict) –

    Dictionary of properties to change.

  • project_name (str, default: None ) –

    Name of the project. Defaults to None.

Returns:

  • TransitNetwork ( TransitNetwork ) –

    Modified network.

Source code in network_wrangler/transit/projects/edit_property.py
def apply_transit_property_change(
    net: TransitNetwork,
    selection: TransitSelection,
    property_changes: dict,
    project_name: Optional[str] = None,
) -> TransitNetwork:
    """Apply changes to transit properties.

    Args:
        net (TransitNetwork): Network to modify.
        selection (TransitSelection): Selection of trips to modify.
        property_changes (dict): Dictionary of properties to change.
        project_name (str, optional): Name of the project. Defaults to None.

    Returns:
        TransitNetwork: Modified network.
    """
    WranglerLogger.debug("Applying transit property change project.")
    for property, property_change in property_changes.items():
        net = _apply_transit_property_change_to_table(
            net,
            selection,
            property,
            property_change,
            project_name=project_name,
        )
    return net

Functions for editing the transit route shapes and stop patterns.

network_wrangler.transit.projects.edit_routing.apply_transit_routing_change

apply_transit_routing_change(net, selection, routing_change, reference_road_net=None, project_name=None)

Apply a routing change to the transit network, including stop updates.

Parameters:

  • net (TransitNetwork) –

    TransitNetwork object to apply routing change to.

  • selection (TransitSelection) –

    TransitSelection object, created from a selection dictionary.

  • routing_change (dict) –

    Routing Change dictionary, e.g.

    {
        "existing": [46665, 150855],
        "set": [-46665, 150855, 46665, 150855],
    }
    

  • shape_id_scalar –

    Not an argument of this function: the initial scalar added to duplicated shape_ids to create a new shape_id is read from net.config.IDS.TRANSIT_SHAPE_ID_SCALAR.

  • reference_road_net (RoadwayNetwork, default: None ) –

    Reference roadway network to use for updating shapes and stops. Defaults to None.

  • project_name (str, default: None ) –

    Name of the project. Defaults to None.

Source code in network_wrangler/transit/projects/edit_routing.py
def apply_transit_routing_change(
    net: TransitNetwork,
    selection: TransitSelection,
    routing_change: dict,
    reference_road_net: Optional[RoadwayNetwork] = None,
    project_name: Optional[str] = None,
) -> TransitNetwork:
    """Apply a routing change to the transit network, including stop updates.

    Args:
        net (TransitNetwork): TransitNetwork object to apply routing change to.
        selection (TransitSelection): TransitSelection object, created from a selection dictionary.
        routing_change (dict): Routing Change dictionary, e.g.
            ```python
            {
                "existing": [46665, 150855],
                "set": [-46665, 150855, 46665, 150855],
            }
            ```
        shape_id_scalar: Not an argument of this function; the initial scalar added to
            duplicated shape_ids to create a new shape_id is read from
            net.config.IDS.TRANSIT_SHAPE_ID_SCALAR.
        reference_road_net (RoadwayNetwork, optional): Reference roadway network to use for
            updating shapes and stops. Defaults to None.
        project_name (str, optional): Name of the project. Defaults to None.
    """
    WranglerLogger.debug("Applying transit routing change project.")
    WranglerLogger.debug(f"...selection: {selection.selection_dict}")
    WranglerLogger.debug(f"...routing: {routing_change}")

    # ---- Secure all inputs needed --------------
    updated_feed = copy.deepcopy(net.feed)
    trip_ids = selection.selected_trips
    if project_name:
        updated_feed.trips.loc[updated_feed.trips.trip_id.isin(trip_ids), "projects"] += (
            f"{project_name},"
        )

    road_net = net.road_net if reference_road_net is None else reference_road_net
    if road_net is None:
        WranglerLogger.error(
            "! Must have a reference road network set in order to update transit "
            "routing. Either provide it as an input to this function or set it for the "
            "transit network: >> transit_net.road_net = ..."
        )
        msg = "Must have a reference road network set in order to update transit routing."
        raise TransitRoutingChangeError(msg)

    # ---- update each shape that is used by selected trips to use new routing -------
    shape_ids = shape_ids_for_trip_ids(updated_feed.trips, trip_ids)
    # WranglerLogger.debug(f"shape_ids: {shape_ids}")
    for shape_id in shape_ids:
        updated_feed.shapes, updated_feed.trips = _update_shapes_and_trips(
            updated_feed,
            shape_id,
            trip_ids,
            routing_change["set"],
            net.config.IDS.TRANSIT_SHAPE_ID_SCALAR,
            road_net,
            routing_existing=routing_change.get("existing", []),
            project_name=project_name,
        )
    # WranglerLogger.debug(f"updated_feed.shapes: \n{updated_feed.shapes}")
    # WranglerLogger.debug(f"updated_feed.trips: \n{updated_feed.trips}")
    # ---- Check if any stops need adding to stops.txt and add if they do ----------
    updated_feed.stops = _update_stops(
        updated_feed, routing_change["set"], road_net, project_name=project_name
    )
    # WranglerLogger.debug(f"updated_feed.stops: \n{updated_feed.stops}")
    # ---- Update stop_times --------------------------------------------------------
    for trip_id in trip_ids:
        updated_feed.stop_times = _update_stop_times_for_trip(
            updated_feed,
            trip_id,
            routing_change["set"],
            routing_change.get("existing", []),
        )

    # ---- Check result -------------------------------------------------------------
    _show_col = [
        "trip_id",
        "stop_id",
        "stop_sequence",
        "departure_time",
        "arrival_time",
    ]
    _ex_stoptimes = updated_feed.stop_times.loc[
        updated_feed.stop_times.trip_id == trip_ids[0], _show_col
    ]
    # WranglerLogger.debug(f"stop_times for first updated trip: \n {_ex_stoptimes}")

    # ---- Update transit network with updated feed.
    net.feed = updated_feed
    # WranglerLogger.debug(f"net.feed.stops: \n {net.feed.stops}")
    return net
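
The routing_change example above pairs an existing node sequence with a replacement set that includes a negative node ID. Assuming the Network Wrangler project-card convention that a negative node ID marks a node the route passes through without stopping (an assumption here; this page does not state it), a set list can be read as (node, is_stop) pairs. parse_routing_sketch is hypothetical:

```python
def parse_routing_sketch(routing: list[int]) -> list[tuple[int, bool]]:
    """Hypothetical sketch: split a routing list into (node_id, is_stop) pairs.

    Assumes a negative value marks a node that is traversed without stopping.
    """
    return [(abs(node), node > 0) for node in routing]


routing_change = {
    "existing": [46665, 150855],
    "set": [-46665, 150855, 46665, 150855],
}
parsed = parse_routing_sketch(routing_change["set"])
print(parsed)
```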

Transit Helper Modules

Functions to clip a TransitNetwork object to a boundary.

Clipped transit is an independent transit network that is a subset of the original transit network.

Example usage:

from network_wrangler import load_transit, write_transit
from network_wrangler.transit.clip import clip_transit

# example_dir, test_dir and out_dir are assumed to be pathlib.Path objects defined beforehand.
stpaul_transit = load_transit(example_dir / "stpaul")
boundary_file = test_dir / "data" / "ecolab.geojson"
clipped_network = clip_transit(stpaul_transit, boundary_file=boundary_file)
write_transit(clipped_network, out_dir, prefix="ecolab", format="geojson", true_shape=True)

network_wrangler.transit.clip.clip_feed_to_boundary

clip_feed_to_boundary(feed, ref_nodes_df, boundary_gdf=None, boundary_geocode=None, boundary_file=None, min_stops=DEFAULT_MIN_STOPS)

Clips a transit Feed object to a boundary and returns the resulting Feed.

Retains only the stops within the boundary and trips that traverse them subject to a minimum number of stops per trip as defined by min_stops.

Parameters:

  • feed (Feed) –

    Feed object to be clipped.

  • ref_nodes_df (GeoDataFrame) –

    geodataframe with node geometry to reference

  • boundary_geocode (Union[str, dict], default: None ) –

    A geocode string or dictionary representing the boundary. Defaults to None.

  • boundary_file (Union[str, Path], default: None ) –

    A path to the boundary file. Only used if boundary_geocode is None. Defaults to None.

  • boundary_gdf (GeoDataFrame, default: None ) –

    A GeoDataFrame representing the boundary. Only used if boundary_geocode and boundary_file are None. Defaults to None.

  • min_stops (int, default: DEFAULT_MIN_STOPS ) –

    minimum number of stops needed to retain a transit trip within clipped area. Defaults to DEFAULT_MIN_STOPS which is set to 2.

Source code in network_wrangler/transit/clip.py
def clip_feed_to_boundary(
    feed: Feed,
    ref_nodes_df: gpd.GeoDataFrame,
    boundary_gdf: Optional[gpd.GeoDataFrame] = None,
    boundary_geocode: Optional[Union[str, dict]] = None,
    boundary_file: Optional[Union[str, Path]] = None,
    min_stops: int = DEFAULT_MIN_STOPS,
) -> Feed:
    """Clips a transit Feed object to a boundary and returns the resulting Feed.

    Retains only the stops within the boundary and trips that traverse them subject to a minimum
    number of stops per trip as defined by `min_stops`.

    Args:
        feed: Feed object to be clipped.
        ref_nodes_df: geodataframe with node geometry to reference
        boundary_geocode (Union[str, dict], optional): A geocode string or dictionary
            representing the boundary. Defaults to None.
        boundary_file (Union[str, Path], optional): A path to the boundary file. Only used if
            boundary_geocode is None. Defaults to None.
        boundary_gdf (gpd.GeoDataFrame, optional): A GeoDataFrame representing the boundary.
            Only used if boundary_geocode and boundary_file are None. Defaults to None.
        min_stops: minimum number of stops needed to retain a transit trip within clipped area.
            Defaults to DEFAULT_MIN_STOPS which is set to 2.

    Returns: Feed object trimmed to the boundary.
    """
    WranglerLogger.info("Clipping transit network to boundary.")

    boundary_gdf = get_bounding_polygon(
        boundary_gdf=boundary_gdf,
        boundary_geocode=boundary_geocode,
        boundary_file=boundary_file,
    )

    shape_links_gdf = shapes_to_shape_links_gdf(feed.shapes, ref_nodes_df=ref_nodes_df)

    # make sure boundary_gdf.crs == network.crs
    if boundary_gdf.crs != shape_links_gdf.crs:
        boundary_gdf = boundary_gdf.to_crs(shape_links_gdf.crs)

    # get the boundary as a single polygon
    boundary = boundary_gdf.geometry.union_all()
    # get the shape_links that intersect the boundary
    clipped_shape_links = shape_links_gdf[shape_links_gdf.geometry.intersects(boundary)]

    # nodes within clipped_shape_links
    node_ids = list(set(clipped_shape_links.A.to_list() + clipped_shape_links.B.to_list()))
    WranglerLogger.debug(f"Clipping network to {len(node_ids)} nodes.")
    if not node_ids:
        msg = "No nodes found within the boundary."
        raise ValueError(msg)
    return _clip_feed_to_nodes(feed, node_ids, min_stops=min_stops)
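
_clip_feed_to_nodes is not shown on this page. The min_stops rule described above (retain trips that traverse at least min_stops stops inside the clipped area) can be sketched in pandas as follows, assuming stop_times.stop_id holds the same node IDs as node_ids. The function name and toy data are hypothetical, and the real implementation may differ, for example in how it trims the stop_times of retained trips.

```python
import pandas as pd


def trips_with_min_stops_sketch(
    stop_times: pd.DataFrame, node_ids: list, min_stops: int = 2
) -> list:
    """Hypothetical sketch: trip_ids with at least min_stops stops among node_ids."""
    inside = stop_times[stop_times["stop_id"].isin(node_ids)]
    stops_per_trip = inside.groupby("trip_id").size()
    return stops_per_trip[stops_per_trip >= min_stops].index.tolist()


# t1 stops at 1, 2, 3 (all inside); t2 stops at 3, 4, 5 (only 3 inside).
stop_times = pd.DataFrame(
    {
        "trip_id": ["t1", "t1", "t1", "t2", "t2", "t2"],
        "stop_id": [1, 2, 3, 3, 4, 5],
    }
)
kept = trips_with_min_stops_sketch(stop_times, node_ids=[1, 2, 3], min_stops=2)
print(kept)  # ['t1']
```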

network_wrangler.transit.clip.clip_feed_to_roadway

clip_feed_to_roadway(feed, roadway_net, min_stops=DEFAULT_MIN_STOPS)

Returns a copy of transit feed clipped to the roadway network.

Parameters:

  • feed (Feed) –

    Transit Feed to clip.

  • roadway_net (RoadwayNetwork) –

    Roadway network to clip to.

  • min_stops (int, default: DEFAULT_MIN_STOPS ) –

    minimum number of stops needed to retain a transit trip within clipped area. Defaults to DEFAULT_MIN_STOPS which is set to 2.

Raises:

  • ValueError

    If no stops found within the roadway network.

Returns:

  • Feed ( Feed ) –

    Clipped deep copy of feed limited to the roadway network.

Source code in network_wrangler/transit/clip.py
def clip_feed_to_roadway(
    feed: Feed,
    roadway_net: RoadwayNetwork,
    min_stops: int = DEFAULT_MIN_STOPS,
) -> Feed:
    """Returns a copy of transit feed clipped to the roadway network.

    Args:
        feed (Feed): Transit Feed to clip.
        roadway_net: Roadway network to clip to.
        min_stops: minimum number of stops needed to retain a transit trip within clipped area.
            Defaults to DEFAULT_MIN_STOPS which is set to 2.

    Raises:
        ValueError: If no stops found within the roadway network.

    Returns:
        Feed: Clipped deep copy of feed limited to the roadway network.
    """
    WranglerLogger.info("Clipping transit network to roadway network.")

    clipped_feed = _remove_links_from_feed(feed, roadway_net.links_df, min_stops=min_stops)

    return clipped_feed

network_wrangler.transit.clip.clip_transit

clip_transit(network, node_ids=None, boundary_geocode=None, boundary_file=None, boundary_gdf=None, ref_nodes_df=None, roadway_net=None, min_stops=DEFAULT_MIN_STOPS)

Returns a new TransitNetwork clipped to a boundary as determined by arguments.

Will clip based on which arguments are provided as prioritized below:

  1. If node_ids provided, will clip based on node_ids
  2. If boundary_geocode provided, will clip based on a search in OSM for that jurisdiction's boundary, using reference geometry from ref_nodes_df or roadway_net
  3. If boundary_file provided, will clip based on that polygon, using reference geometry from ref_nodes_df or roadway_net
  4. If boundary_gdf provided, will clip based on that geodataframe, using reference geometry from ref_nodes_df or roadway_net
  5. If roadway_net provided, will clip based on that roadway network

Parameters:

  • network (Union[TransitNetwork, str, Path]) –

    TransitNetwork to clip, or a path to a transit network, which is loaded with load_transit.

  • node_ids (list[str], default: None ) –

    A list of node_ids to clip to. Defaults to None.

  • boundary_geocode (Union[str, dict], default: None ) –

    A geocode string or dictionary representing the boundary. Only used if node_ids are None. Defaults to None.

  • boundary_file (Union[str, Path], default: None ) –

    A path to the boundary file. Only used if node_ids and boundary_geocode are None. Defaults to None.

  • boundary_gdf (GeoDataFrame, default: None ) –

    A GeoDataFrame representing the boundary. Only used if node_ids, boundary_geocode and boundary_file are None. Defaults to None.

  • ref_nodes_df (Optional[Union[None, GeoDataFrame]], default: None ) –

    GeoDataFrame of geographic references for node_ids. Only used if node_ids is None and one of boundary_* is not None.

  • roadway_net (Optional[Union[None, RoadwayNetwork]], default: None ) –

    Roadway Network instance to clip transit network to. Only used if node_ids is None and all of boundary_* are None.

  • min_stops (int, default: DEFAULT_MIN_STOPS ) –

    minimum number of stops needed to retain a transit trip within clipped area. Defaults to DEFAULT_MIN_STOPS which is set to 2.

Source code in network_wrangler/transit/clip.py
def clip_transit(
    network: Union[TransitNetwork, str, Path],
    node_ids: Optional[Union[None, list[str]]] = None,
    boundary_geocode: Optional[Union[str, dict, None]] = None,
    boundary_file: Optional[Union[str, Path]] = None,
    boundary_gdf: Optional[Union[None, gpd.GeoDataFrame]] = None,
    ref_nodes_df: Optional[Union[None, gpd.GeoDataFrame]] = None,
    roadway_net: Optional[Union[None, RoadwayNetwork]] = None,
    min_stops: int = DEFAULT_MIN_STOPS,
) -> TransitNetwork:
    """Returns a new TransitNetwork clipped to a boundary as determined by arguments.

    Will clip based on which arguments are provided as prioritized below:

    1. If `node_ids` provided, will clip based on `node_ids`
    2. If `boundary_geocode` provided, will clip based on a search in OSM for that jurisdiction's
        boundary, using reference geometry from `ref_nodes_df` or `roadway_net`
    3. If `boundary_file` provided, will clip based on that polygon, using reference geometry
        from `ref_nodes_df` or `roadway_net`
    4. If `boundary_gdf` provided, will clip based on that geodataframe, using reference
        geometry from `ref_nodes_df` or `roadway_net`
    5. If `roadway_net` provided, will clip based on that roadway network

    Args:
        network (TransitNetwork): TransitNetwork to clip.
        node_ids (list[str], optional): A list of node_ids to clip to. Defaults to None.
        boundary_geocode (Union[str, dict], optional): A geocode string or dictionary
            representing the boundary. Only used if node_ids are None. Defaults to None.
        boundary_file (Union[str, Path], optional): A path to the boundary file. Only used if
            node_ids and boundary_geocode are None. Defaults to None.
        boundary_gdf (gpd.GeoDataFrame, optional): A GeoDataFrame representing the boundary.
            Only used if node_ids, boundary_geocode and boundary_file are None. Defaults to None.
        ref_nodes_df: GeoDataFrame of geographic references for node_ids.  Only used if
            node_ids is None and one of boundary_* is not None.
        roadway_net: Roadway Network instance to clip transit network to. Only used if
            node_ids is None and all of boundary_* are None.
        min_stops: minimum number of stops needed to retain a transit trip within clipped area.
            Defaults to DEFAULT_MIN_STOPS which is set to 2.
    """
    if not isinstance(network, TransitNetwork):
        network = load_transit(network)
    set_roadway_network = False
    feed = network.feed

    if node_ids is not None:
        clipped_feed = _clip_feed_to_nodes(feed, node_ids=node_ids, min_stops=min_stops)
    elif any(i is not None for i in [boundary_file, boundary_geocode, boundary_gdf]):
        if ref_nodes_df is None:
            ref_nodes_df = get_nodes(transit_net=network, roadway_net=roadway_net)

        clipped_feed = clip_feed_to_boundary(
            feed,
            ref_nodes_df,
            boundary_file=boundary_file,
            boundary_geocode=boundary_geocode,
            boundary_gdf=boundary_gdf,
            min_stops=min_stops,
        )
    elif roadway_net is not None:
        clipped_feed = clip_feed_to_roadway(feed, roadway_net=roadway_net, min_stops=min_stops)
        set_roadway_network = True
    else:
        msg = "Missing required arguments from clip_transit"
        raise ValueError(msg)

    # create a new TransitNetwork object with the clipped feed dataframes
    clipped_net = TransitNetwork(clipped_feed)

    if set_roadway_network:
        WranglerLogger.info("Setting roadway network for clipped transit network.")
        clipped_net.road_net = roadway_net
    return clipped_net

Utilities for working with transit geodataframes.

shapes_to_shape_links_gdf(shapes, ref_nodes_df=None, from_field='A', to_field='B', crs=LAT_LON_CRS)

Translates shapes to shape links geodataframe using geometry from ref_nodes_df if provided.

TODO: Add join to links and then shapes to get true geometry.

Parameters:

  • shapes (DataFrame[WranglerShapesTable]) –

    Feed shapes table

  • ref_nodes_df (Optional[DataFrame[RoadNodesTable]], default: None ) –

    If specified, will use geometry from these nodes. Otherwise, will use geometry in shapes file. Defaults to None.

  • from_field (str, default: 'A' ) –

    Field used for the link’s from node model_node_id. Defaults to “A”.

  • to_field (str, default: 'B' ) –

    Field used for the link’s to node model_node_id. Defaults to “B”.

  • crs (int, default: LAT_LON_CRS ) –

    Coordinate reference system. Shouldn’t be changed unless you know what you are doing. Defaults to LAT_LON_CRS which is WGS84 lat/long.

Returns:

  • GeoDataFrame

    gpd.GeoDataFrame: unique shape links (from_field to to_field), each with a two-point LineString geometry.

Source code in network_wrangler/transit/geo.py
def shapes_to_shape_links_gdf(
    shapes: DataFrame[WranglerShapesTable],
    ref_nodes_df: Optional[DataFrame[RoadNodesTable]] = None,
    from_field: str = "A",
    to_field: str = "B",
    crs: int = LAT_LON_CRS,
) -> gpd.GeoDataFrame:
    """Translates shapes to shape links geodataframe using geometry from ref_nodes_df if provided.

    TODO: Add join to links and then shapes to get true geometry.

    Args:
        shapes: Feed shapes table
        ref_nodes_df: If specified, will use geometry from these nodes.  Otherwise, will use
            geometry in shapes file. Defaults to None.
        from_field: Field used for the link's from node `model_node_id`. Defaults to "A".
        to_field: Field used for the link's to node `model_node_id`. Defaults to "B".
        crs (int, optional): Coordinate reference system. Shouldn't be changed unless you know
            what you are doing. Defaults to LAT_LON_CRS which is WGS84 lat/long.

    Returns:
        gpd.GeoDataFrame: unique shape links, each with a two-point LineString geometry.
    """
    if ref_nodes_df is not None:
        shapes = update_shapes_geometry(shapes, ref_nodes_df)
    tr_links = unique_shape_links(shapes, from_field=from_field, to_field=to_field)
    # WranglerLogger.debug(f"tr_links :\n{tr_links }")

    geometry = linestring_from_lats_lons(
        tr_links,
        [f"shape_pt_lat_{from_field}", f"shape_pt_lat_{to_field}"],
        [f"shape_pt_lon_{from_field}", f"shape_pt_lon_{to_field}"],
    )
    # WranglerLogger.debug(f"geometry\n{geometry}")
    shapes_gdf = gpd.GeoDataFrame(tr_links, geometry=geometry, crs=crs).set_crs(LAT_LON_CRS)
    return shapes_gdf
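
The function above pairs consecutive shape points into from/to links and builds a line from the two endpoint coordinates. The following minimal sketch reproduces that idea with plain dicts instead of DataFrames. It is not the library's `unique_shape_links` or `linestring_from_lats_lons`. The helper name `shape_links_from_points` is hypothetical, and so are two choices it makes: points are ordered by `shape_pt_sequence`, and a link is treated as unique by its (A, B) node pair across all shapes.

```python
from itertools import groupby
from operator import itemgetter


def shape_links_from_points(points, from_field="A", to_field="B"):
    """Hypothetical sketch: turn ordered shape points into unique from/to links.

    Each point is a dict with shape_id, shape_pt_sequence, shape_model_node_id,
    shape_pt_lat and shape_pt_lon (column names as in WranglerShapesTable).
    """
    ordered = sorted(points, key=itemgetter("shape_id", "shape_pt_sequence"))
    links = {}
    for _, shape_pts in groupby(ordered, key=itemgetter("shape_id")):
        shape_pts = list(shape_pts)
        # pair each point with the next point on the same shape
        for a, b in zip(shape_pts, shape_pts[1:]):
            key = (a["shape_model_node_id"], b["shape_model_node_id"])
            if key in links:
                continue  # keep only the first occurrence of each link
            links[key] = {
                from_field: key[0],
                to_field: key[1],
                # LineString-style coordinates in (x, y) = (lon, lat) order
                "coords": [
                    (a["shape_pt_lon"], a["shape_pt_lat"]),
                    (b["shape_pt_lon"], b["shape_pt_lat"]),
                ],
            }
    return list(links.values())
```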

network_wrangler.transit.geo.shapes_to_trip_shapes_gdf

shapes_to_trip_shapes_gdf(shapes, ref_nodes_df=None, crs=LAT_LON_CRS)

Geodataframe with one polyline shape per shape_id.

TODO: add information about the route and trips.

Parameters:

  • shapes (DataFrame[WranglerShapesTable]) –

    WranglerShapesTable

  • ref_nodes_df (Optional[DataFrame[RoadNodesTable]], default: None ) –

    If specified, will use geometry from these nodes. Otherwise, will use geometry in shapes file. Defaults to None.

  • crs (int, default: LAT_LON_CRS ) –

    Coordinate reference system. Defaults to LAT_LON_CRS (EPSG:4326, WGS84 lat/long).

Source code in network_wrangler/transit/geo.py
def shapes_to_trip_shapes_gdf(
    shapes: DataFrame[WranglerShapesTable],
    # trips: WranglerTripsTable,
    ref_nodes_df: Optional[DataFrame[RoadNodesTable]] = None,
    crs: int = LAT_LON_CRS,
) -> gpd.GeoDataFrame:
    """Geodataframe with one polyline shape per shape_id.

    TODO: add information about the route and trips.

    Args:
        shapes: WranglerShapesTable
        ref_nodes_df: If specified, will use geometry from these nodes.  Otherwise, will use
            geometry in shapes file. Defaults to None.
        crs: Coordinate reference system. Defaults to LAT_LON_CRS (EPSG:4326, WGS84 lat/long).
    """
    if ref_nodes_df is not None:
        shapes = update_shapes_geometry(shapes, ref_nodes_df)

    shape_geom = (
        shapes[["shape_id", "shape_pt_lat", "shape_pt_lon"]]
        .groupby("shape_id")
        .agg(list)
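        # select by column name rather than position; shapely expects (x, y) = (lon, lat)
        .apply(lambda x: LineString(zip(x["shape_pt_lon"], x["shape_pt_lat"])), axis=1)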
    )

    route_shapes_gdf = gpd.GeoDataFrame(
        data=shape_geom.index, geometry=shape_geom.values, crs=crs
    ).set_crs(LAT_LON_CRS)

    return route_shapes_gdf
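
The grouping step above (one polyline per `shape_id`) can be sketched without pandas. The library code aggregates points in the order they appear in the shapes table. This minimal sketch instead sorts explicitly by `shape_pt_sequence`, which is a choice made here rather than something the source shows. The helper name `polyline_per_shape` is hypothetical.

```python
from collections import defaultdict


def polyline_per_shape(points):
    """Hypothetical helper: one ordered list of (lon, lat) coordinates per shape_id."""
    by_shape = defaultdict(list)
    for p in points:
        by_shape[p["shape_id"]].append(p)
    return {
        shape_id: [
            # (x, y) order, i.e. (lon, lat), as shapely's LineString expects
            (p["shape_pt_lon"], p["shape_pt_lat"])
            for p in sorted(shape_pts, key=lambda q: q["shape_pt_sequence"])
        ]
        for shape_id, shape_pts in by_shape.items()
    }
```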

network_wrangler.transit.geo.stop_times_to_stop_time_links_gdf

stop_times_to_stop_time_links_gdf(stop_times, stops, ref_nodes_df=None, from_field='A', to_field='B')

Stop times geodataframe as links using geometry from stops.txt or optionally another df.

Parameters:

  • stop_times (WranglerStopTimesTable) –

    Feed stop times table.

  • stops (WranglerStopsTable) –

    Feed stops table.

  • ref_nodes_df (DataFrame, default: None ) –

    If specified, will use geometry from these nodes. Otherwise, will use geometry in the stops table. Defaults to None.

  • from_field (str, default: 'A' ) –

    Field used for the link’s from node model_node_id. Defaults to “A”.

  • to_field (str, default: 'B' ) –

    Field used for the link’s to node model_node_id. Defaults to “B”.

Source code in network_wrangler/transit/geo.py
def stop_times_to_stop_time_links_gdf(
    stop_times: DataFrame[WranglerStopTimesTable],
    stops: DataFrame[WranglerStopsTable],
    ref_nodes_df: Optional[DataFrame[RoadNodesTable]] = None,
    from_field: str = "A",
    to_field: str = "B",
) -> gpd.GeoDataFrame:
    """Stop times geodataframe as links using geometry from stops.txt or optionally another df.

    Args:
        stop_times (WranglerStopTimesTable): Feed stop times table.
        stops (WranglerStopsTable): Feed stops table.
        ref_nodes_df (pd.DataFrame, optional): If specified, will use geometry from these nodes.
            Otherwise, will use geometry in the stops table. Defaults to None.
        from_field: Field used for the link's from node `model_node_id`. Defaults to "A".
        to_field: Field used for the link's to node `model_node_id`. Defaults to "B".
    """
    from ..utils.geo import linestring_from_lats_lons  # noqa: PLC0415

    if ref_nodes_df is not None:
        stops = update_stops_geometry(stops, ref_nodes_df)

    lat_fields = []
    lon_fields = []
    tr_links = unique_stop_time_links(stop_times, from_field=from_field, to_field=to_field)
    for f in (from_field, to_field):
        tr_links = tr_links.merge(
            stops[["stop_id", "stop_lat", "stop_lon"]],
            right_on="stop_id",
            left_on=f,
            how="left",
        )
        lon_f = f"{f}_X"
        lat_f = f"{f}_Y"
        tr_links = tr_links.rename(columns={"stop_lon": lon_f, "stop_lat": lat_f})
        lon_fields.append(lon_f)
        lat_fields.append(lat_f)

    geometry = linestring_from_lats_lons(tr_links, lat_fields, lon_fields)
    return gpd.GeoDataFrame(tr_links, geometry=geometry).set_crs(LAT_LON_CRS)
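
The two left merges above attach stop coordinates to both ends of each link, under `{field}_X` (lon) and `{field}_Y` (lat) columns. Here is a minimal plain-Python sketch of the same idea. It assumes stop times are dicts with `trip_id`, `stop_id` and `stop_sequence`, and that a link is unique by its (from, to) stop pair. The helper name `stop_time_links` is hypothetical and it is not the library's `unique_stop_time_links`.

```python
from collections import defaultdict


def stop_time_links(stop_times, stops, from_field="A", to_field="B"):
    """Hypothetical helper: unique consecutive-stop links with endpoint coordinates."""
    coords = {s["stop_id"]: (s["stop_lon"], s["stop_lat"]) for s in stops}
    by_trip = defaultdict(list)
    for st in stop_times:
        by_trip[st["trip_id"]].append(st)
    links = {}
    for trip_stops in by_trip.values():
        trip_stops.sort(key=lambda st: st["stop_sequence"])
        for a, b in zip(trip_stops, trip_stops[1:]):
            key = (a["stop_id"], b["stop_id"])
            if key in links:
                continue  # keep each from/to pair once
            link = {from_field: key[0], to_field: key[1]}
            for field, stop_id in ((from_field, key[0]), (to_field, key[1])):
                # like a left merge: unknown stops get None instead of raising
                lon, lat = coords.get(stop_id, (None, None))
                link[f"{field}_X"], link[f"{field}_Y"] = lon, lat
            links[key] = link
    return list(links.values())
```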

network_wrangler.transit.geo.stop_times_to_stop_time_points_gdf

stop_times_to_stop_time_points_gdf(stop_times, stops, ref_nodes_df=None)

Stop times geodataframe as points using geometry from stops.txt or optionally another df.

Parameters:

  • stop_times (WranglerStopTimesTable) –

    Feed stop times table.

  • stops (WranglerStopsTable) –

    Feed stops table.

  • ref_nodes_df (DataFrame, default: None ) –

    If specified, will use geometry from these nodes. Otherwise, will use geometry in the stops table. Defaults to None.

Source code in network_wrangler/transit/geo.py
def stop_times_to_stop_time_points_gdf(
    stop_times: DataFrame[WranglerStopTimesTable],
    stops: DataFrame[WranglerStopsTable],
    ref_nodes_df: Optional[DataFrame[RoadNodesTable]] = None,
) -> gpd.GeoDataFrame:
    """Stop times geodataframe as points using geometry from stops.txt or optionally another df.

    Args:
        stop_times (WranglerStopTimesTable): Feed stop times table.
        stops (WranglerStopsTable): Feed stops table.
        ref_nodes_df (pd.DataFrame, optional): If specified, will use geometry from these nodes.
            Otherwise, will use geometry in the stops table. Defaults to None.
    """
    if ref_nodes_df is not None:
        stops = update_stops_geometry(stops, ref_nodes_df)

    stop_times_geo = stop_times.merge(
        stops[["stop_id", "stop_lat", "stop_lon"]],
        on="stop_id",
        how="left",
    )
    return gpd.GeoDataFrame(
        stop_times_geo,
        geometry=gpd.points_from_xy(stop_times_geo["stop_lon"], stop_times_geo["stop_lat"]),
        crs=LAT_LON_CRS,
    )
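
The function above is a left join of stop times onto stop coordinates, followed by `gpd.points_from_xy(x=stop_lon, y=stop_lat)`. The minimal sketch below shows the same join semantics with plain dicts: every stop time is kept, and a stop missing from the stops table yields an empty coordinate. The helper name `stop_time_points` and the `point` key are hypothetical.

```python
def stop_time_points(stop_times, stops):
    """Hypothetical helper: attach (lon, lat) point coordinates to each stop time."""
    coords = {s["stop_id"]: (s["stop_lon"], s["stop_lat"]) for s in stops}
    points = []
    for st in stop_times:
        # left-join semantics: every stop time is kept, unmatched stops get None
        lon, lat = coords.get(st["stop_id"], (None, None))
        points.append({**st, "stop_lon": lon, "stop_lat": lat, "point": (lon, lat)})
    return points
```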

network_wrangler.transit.geo.update_shapes_geometry

update_shapes_geometry(shapes, ref_nodes_df)

Returns shapes table with geometry updated from ref_nodes_df.

NOTE: does not update “geometry” field if it exists.

Source code in network_wrangler/transit/geo.py
def update_shapes_geometry(
    shapes: DataFrame[WranglerShapesTable], ref_nodes_df: DataFrame[RoadNodesTable]
) -> DataFrame[WranglerShapesTable]:
    """Returns shapes table with geometry updated from ref_nodes_df.

    NOTE: does not update "geometry" field if it exists.
    """
    return update_point_geometry(
        shapes,
        ref_nodes_df,
        id_field="shape_model_node_id",
        lon_field="shape_pt_lon",
        lat_field="shape_pt_lat",
    )

network_wrangler.transit.geo.update_stops_geometry

update_stops_geometry(stops, ref_nodes_df)

Returns stops table with geometry updated from ref_nodes_df.

NOTE: does not update “geometry” field if it exists.

Source code in network_wrangler/transit/geo.py
def update_stops_geometry(
    stops: DataFrame[WranglerStopsTable], ref_nodes_df: DataFrame[RoadNodesTable]
) -> DataFrame[WranglerStopsTable]:
    """Returns stops table with geometry updated from ref_nodes_df.

    NOTE: does not update "geometry" field if it exists.
    """
    return update_point_geometry(
        stops, ref_nodes_df, id_field="stop_id", lon_field="stop_lon", lat_field="stop_lat"
    )
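
Both helpers above delegate to `update_point_geometry`, choosing which id field to match (`shape_model_node_id` for shapes, `stop_id` for stops) and which lon/lat fields to overwrite. As a rough illustration only, here is a minimal plain-Python sketch of such an update. The helper name `update_point_coords` is hypothetical. It also assumes reference nodes expose `model_node_id`, `X` (lon) and `Y` (lat), and that records with no matching node keep their original coordinates; the source does not show either of these.

```python
def update_point_coords(records, ref_nodes, id_field, lon_field, lat_field):
    """Hypothetical helper: overwrite lon/lat with reference node coordinates.

    Assumes each reference node is a dict with "model_node_id", "X" (lon) and "Y" (lat).
    Records whose id is not among the reference nodes keep their original coordinates.
    """
    node_xy = {n["model_node_id"]: (n["X"], n["Y"]) for n in ref_nodes}
    updated = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are not mutated
        if rec[id_field] in node_xy:
            rec[lon_field], rec[lat_field] = node_xy[rec[id_field]]
        updated.append(rec)
    return updated
```

For stops this would be called as `update_point_coords(stops, nodes, "stop_id", "stop_lon", "stop_lat")`, mirroring the arguments `update_stops_geometry` passes.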

Functions for reading and writing transit feeds and networks.

network_wrangler.transit.io.convert_transit_serialization

convert_transit_serialization(input_path, output_format, out_dir='.', input_file_format='csv', out_prefix='', overwrite=True)

Converts a transit network from one serialization to another.

Parameters:

  • input_path (Union[str, Path]) –

    path to the input network

  • output_format (TransitFileTypes) –

    the format of the output files. Should be txt, csv, or parquet.

  • out_dir (Union[Path, str], default: '.' ) –

    directory to write the network to. Defaults to current directory.

  • input_file_format (TransitFileTypes, default: 'csv' ) –

    the file_format of the files to read. Should be txt, csv, or parquet. Defaults to “csv”

  • out_prefix (str, default: '' ) –

    prefix to add to the file name. Defaults to “”

  • overwrite (bool, default: True ) –

    if True, will overwrite the files if they already exist. Defaults to True

Source code in network_wrangler/transit/io.py
def convert_transit_serialization(
    input_path: Union[str, Path],
    output_format: TransitFileTypes,
    out_dir: Union[Path, str] = ".",
    input_file_format: TransitFileTypes = "csv",
    out_prefix: str = "",
    overwrite: bool = True,
):
    """Converts a transit network from one serialization to another.

    Args:
        input_path: path to the input network
        output_format: the format of the output files. Should be txt, csv, or parquet.
        out_dir: directory to write the network to. Defaults to current directory.
        input_file_format: the file_format of the files to read. Should be txt, csv, or parquet.
            Defaults to "csv"
        out_prefix: prefix to add to the file name. Defaults to ""
        overwrite: if True, will overwrite the files if they already exist. Defaults to True
    """
    WranglerLogger.info(
        f"Loading transit net from {input_path} with input type {input_file_format}"
    )
    net = load_transit(input_path, file_format=input_file_format)
    WranglerLogger.info(f"Writing transit network to {out_dir} in {output_format} format.")
    write_transit(
        net,
        prefix=out_prefix,
        out_dir=out_dir,
        file_format=output_format,
        overwrite=overwrite,
    )

network_wrangler.transit.io.load_feed_from_dfs

load_feed_from_dfs(feed_dfs, wrangler_flavored=True)

Create a Feed or GtfsModel object from a dictionary of DataFrames representing a GTFS feed.

Parameters:

  • feed_dfs (dict) –

    A dictionary containing DataFrames representing the tables of a GTFS feed.

  • wrangler_flavored (bool, default: True ) –

    If True, creates a Wrangler-enhanced Feed object. If False, creates a pure GtfsModel object. Defaults to True.

Returns:

  • Union[Feed, GtfsModel]

    Union[Feed, GtfsModel]: A Feed or GtfsModel object representing the transit network.

Raises:

  • ValueError

    If the feed_dfs dictionary does not contain all the required tables.

Example usage:

feed_dfs = {
    "agency": agency_df,
    "routes": routes_df,
    "stops": stops_df,
    "trips": trips_df,
    "stop_times": stop_times_df,
}
feed = load_feed_from_dfs(feed_dfs)  # Creates Feed by default
gtfs_model = load_feed_from_dfs(feed_dfs, wrangler_flavored=False)  # Creates GtfsModel

Source code in network_wrangler/transit/io.py
def load_feed_from_dfs(feed_dfs: dict, wrangler_flavored: bool = True) -> Union[Feed, GtfsModel]:
    """Create a Feed or GtfsModel object from a dictionary of DataFrames representing a GTFS feed.

    Args:
        feed_dfs (dict): A dictionary containing DataFrames representing the tables of a GTFS feed.
        wrangler_flavored: If True, creates a Wrangler-enhanced Feed object.
                           If False, creates a pure GtfsModel object. Defaults to True.

    Returns:
        Union[Feed, GtfsModel]: A Feed or GtfsModel object representing the transit network.

    Raises:
        ValueError: If the feed_dfs dictionary does not contain all the required tables.

    Example usage:
    ```python
    feed_dfs = {
        "agency": agency_df,
        "routes": routes_df,
        "stops": stops_df,
        "trips": trips_df,
        "stop_times": stop_times_df,
    }
    feed = load_feed_from_dfs(feed_dfs)  # Creates Feed by default
    gtfs_model = load_feed_from_dfs(feed_dfs, wrangler_flavored=False)  # Creates GtfsModel
    ```
    """
    # Use the appropriate model class based on the parameter
    model_class = Feed if wrangler_flavored else GtfsModel

    if not all(table in feed_dfs for table in model_class.table_names):
        model_name = "Feed" if wrangler_flavored else "GtfsModel"
        msg = f"feed_dfs must contain the following tables for {model_name}: {model_class.table_names}"
        raise ValueError(msg)

    feed = model_class(**feed_dfs)

    return feed
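
The required-table guard above can be sketched as a standalone check. This is a minimal sketch under stated assumptions. The actual list of required tables comes from `model_class.table_names` and is not shown here, so the sketch takes it as an argument. The helper name `check_required_tables` is hypothetical. Unlike the library's message, which lists every table the model class expects, the sketch names only the missing ones.

```python
def check_required_tables(feed_dfs, table_names):
    """Hypothetical helper: raise ValueError naming any required tables not in feed_dfs."""
    missing = [t for t in table_names if t not in feed_dfs]
    if missing:
        raise ValueError(f"feed_dfs is missing required table(s): {missing}")
```

Reporting only the missing names keeps the error short when most tables are present.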

network_wrangler.transit.io.load_feed_from_path

load_feed_from_path(feed_path, file_format='txt', wrangler_flavored=True)

Create a Feed or GtfsModel object from the path to a GTFS transit feed.

Parameters:

  • feed_path (Union[Path, str]) –

    The path to the GTFS transit feed.

  • file_format (TransitFileTypes, default: 'txt' ) –

    the format of the files to read. Defaults to “txt”

  • wrangler_flavored (bool, default: True ) –

    If True, creates a Wrangler-enhanced Feed object. If False, creates a pure GtfsModel object. Defaults to True.

Returns:

  • Union[Feed, GtfsModel]

    Union[Feed, GtfsModel]: The Feed or GtfsModel object created from the GTFS transit feed.

Source code in network_wrangler/transit/io.py
def load_feed_from_path(
    feed_path: Union[Path, str],
    file_format: TransitFileTypes = "txt",
    wrangler_flavored: bool = True,
) -> Union[Feed, GtfsModel]:
    """Create a Feed or GtfsModel object from the path to a GTFS transit feed.

    Args:
        feed_path (Union[Path, str]): The path to the GTFS transit feed.
        file_format: the format of the files to read. Defaults to "txt"
        wrangler_flavored: If True, creates a Wrangler-enhanced Feed object.
                          If False, creates a pure GtfsModel object. Defaults to True.

    Returns:
        Union[Feed, GtfsModel]: The Feed or GtfsModel object created from the GTFS transit feed.
    """
    feed_path = _feed_path_ref(Path(feed_path))  # unzips if needs to be unzipped

    if not feed_path.is_dir():
        msg = f"Feed path not a directory: {feed_path}"
        raise NotADirectoryError(msg)

    WranglerLogger.info(f"Reading GTFS feed tables from {feed_path}")

    # Use the appropriate table names based on the model type
    model_class = Feed if wrangler_flavored else GtfsModel
    feed_possible_files = {
        table: list(feed_path.glob(f"*{table}.{file_format}")) for table in model_class.table_names
    }
    WranglerLogger.debug(f"model_class={model_class}  feed_possible_files={feed_possible_files}")

    # make sure we have all the tables we need
    _missing_files = [t for t, v in feed_possible_files.items() if not v]

    if _missing_files:
        WranglerLogger.debug(f"!!! Missing transit files: {_missing_files}")
        model_name = "Feed" if wrangler_flavored else "GtfsModel"
        msg = f"Required GTFS {model_name} table(s) not in {feed_path}: \n  {_missing_files}"
        raise RequiredTableError(msg)

    # but don't want to have more than one file per search
    _ambiguous_files = [t for t, v in feed_possible_files.items() if len(v) > 1]
    if _ambiguous_files:
        WranglerLogger.warning(
            "! More than one file matches the following tables. "
            f"Using the first file found for each: {_ambiguous_files}"
        )

    feed_files = {t: f[0] for t, f in feed_possible_files.items()}
    feed_dfs = {table: _read_table_from_file(table, file) for table, file in feed_files.items()}

    return load_feed_from_dfs(feed_dfs, wrangler_flavored=wrangler_flavored)
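
The file discovery above globs `*{table}.{file_format}` for each table. It fails if any table has no match and warns if a table has several. A minimal sketch of that lookup follows, with some differences. The helper name `find_table_files` is hypothetical. It raises `FileNotFoundError` where the library raises `RequiredTableError`. It also sorts the matches so that "the first match" is deterministic, because `Path.glob` does not guarantee an order.

```python
from pathlib import Path


def find_table_files(feed_dir, table_names, file_format="txt"):
    """Hypothetical helper: map each table to one file matching "*{table}.{file_format}".

    Returns (files, ambiguous): the chosen file per table, and the tables that had
    more than one match. Raises FileNotFoundError if any table has no match.
    """
    feed_dir = Path(feed_dir)
    # sorted() makes the "first match" deterministic; Path.glob order is arbitrary
    matches = {t: sorted(feed_dir.glob(f"*{t}.{file_format}")) for t in table_names}
    missing = [t for t, files in matches.items() if not files]
    if missing:
        raise FileNotFoundError(f"Required table(s) not in {feed_dir}: {missing}")
    ambiguous = [t for t, files in matches.items() if len(files) > 1]
    return {t: files[0] for t, files in matches.items()}, ambiguous
```

Because the pattern has a leading wildcard, a prefixed copy such as `old_stops.txt` also matches `stops`. That is how ambiguity typically arises.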

network_wrangler.transit.io.load_transit

load_transit(feed, file_format='txt', config=DefaultConfig)

Create a TransitNetwork object.

This function takes in a feed parameter, which can be one of the following types:

  • Feed: A Feed object representing a transit feed.
  • GtfsModel: A pure GTFS model object, which is converted into a Feed.
  • dict[str, pd.DataFrame]: A dictionary of DataFrames representing transit data.
  • str or Path: A path to a transit feed directory or zipped feed.

Parameters:

  • feed (Union[Feed, GtfsModel, dict[str, DataFrame], str, Path]) –

    Feed or GtfsModel object, dict of transit data frames, or path to transit feed data

  • file_format (TransitFileTypes, default: 'txt' ) –

    the format of the files to read. Defaults to “txt”

  • config (WranglerConfig, default: DefaultConfig ) –

    WranglerConfig object. Defaults to DefaultConfig.

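Returns:

  • TransitNetwork

    TransitNetwork: object representing the loaded transit network.

Raises:

  • ValueError

    If the feed parameter is not one of the supported types.
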
Example usage:

transit_network_from_zip = load_transit("path/to/gtfs.zip")

transit_network_from_unzipped_dir = load_transit("path/to/files")

transit_network_from_parquet = load_transit("path/to/files", file_format="parquet")

dfs_of_transit_data = {
    "routes": routes_df,
    "stops": stops_df,
    "trips": trips_df,
    # ... plus the other tables the feed requires
}
transit_network_from_dfs = load_transit(dfs_of_transit_data)

Source code in network_wrangler/transit/io.py
def load_transit(
    feed: Union[Feed, GtfsModel, dict[str, pd.DataFrame], str, Path],
    file_format: TransitFileTypes = "txt",
    config: WranglerConfig = DefaultConfig,
) -> TransitNetwork:
    """Create a [`TransitNetwork`][network_wrangler.transit.network.TransitNetwork] object.

    This function takes in a `feed` parameter, which can be one of the following types:

    - `Feed`: A Feed object representing a transit feed.
    - `GtfsModel`: A pure GTFS model object, which is converted into a `Feed`.
    - `dict[str, pd.DataFrame]`: A dictionary of DataFrames representing transit data.
    - `str` or `Path`: A path to a transit feed directory or zipped feed.

    Args:
        feed: Feed or GtfsModel object, dict of transit data frames, or path to transit feed
            data
        file_format: the format of the files to read. Defaults to "txt"
        config: WranglerConfig object. Defaults to DefaultConfig.

    Returns:
        (TransitNetwork): object representing the loaded transit network.

    Raises:
        ValueError: If the `feed` parameter is not one of the supported types.

    Example usage:
    ```python
    transit_network_from_zip = load_transit("path/to/gtfs.zip")

    transit_network_from_unzipped_dir = load_transit("path/to/files")

    transit_network_from_parquet = load_transit("path/to/files", file_format="parquet")

    dfs_of_transit_data = {
        "routes": routes_df,
        "stops": stops_df,
        "trips": trips_df,
        # ... plus the other tables the feed requires
    }
    transit_network_from_dfs = load_transit(dfs_of_transit_data)
    ```

    """
    if isinstance(feed, (Path, str)):
        feed = Path(feed)
        feed_obj = load_feed_from_path(feed, file_format=file_format)
        feed_obj.feed_path = feed
    elif isinstance(feed, dict):
        feed_obj = load_feed_from_dfs(feed)
    elif isinstance(feed, GtfsModel):
        feed_obj = Feed(**feed.__dict__)
    else:
        if not isinstance(feed, Feed):
            msg = f"TransitNetwork must be seeded with a Feed, dict of dfs or Path. Found {type(feed)}"
            raise ValueError(msg)
        feed_obj = feed

    return TransitNetwork(feed_obj, config=config)

network_wrangler.transit.io.write_feed_geo

write_feed_geo(feed, ref_nodes_df, out_dir, file_format='geojson', out_prefix=None, overwrite=True)

Write a Feed object to a directory in a geospatial format.

Parameters:

  • feed (Feed) –

    Feed object to write

  • ref_nodes_df (GeoDataFrame) –

    Reference nodes dataframe to use for geometry

  • out_dir (Union[str, Path]) –

    directory to write the network to

  • file_format (Literal['geojson', 'shp', 'parquet'], default: 'geojson' ) –

    the format of the output files. Defaults to “geojson”

  • out_prefix

    prefix to add to the file name

  • overwrite (bool, default: True ) –

    if True, will overwrite the files if they already exist. Defaults to True

Source code in network_wrangler/transit/io.py
def write_feed_geo(
    feed: Feed,
    ref_nodes_df: gpd.GeoDataFrame,
    out_dir: Union[str, Path],
    file_format: Literal["geojson", "shp", "parquet"] = "geojson",
    out_prefix=None,
    overwrite: bool = True,
) -> None:
    """Write a Feed object to a directory in a geospatial format.

    Args:
        feed: Feed object to write
        ref_nodes_df: Reference nodes dataframe to use for geometry
        out_dir: directory to write the network to
        file_format: the format of the output files. Defaults to "geojson"
        out_prefix: prefix to add to the file name
        overwrite: if True, will overwrite the files if they already exist. Defaults to True
    """
    from .geo import shapes_to_shape_links_gdf  # noqa: PLC0415

    out_dir = Path(out_dir)
    if not out_dir.is_dir():
        if out_dir.parent.is_dir():
            out_dir.mkdir()
        else:
            msg = f"Output directory {out_dir} and its parent path do not exist"
            raise FileNotFoundError(msg)

    prefix = f"{out_prefix}_" if out_prefix else ""
    shapes_outpath = out_dir / f"{prefix}trn_shapes.{file_format}"
    shapes_gdf = shapes_to_shape_links_gdf(feed.shapes, ref_nodes_df=ref_nodes_df)
    write_table(shapes_gdf, shapes_outpath, overwrite=overwrite)

    stops_outpath = out_dir / f"{prefix}trn_stops.{file_format}"
    stops_gdf = to_points_gdf(feed.stops, ref_nodes_df=ref_nodes_df)
    write_table(stops_gdf, stops_outpath, overwrite=overwrite)
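
With the default `file_format="geojson"`, the stops layer above is written to `{prefix}trn_stops.geojson`. The actual writing is done by `write_table` and `to_points_gdf`. To show roughly what such a point layer contains, here is a minimal standard-library sketch that serializes stop dicts as a GeoJSON FeatureCollection. The helper name `stops_to_geojson` is hypothetical, and the sketch is not the library's writer. It assumes stops carry `stop_lon` and `stop_lat`, and it moves every other field into `properties`.

```python
import json


def stops_to_geojson(stops):
    """Hypothetical helper: serialize stop dicts as a GeoJSON FeatureCollection of points."""
    features = []
    for s in stops:
        properties = {k: v for k, v in s.items() if k not in ("stop_lon", "stop_lat")}
        features.append(
            {
                "type": "Feature",
                # GeoJSON (RFC 7946) positions are [longitude, latitude]
                "geometry": {"type": "Point", "coordinates": [s["stop_lon"], s["stop_lat"]]},
                "properties": properties,
            }
        )
    return json.dumps({"type": "FeatureCollection", "features": features})
```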

network_wrangler.transit.io.write_transit

write_transit(transit_net, out_dir='.', prefix=None, file_format='txt', overwrite=True)

Writes a network in the transit network standard.

Parameters:

  • transit_net

    a TransitNetwork instance

  • out_dir (Union[Path, str], default: '.' ) –

    directory to write the network to

  • file_format (Literal['txt', 'csv', 'parquet'], default: 'txt' ) –

    the format of the output files. Defaults to “txt”, which writes comma-separated values with a .txt extension.

  • prefix (Optional[Union[Path, str]], default: None ) –

    prefix to add to the file name

  • overwrite (bool, default: True ) –

    if True, will overwrite the files if they already exist. Defaults to True

Source code in network_wrangler/transit/io.py
def write_transit(
    transit_net,
    out_dir: Union[Path, str] = ".",
    prefix: Optional[Union[Path, str]] = None,
    file_format: Literal["txt", "csv", "parquet"] = "txt",
    overwrite: bool = True,
) -> None:
    """Writes a network in the transit network standard.

    Args:
        transit_net: a TransitNetwork instance
        out_dir: directory to write the network to
        file_format: the format of the output files. Defaults to "txt", which writes
            comma-separated values with a .txt extension.
        prefix: prefix to add to the file name
        overwrite: if True, will overwrite the files if they already exist. Defaults to True
    """
    out_dir = Path(out_dir)
    prefix = f"{prefix}_" if prefix else ""
    for table in transit_net.feed.table_names:
        df = transit_net.feed.get_table(table)
        outpath = out_dir / f"{prefix}{table}.{file_format}"
        write_table(df, outpath, overwrite=overwrite)
    WranglerLogger.info(f"Wrote {len(transit_net.feed.tables)} files to {out_dir}")
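
The naming rule used by `write_transit` is `{out_dir}/{prefix}_{table}.{file_format}`, with the prefix and its underscore omitted when no prefix is given. It can be checked in isolation with the small sketch below. The helper name `transit_output_paths` is hypothetical, and the function only computes paths; it does not write anything.

```python
from pathlib import Path


def transit_output_paths(table_names, out_dir=".", prefix=None, file_format="txt"):
    """Hypothetical helper: the output path write_transit builds for each table name."""
    out_dir = Path(out_dir)
    prefix = f"{prefix}_" if prefix else ""
    return [out_dir / f"{prefix}{table}.{file_format}" for table in table_names]
```

For example, `transit_output_paths(["stops", "trips"], "out", "v1", "csv")` gives `out/v1_stops.csv` and `out/v1_trips.csv`.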

ModelTransit class and functions for managing consistency between roadway and transit networks.

NOTE: this is not thoroughly tested and may not be fully functional.

network_wrangler.transit.model_transit.ModelTransit

ModelTransit class for managing consistency between roadway and transit networks.

Source code in network_wrangler/transit/model_transit.py
class ModelTransit:
    """ModelTransit class for managing consistency between roadway and transit networks."""

    def __init__(
        self,
        transit_net: TransitNetwork,
        roadway_net: RoadwayNetwork,
        shift_transit_to_managed_lanes: bool = True,
    ):
        """ModelTransit class for managing consistency between roadway and transit networks."""
        self.transit_net = transit_net
        self.roadway_net = roadway_net
        self._roadway_net_hash = None
        self._transit_feed_hash = None
        self._transit_shifted_to_ML = shift_transit_to_managed_lanes

    @property
    def model_roadway_net(self):
        """ModelRoadwayNetwork associated with this ModelTransit."""
        return self.roadway_net.model_net

    @property
    def consistent_nets(self) -> bool:
        """Indicate whether roadway and transit networks are unchanged since self.m_feed update."""
        return bool(
            self.roadway_net.network_hash == self._roadway_net_hash
            and self.transit_net.feed_hash == self._transit_feed_hash
        )

    @property
    def m_feed(self):
        """TransitNetwork.feed with updates for consistency with associated ModelRoadwayNetwork."""
        if self.consistent_nets:
            return self._m_feed
        # NOTE: look at this
        # If networks have changed, update model transit and update reference hashes
        self._roadway_net_hash = copy.deepcopy(self.roadway_net.network_hash)
        self._transit_feed_hash = copy.deepcopy(self.transit_net.feed_hash)

        if not self._transit_shifted_to_ML:
            self._m_feed = copy.deepcopy(self.transit_net.feed)
            return self._m_feed
        return None
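
`consistent_nets` and `m_feed` above implement hash-based cache invalidation. A stored hash is compared with the current one, and the derived copy is rebuilt only when they differ. The minimal sketch below shows that pattern with a single JSON-serializable source object, whereas `ModelTransit` tracks two hashes (roadway and transit). The class `CachedCopy` and its SHA-256-over-JSON digest are hypothetical stand-ins for the networks' own `network_hash` and `feed_hash`.

```python
import copy
import hashlib
import json


class CachedCopy:
    """Hypothetical illustration of the hash check used by ModelTransit.m_feed.

    `source` stands in for a network; its content hash decides whether the cached
    deep copy is still valid.
    """

    def __init__(self, source):
        self.source = source
        self._source_hash = None
        self._cached = None
        self.rebuilds = 0  # counts how often the cache was rebuilt

    @staticmethod
    def _digest(obj):
        # stable content hash of a JSON-serializable object
        return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

    @property
    def consistent(self):
        """True if the source is unchanged since the cache was last built."""
        return self._source_hash == self._digest(self.source)

    @property
    def cached(self):
        if self._cached is not None and self.consistent:
            return self._cached
        self._source_hash = self._digest(self.source)
        self._cached = copy.deepcopy(self.source)
        self.rebuilds += 1
        return self._cached
```

Storing the hash when the copy is rebuilt (rather than on every read) is what lets repeated reads return the same object cheaply.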

network_wrangler.transit.model_transit.ModelTransit.consistent_nets property

consistent_nets

Indicate whether roadway and transit networks are unchanged since self.m_feed was last updated.

network_wrangler.transit.model_transit.ModelTransit.m_feed property

m_feed

TransitNetwork.feed with updates for consistency with associated ModelRoadwayNetwork.

network_wrangler.transit.model_transit.ModelTransit.model_roadway_net property

model_roadway_net

ModelRoadwayNetwork associated with this ModelTransit.

network_wrangler.transit.model_transit.ModelTransit.__init__

__init__(transit_net, roadway_net, shift_transit_to_managed_lanes=True)

ModelTransit class for managing consistency between roadway and transit networks.

Source code in network_wrangler/transit/model_transit.py
def __init__(
    self,
    transit_net: TransitNetwork,
    roadway_net: RoadwayNetwork,
    shift_transit_to_managed_lanes: bool = True,
):
    """ModelTransit class for managing consistency between roadway and transit networks."""
    self.transit_net = transit_net
    self.roadway_net = roadway_net
    self._roadway_net_hash = None
    self._transit_feed_hash = None
    self._transit_shifted_to_ML = shift_transit_to_managed_lanes

Classes and functions for selecting transit trips from a transit network.

Usage:

Create a TransitSelection object by providing a TransitNetwork object and a selection dictionary:

```python
selection_dict = {
    "links": {...},
    "nodes": {...},
    "route_properties": {...},
    "trip_properties": {...},
    "timespans": {...},
}
transit_selection = TransitSelection(transit_network, selection_dict)
```

Access the selected trip ids or dataframe as follows:

```python
selected_trips = transit_selection.selected_trips
selected_trips_df = transit_selection.selected_trips_df
```

Note: The selection dictionary should conform to the SelectTransitTrips model defined in the models.projects.transit_selection module.
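
`SelectTransitTrips` validates the schema. Beyond that, `TransitSelection.validate_selection_dict` in the source below checks that every field named under `trip_properties` or `route_properties` is a column of trips.txt or routes.txt respectively. That consistency check reduces to a set difference, sketched minimally here. The helper name `missing_selection_fields` and the `"foo"` field in the usage are hypothetical; `direction_id` is a standard GTFS trips field.

```python
def missing_selection_fields(selection_dict, trip_columns, route_columns):
    """Hypothetical helper: selection fields absent from trips.txt / routes.txt columns."""
    # `or {}` also covers keys that are present but set to None
    trip_fields = set((selection_dict.get("trip_properties") or {}).keys())
    route_fields = set((selection_dict.get("route_properties") or {}).keys())
    return {
        "trips": trip_fields - set(trip_columns),
        "routes": route_fields - set(route_columns),
    }
```

A non-empty set in either entry corresponds to the case where the library raises `TransitSelectionNetworkConsistencyError`.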

network_wrangler.transit.selection.TransitSelection

Object to perform and store information about a selection from a project card “facility”.

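Attributes:

  • selection_dict (dict) –

    Dictionary of selection criteria.

  • selected_trips (list) –

    List of selected trips.

  • selected_trips_df (DataFrame) –

    DataFrame of selected trips.

  • sel_key (str) –

    Hash of selection_dict.

  • net (TransitNetwork) –

    Network to select from.
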
Source code in network_wrangler/transit/selection.py
class TransitSelection:
    """Object to perform and store information about a selection from a project card "facility".

    Attributes:
        selection_dict (dict): Dictionary of selection criteria
        selected_trips (list): List of selected trips
        selected_trips_df (pd.DataFrame): DataFrame of selected trips
        sel_key (str): Hash of selection_dict
        net (TransitNetwork): Network to select from
    """

    def __init__(
        self,
        net: TransitNetwork,
        selection_dict: Union[dict, SelectTransitTrips],
    ):
        """Constructor for TransitSelection object.

        Args:
            net (TransitNetwork): Transit network object to select from.
            selection_dict: Selection dictionary conforming to SelectTransitTrips
        """
        self.net = net
        self.selection_dict = selection_dict

        # Initialize
        self._selected_trips_df = None
        self.sel_key = dict_to_hexkey(selection_dict)
        self._stored_feed_hash = copy.deepcopy(self.net.feed.hash)

        WranglerLogger.debug(f"...created TransitSelection object: {selection_dict}")

    def __bool__(self):
        """Return True if there are selected trips."""
        return len(self.selected_trips_df) > 0

    @property
    def selection_dict(self):
        """Getter for selection_dict."""
        return self._selection_dict

    @selection_dict.setter
    def selection_dict(self, value: Union[dict, SelectTransitTrips]):
        self._selection_dict = self.validate_selection_dict(value)

    def validate_selection_dict(self, selection_dict: Union[dict, SelectTransitTrips]) -> dict:
        """Check that the selection dictionary is valid and consistent with the network.

        Validates selection_dict against SelectTransitTrips and checks that the fields it
        queries exist in the respective Feed tables (trips.txt and routes.txt).

        Args:
            selection_dict (dict): selection dictionary

        Returns:
            dict: validated selection dictionary, dumped by alias with None values excluded.

        Raises:
            TransitSelectionNetworkConsistencyError: If not consistent with transit network
            ValidationError: if format not consistent with SelectTransitTrips
        """
        if not isinstance(selection_dict, SelectTransitTrips):
            selection_dict = SelectTransitTrips(**selection_dict)
        selection_dict = selection_dict.model_dump(exclude_none=True, by_alias=True)
        WranglerLogger.debug(f"SELECT DICT - before Validation: \n{selection_dict}")
        _trip_selection_fields = list((selection_dict.get("trip_properties", {}) or {}).keys())
        _missing_trip_fields = set(_trip_selection_fields) - set(self.net.feed.trips.columns)

        if _missing_trip_fields:
            msg = f"Fields in trip selection dictionary but not trips.txt: {_missing_trip_fields}"
            raise TransitSelectionNetworkConsistencyError(msg)

        _route_selection_fields = list((selection_dict.get("route_properties", {}) or {}).keys())
        _missing_route_fields = set(_route_selection_fields) - set(self.net.feed.routes.columns)

        if _missing_route_fields:
            msg = (
                f"Fields in route selection dictionary but not routes.txt: {_missing_route_fields}"
            )
            raise TransitSelectionNetworkConsistencyError(msg)
        return selection_dict

    @property
    def selected_trips(self) -> list:
        """List of selected trip_ids."""
        if self.selected_trips_df is None:
            return []
        return self.selected_trips_df.trip_id.tolist()

    @property
    def selected_trips_df(self) -> DataFrame[WranglerTripsTable]:
        """Lazily evaluates selection for trips or returns stored value in self._selected_trips_df.

        Will re-evaluate if the current network hash is different than the stored one from the
        last selection.

        Returns:
            DataFrame[WranglerTripsTable] of selected trips
        """
        if (self._selected_trips_df is not None) and self._stored_feed_hash == self.net.feed_hash:
            return self._selected_trips_df

        self._selected_trips_df = self._select_trips()
        self._stored_feed_hash = copy.deepcopy(self.net.feed_hash)
        return self._selected_trips_df

    @property
    def selected_frequencies_df(self) -> DataFrame[WranglerFrequenciesTable]:
        """DataFrame of selected frequencies."""
        sel_freq_df = self.net.feed.frequencies.loc[
            self.net.feed.frequencies.trip_id.isin(self.selected_trips_df.trip_id)
        ]
        # if timespans are selected, filter to those that overlap
        if self.selection_dict.get("timespans"):
            sel_freq_df = filter_df_to_overlapping_timespans(
                sel_freq_df, self.selection_dict.get("timespans")
            )
        return sel_freq_df

    @property
    def selected_shapes_df(self) -> DataFrame[WranglerShapesTable]:
        """DataFrame of selected shapes.

        Can visualize the selected shapes quickly using the following code:

        ```python
        all_routes = net.feed.shapes.plot(color="gray")
        selection.selected_shapes_df.plot(ax=all_routes, color="red")
        ```

        """
        return self.net.feed.shapes.loc[
            self.net.feed.shapes.shape_id.isin(self.selected_trips_df.shape_id)
        ]

    def _select_trips(self) -> DataFrame[WranglerTripsTable]:
        """Selects transit trips based on selection dictionary.

        Returns:
            DataFrame[WranglerTripsTable]: trips_df DataFrame of selected trips
        """
        return _filter_trips_by_selection_dict(
            self.net.feed,
            self.selection_dict,
        )

network_wrangler.transit.selection.TransitSelection.selected_frequencies_df property

selected_frequencies_df

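DataFrame of the rows in feed.frequencies whose trip_id belongs to a selected trip. If the selection dictionary specifies timespans, only frequencies overlapping those timespans are kept.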
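Per the class source above, selected_frequencies_df keeps the frequencies of the selected trips and, when the selection includes timespans, only those overlapping a timespan. The pandas sketch below illustrates that overlap filter, assuming start_time and end_time are stored as GTFS-style "HH:MM:SS" strings and timespans are [start, end] pairs. The helpers to_seconds and filter_overlapping are hypothetical and are not the library's filter_df_to_overlapping_timespans; the real column types and overlap rules may differ.

```python
import pandas as pd


def to_seconds(hhmmss: str) -> int:
    """Convert a GTFS-style 'HH:MM:SS' string (hours may exceed 23) to seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def filter_overlapping(freq_df: pd.DataFrame, timespans: list[list[str]]) -> pd.DataFrame:
    """Hypothetical: keep frequency rows whose [start_time, end_time) overlaps any timespan."""
    start = freq_df["start_time"].map(to_seconds)
    end = freq_df["end_time"].map(to_seconds)
    keep = pd.Series(False, index=freq_df.index)
    for span_start, span_end in timespans:
        s0, s1 = to_seconds(span_start), to_seconds(span_end)
        # Two half-open intervals overlap when each starts before the other ends.
        keep |= (start < s1) & (end > s0)
    return freq_df.loc[keep]


frequencies = pd.DataFrame(
    {
        "trip_id": ["t1", "t1", "t2"],
        "start_time": ["06:00:00", "10:00:00", "15:00:00"],
        "end_time": ["10:00:00", "15:00:00", "19:00:00"],
        "headway_secs": [600, 900, 600],
    }
)
# First restrict to the selected trips, then to the requested timespans.
sel = frequencies.loc[frequencies["trip_id"].isin(["t1"])]
result = filter_overlapping(sel, [["07:00:00", "09:00:00"]])
```

Only the 06:00 to 10:00 row for t1 overlaps the 07:00 to 09:00 timespan, so it is the only row kept.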

network_wrangler.transit.selection.TransitSelection.selected_shapes_df property

selected_shapes_df

DataFrame of the rows in feed.shapes whose shape_id is used by at least one selected trip.

You can quickly visualize the selected shapes with the following code:

all_routes = net.feed.shapes.plot(color="gray")
selection.selected_shapes_df.plot(ax=all_routes, color="red")

network_wrangler.transit.selection.TransitSelection.selected_trips property

selected_trips

List of selected trip_ids.

network_wrangler.transit.selection.TransitSelection.selected_trips_df property

selected_trips_df

Lazily evaluates selection for trips or returns stored value in self._selected_trips_df.

Re-evaluates the selection if the current feed hash differs from the one stored at the last selection.

Returns:

  • DataFrame[WranglerTripsTable] –

    DataFrame of selected trips.

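selected_trips_df re-evaluates only when the feed hash changes, which amounts to a cache keyed on that hash. The standard-library sketch below shows the pattern with a hypothetical LazySelection class and a toy dict standing in for the feed; it is not the library's implementation.

```python
import hashlib
from typing import Callable, Optional


class LazySelection:
    """Hypothetical sketch: recompute a selection only when the source hash changes."""

    def __init__(self, get_hash: Callable[[], str], select: Callable[[], list]):
        self._get_hash = get_hash  # returns the current hash of the source data
        self._select = select  # performs the (possibly expensive) selection
        self._cached: Optional[list] = None
        self._stored_hash: Optional[str] = None
        self.evaluations = 0  # how many times the selection actually ran

    @property
    def selected(self) -> list:
        current = self._get_hash()
        if self._cached is not None and self._stored_hash == current:
            return self._cached  # source unchanged since the last selection
        self._cached = self._select()
        self._stored_hash = current
        self.evaluations += 1
        return self._cached


# Toy "feed" whose hash changes whenever its contents change.
feed = {"trips": ["t1", "t2"]}


def feed_hash() -> str:
    return hashlib.sha256(repr(feed).encode()).hexdigest()


selection = LazySelection(feed_hash, lambda: [t for t in feed["trips"] if t != "t2"])
first = selection.selected  # evaluates the selection
again = selection.selected  # served from the cache
feed["trips"].append("t3")  # mutating the feed changes its hash
updated = selection.selected  # re-evaluated: ["t1", "t3"]
```

Checking "is not None" rather than truthiness means an empty selection is still cached, which matches the condition in the source.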
network_wrangler.transit.selection.TransitSelection.selection_dict property writable

selection_dict

Validated selection dictionary. Assigning a new value runs validate_selection_dict on it.
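The writable selection_dict property routes every assignment, including the one in __init__, through validate_selection_dict. The pattern is sketched below with a hypothetical Selection class whose validator only type-checks and drops None values, loosely analogous to model_dump(exclude_none=True); the real validator also checks the input against SelectTransitTrips and the feed tables.

```python
from typing import Any


class Selection:
    """Hypothetical sketch: every assignment to selection_dict is validated first."""

    def __init__(self, selection_dict: dict[str, Any]):
        self.selection_dict = selection_dict  # goes through the setter below

    @property
    def selection_dict(self) -> dict[str, Any]:
        return self._selection_dict

    @selection_dict.setter
    def selection_dict(self, value: dict[str, Any]) -> None:
        self._selection_dict = self.validate_selection_dict(value)

    @staticmethod
    def validate_selection_dict(value: dict[str, Any]) -> dict[str, Any]:
        if not isinstance(value, dict):
            raise TypeError("selection_dict must be a dict")
        # Drop unset keys, loosely analogous to model_dump(exclude_none=True).
        return {k: v for k, v in value.items() if v is not None}


sel = Selection({"route_properties": {"route_id": ["A"]}, "timespans": None})
```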

network_wrangler.transit.selection.TransitSelection.__init__

__init__(net, selection_dict)

Constructor for TransitSelection object.

Parameters:

  • net (TransitNetwork) –

    Transit network object to select from.

  • selection_dict (Union[dict, SelectTransitTrips]) –

    Selection dictionary conforming to SelectTransitTrips.

Source code in network_wrangler/transit/selection.py
def __init__(
    self,
    net: TransitNetwork,
    selection_dict: Union[dict, SelectTransitTrips],
):
    """Constructor for TransitSelection object.

    Args:
        net (TransitNetwork): Transit network object to select from.
        selection_dict: Selection dictionary conforming to SelectTransitTrips
    """
    self.net = net
    self.selection_dict = selection_dict

    # Initialize
    self._selected_trips_df = None
    self.sel_key = dict_to_hexkey(selection_dict)
    self._stored_feed_hash = copy.deepcopy(self.net.feed.hash)

    WranglerLogger.debug(f"...created TransitSelection object: {selection_dict}")

network_wrangler.transit.selection.TransitSelection.__nonzero__

__nonzero__()

Return True if there are selected trips.

Source code in network_wrangler/transit/selection.py
def __nonzero__(self):
    """Return True if there are selected trips."""
    return len(self.selected_trips_df) > 0
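In Python 3, truth testing (bool(), if, while) calls __bool__ and, if that is not defined, __len__; __nonzero__ is the Python 2 name and is not consulted. An object that defines only __nonzero__ is therefore always truthy. The hypothetical classes below illustrate this; defining __bool__ (here as an alias of __nonzero__) is one way to get the documented behavior.

```python
class NonzeroOnly:
    """Hypothetical: defines only the Python 2 truth hook."""

    def __init__(self, trips: list):
        self.trips = trips

    def __nonzero__(self) -> bool:  # never called by bool() in Python 3
        return len(self.trips) > 0


class WithBool(NonzeroOnly):
    """Hypothetical: adds the Python 3 truth hook by aliasing the existing method."""

    __bool__ = NonzeroOnly.__nonzero__


print(bool(NonzeroOnly([])))  # True: no __bool__ or __len__, so default truthiness
print(bool(WithBool([])))  # False
```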

network_wrangler.transit.selection.TransitSelection.validate_selection_dict

validate_selection_dict(selection_dict)

Check that selection dictionary has valid and used properties consistent with network.

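Checks that selection_dict is a valid SelectTransitTrips selection and that:
  • every field it queries under trip_properties or route_properties exists as a column in trips.txt or routes.txt, respectively.

Parameters:

  • selection_dict (Union[dict, SelectTransitTrips]) –

    Selection dictionary.

Returns:

  • dict –

    Validated selection dictionary, dumped with exclude_none=True and by_alias=True.

Raises:

  • TransitSelectionNetworkConsistencyError –

    If a queried field is not in the transit network's trips.txt or routes.txt.

  • ValidationError –

    If the format is not consistent with SelectTransitTrips.
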
Source code in network_wrangler/transit/selection.py
def validate_selection_dict(self, selection_dict: Union[dict, SelectTransitTrips]) -> dict:
    """Check that selection dictionary has valid and used properties consistent with network.

    Checks that selection_dict is a valid TransitSelectionDict:
        - query vars exist in respective Feed tables

    Args:
        selection_dict (dict): selection dictionary

    Raises:
        TransitSelectionNetworkConsistencyError: If not consistent with transit network
        ValidationError: if format not consistent with SelectTransitTrips
    """
    if not isinstance(selection_dict, SelectTransitTrips):
        selection_dict = SelectTransitTrips(**selection_dict)
    selection_dict = selection_dict.model_dump(exclude_none=True, by_alias=True)
    WranglerLogger.debug(f"SELECT DICT - before Validation: \n{selection_dict}")
    _trip_selection_fields = list((selection_dict.get("trip_properties", {}) or {}).keys())
    _missing_trip_fields = set(_trip_selection_fields) - set(self.net.feed.trips.columns)

    if _missing_trip_fields:
        msg = f"Fields in trip selection dictionary but not trips.txt: {_missing_trip_fields}"
        raise TransitSelectionNetworkConsistencyError(msg)

    _route_selection_fields = list((selection_dict.get("route_properties", {}) or {}).keys())
    _missing_route_fields = set(_route_selection_fields) - set(self.net.feed.routes.columns)

    if _missing_route_fields:
        msg = (
            f"Fields in route selection dictionary but not routes.txt: {_missing_route_fields}"
        )
        raise TransitSelectionNetworkConsistencyError(msg)
    return selection_dict
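The two column checks in validate_selection_dict follow one pattern: collect the keys under trip_properties or route_properties and subtract the table's columns. Below is a standard-library sketch with hypothetical names (check_selection_columns, and SelectionConsistencyError standing in for TransitSelectionNetworkConsistencyError) that takes column sets instead of a feed.

```python
from typing import Any


class SelectionConsistencyError(ValueError):
    """Hypothetical stand-in for TransitSelectionNetworkConsistencyError."""


def check_selection_columns(
    selection: dict[str, Any], trips_columns: set[str], routes_columns: set[str]
) -> None:
    """Raise if trip_properties/route_properties reference columns absent from the tables."""
    checks = [
        ("trip_properties", trips_columns, "trips.txt"),
        ("route_properties", routes_columns, "routes.txt"),
    ]
    for key, columns, table in checks:
        # `or {}` also covers a key that is present but set to None.
        fields = set((selection.get(key) or {}).keys())
        missing = fields - columns
        if missing:
            raise SelectionConsistencyError(f"Fields in {key} but not {table}: {sorted(missing)}")
```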

Functions to check transit network validity and consistency with the roadway network.

network_wrangler.transit.validate.shape_links_without_road_links

shape_links_without_road_links(tr_shapes, rd_links_df)

Validate that links in transit shapes exist in referenced roadway links.

Parameters:

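  • tr_shapes (DataFrame[WranglerShapesTable]) –

    Transit shapes from shapes.txt whose links are checked against the roadway links.

  • rd_links_df (DataFrame[RoadLinksTable]) –

    Links DataFrame from the roadway network to validate against.

Returns:

  • DataFrame

    DataFrame with columns shape_id, A, and B for each transit shape link that has no matching transit-accessible (drive_access, bus_only, or rail_only) roadway link.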

Source code in network_wrangler/transit/validate.py
def shape_links_without_road_links(
    tr_shapes: DataFrame[WranglerShapesTable],
    rd_links_df: DataFrame[RoadLinksTable],
) -> pd.DataFrame:
    """Validate that links in transit shapes exist in referenced roadway links.

    Args:
        tr_shapes: transit shapes from shapes.txt to validate foreign key to.
        rd_links_df: Links dataframe from roadway network to validate

    Returns:
        df with shape_id and A, B
    """
    tr_shape_links = unique_shape_links(tr_shapes)
    # WranglerLogger.debug(f"Unique shape links: \n {tr_shape_links}")
    rd_links_transit_ok = rd_links_df[
        (rd_links_df["drive_access"]) | (rd_links_df["bus_only"]) | (rd_links_df["rail_only"])
    ]

    merged_df = tr_shape_links.merge(
        rd_links_transit_ok[["A", "B"]],
        how="left",
        on=["A", "B"],
        indicator=True,
    )

    missing_links_df = merged_df.loc[merged_df._merge == "left_only", ["shape_id", "A", "B"]]
    if len(missing_links_df):
        WranglerLogger.error(
            f"! Transit shape links missing in roadway network: \n {missing_links_df}"
        )
    return missing_links_df[["shape_id", "A", "B"]]
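The core of this check is a left anti-join via DataFrame.merge(..., indicator=True): rows tagged "left_only" have no match on the right. This pandas sketch reproduces that step on hypothetical toy data, with shape links given directly rather than derived by unique_shape_links.

```python
import pandas as pd

# Hypothetical toy data: links traversed by transit shapes ...
shape_links = pd.DataFrame(
    {"shape_id": ["s1", "s1", "s2", "s2"], "A": [1, 2, 5, 6], "B": [2, 3, 6, 7]}
)
# ... and roadway links with their transit-access flags.
road_links = pd.DataFrame(
    {
        "A": [1, 2, 5],
        "B": [2, 3, 6],
        "drive_access": [True, False, False],
        "bus_only": [False, False, True],
        "rail_only": [False, False, False],
    }
)

# Keep only roadway links usable by at least one transit-capable mode.
transit_ok = road_links[
    road_links["drive_access"] | road_links["bus_only"] | road_links["rail_only"]
]

# Left anti-join: indicator=True adds a _merge column; "left_only" rows have no match.
merged = shape_links.merge(transit_ok[["A", "B"]], how="left", on=["A", "B"], indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", ["shape_id", "A", "B"]]
print(missing)
```

Link 2 to 3 exists in the roadway but allows no transit mode, and link 6 to 7 is absent entirely; both are reported.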

network_wrangler.transit.validate.stop_times_without_road_links

stop_times_without_road_links(tr_stop_times, rd_links_df)

Validate that links in transit stop_times exist in referenced roadway links.

Parameters:

  • tr_stop_times (DataFrame[WranglerStopTimesTable]) –

    Transit stop_times from stop_times.txt whose links are checked against the roadway links.

  • rd_links_df (DataFrame[RoadLinksTable]) –

    Links DataFrame from the roadway network to validate against.

Returns:

  • DataFrame

    DataFrame with columns trip_id, A, and B for each stop_time link missing from the transit-accessible roadway links.

Source code in network_wrangler/transit/validate.py
def stop_times_without_road_links(
    tr_stop_times: DataFrame[WranglerStopTimesTable],
    rd_links_df: DataFrame[RoadLinksTable],
) -> pd.DataFrame:
    """Validate that links in transit stop_times exist in referenced roadway links.

    Args:
        tr_stop_times: transit stop_times from stop_times.txt to validate foreign key to.
        rd_links_df: Links dataframe from roadway network to validate

    Returns:
        df with trip_id and A, B
    """
    tr_links = unique_stop_time_links(tr_stop_times)

    rd_links_transit_ok = rd_links_df[
        (rd_links_df["drive_access"]) | (rd_links_df["bus_only"]) | (rd_links_df["rail_only"])
    ]

    merged_df = tr_links.merge(
        rd_links_transit_ok[["A", "B"]],
        how="left",
        on=["A", "B"],
        indicator=True,
    )

    missing_links_df = merged_df.loc[merged_df._merge == "left_only", ["trip_id", "A", "B"]]
    if len(missing_links_df):
        WranglerLogger.error(
            f"! Transit stop_time links missing in roadway network: \n {missing_links_df}"
        )
    return missing_links_df[["trip_id", "A", "B"]]
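unique_stop_time_links is not shown here. Assuming it pairs each stop with the next stop on the same trip (ordered by stop_sequence), and that stop_id values are roadway node ids as transit_nodes_without_road_nodes implies, a hypothetical pandas version might look like this. The resulting trip_id, A, B frame can then be anti-joined against the roadway links in the same way as in the source above.

```python
import pandas as pd


def consecutive_stop_links(stop_times: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: pair each stop with the next stop on the same trip as an (A, B) link."""
    st = stop_times.sort_values(["trip_id", "stop_sequence"])
    # shift(-1) within each trip gives the next stop; the last stop of a trip gets NaN.
    next_stop = st.groupby("trip_id")["stop_id"].shift(-1)
    links = st.assign(A=st["stop_id"], B=next_stop).dropna(subset=["B"])
    links = links.astype({"B": "int64"})  # NaN forced B to float; restore integer ids
    return links[["trip_id", "A", "B"]].drop_duplicates().reset_index(drop=True)


stop_times = pd.DataFrame(
    {
        "trip_id": ["t1", "t1", "t1", "t2", "t2"],
        "stop_sequence": [1, 2, 3, 1, 2],
        "stop_id": [10, 11, 12, 20, 21],
    }
)
links = consecutive_stop_links(stop_times)
```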

network_wrangler.transit.validate.transit_nodes_without_road_nodes

transit_nodes_without_road_nodes(feed, nodes_df, rd_field='model_node_id')

Validate that all of a transit feed's node foreign keys exist in the referenced roadway nodes.

Parameters:

  • feed (Feed) –

    Transit Feed to query.

  • nodes_df (DataFrame) –

    Nodes DataFrame from the roadway network to validate foreign keys against.

  • rd_field (str, default: 'model_node_id' ) –

    Field in roadway nodes to check against.

Returns:

  • list[int]

    Transit node ids (from stops, shapes, and stop_times) that are missing from the roadway nodes; empty if all relationships are valid.

Source code in network_wrangler/transit/validate.py
def transit_nodes_without_road_nodes(
    feed: Feed,
    nodes_df: DataFrame[RoadNodesTable],
    rd_field: str = "model_node_id",
) -> list[int]:
    """Validate all of a transit feed's node foreign keys exist in referenced roadway nodes.

    Args:
        feed: Transit Feed to query.
        nodes_df: Nodes dataframe from roadway network to validate foreign key to.
        rd_field: field in roadway nodes to check against. Defaults to "model_node_id"

    Returns:
        list of transit node ids missing from the roadway nodes; empty if all are valid
    """
    feed_nodes_series = [
        feed.stops["stop_id"],
        feed.shapes["shape_model_node_id"],
        feed.stop_times["stop_id"],
    ]
    tr_nodes = set(concat_with_attr(feed_nodes_series).unique())
    rd_nodes = set(nodes_df[rd_field].unique().tolist())
    # nodes in tr_nodes but not rd_nodes
    missing_tr_nodes = list(tr_nodes - rd_nodes)

    if missing_tr_nodes:
        WranglerLogger.error(
            f"! Transit nodes missing in roadway network: \n {missing_tr_nodes}"
        )
    return missing_tr_nodes
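transit_nodes_without_road_nodes is a set difference between the node ids referenced by the feed and the node ids in the roadway. This standard-library sketch uses a hypothetical missing_transit_nodes helper over plain lists; unlike the source, it sorts the result so the output is deterministic.

```python
from itertools import chain
from typing import Iterable


def missing_transit_nodes(
    *transit_node_columns: Iterable[int], road_node_ids: Iterable[int]
) -> list[int]:
    """Hypothetical: node ids referenced by any transit column but absent from the roadway."""
    tr_nodes = set(chain.from_iterable(transit_node_columns))
    return sorted(tr_nodes - set(road_node_ids))


stop_ids = [1, 2, 3]  # like stops.stop_id
shape_node_ids = [1, 2, 3, 4, 9]  # like shapes.shape_model_node_id
stop_time_stop_ids = [1, 3]  # like stop_times.stop_id
road_node_ids = [1, 2, 3, 4, 5]  # like nodes_df.model_node_id

print(missing_transit_nodes(stop_ids, shape_node_ids, stop_time_stop_ids, road_node_ids=road_node_ids))  # [9]
```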

network_wrangler.transit.validate.transit_road_net_consistency

transit_road_net_consistency(feed, road_net)

Checks foreign key and network link relationships between transit feed and a road_net.

Parameters:

  • feed (Feed) –

    Transit Feed.

  • road_net (RoadwayNetwork) –

    Roadway network to check relationship with.

Returns:

  • bool ( bool ) –

    True if all transit shape links and transit nodes exist in road_net; False otherwise.

Source code in network_wrangler/transit/validate.py
def transit_road_net_consistency(feed: Feed, road_net: RoadwayNetwork) -> bool:
    """Checks foreign key and network link relationships between transit feed and a road_net.

    Args:
        feed: Transit Feed.
        road_net (RoadwayNetwork): Roadway network to check relationship with.

    Returns:
        bool: boolean indicating if road_net is consistent with transit network.
    """
    _missing_links = shape_links_without_road_links(feed.shapes, road_net.links_df)
    _missing_nodes = transit_nodes_without_road_nodes(feed, road_net.nodes_df)
    _consistency = _missing_links.empty and not _missing_nodes
    return _consistency
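As written, transit_road_net_consistency checks shape links and transit nodes but does not call stop_times_without_road_links, and it reduces both detailed results to one boolean. A hypothetical dataclass like the one below keeps the details alongside the verdict, which can make a failure easier to diagnose; ConsistencyReport is not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class ConsistencyReport:
    """Hypothetical container for the results transit_road_net_consistency combines."""

    missing_links: list[tuple[str, int, int]] = field(default_factory=list)  # (shape_id, A, B)
    missing_nodes: list[int] = field(default_factory=list)

    @property
    def consistent(self) -> bool:
        # Same rule as the source: no missing links and no missing nodes.
        return not self.missing_links and not self.missing_nodes


ok = ConsistencyReport()
bad = ConsistencyReport(missing_links=[("s1", 2, 3)], missing_nodes=[9])
```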

network_wrangler.transit.validate.validate_transit_in_dir

validate_transit_in_dir(dir, file_format='txt', road_dir=None, road_file_format='geojson')

Validates a transit network in a directory to the wrangler data model specifications, optionally checking its consistency with a roadway network.

Parameters:

  • dir (Path) –

    The transit network file directory.

  • file_format (str, default: 'txt' ) –

    The format of the transit network files. Defaults to “txt”.

  • road_dir (Path, default: None ) –

    The roadway network file directory. If None, the road-to-transit consistency check is skipped. Defaults to None.

  • road_file_format (str, default: 'geojson' ) –

    The format of the roadway network files. Defaults to “geojson”.

Returns:

  • bool

    True if validation passes, False otherwise.

Source code in network_wrangler/transit/validate.py
def validate_transit_in_dir(
    dir: Path,
    file_format: TransitFileTypes = "txt",
    road_dir: Optional[Path] = None,
    road_file_format: RoadwayFileTypes = "geojson",
) -> bool:
    """Validates a transit network in a directory to the wrangler data model specifications.

    Args:
        dir (Path): The transit network file directory.
        file_format (str): The format of transit network files. Defaults to "txt".
        road_dir (Path): The roadway network file directory. Defaults to None.
        road_file_format (str): The format of roadway network files. Defaults to "geojson".

    Returns:
        bool: True if validation passes, False otherwise.
    """
    from .io import load_transit  # noqa: PLC0415

    try:
        t = load_transit(dir, file_format=file_format)
    except SchemaErrors as e:
        WranglerLogger.error(f"!!! [Transit Network invalid] - Failed Loading to Feed object\n{e}")
        return False
    if road_dir is not None:
        from ..roadway import load_roadway_from_dir  # noqa: PLC0415
        from .network import TransitRoadwayConsistencyError  # noqa: PLC0415

        try:
            r = load_roadway_from_dir(road_dir, file_format=road_file_format)
        except FileNotFoundError:
            WranglerLogger.error(f"! Roadway network not found in {road_dir}")
            return False
        except Exception as e:
            WranglerLogger.error(
                "! Error loading roadway network. "
                f"Skipping validation of road to transit network.\n{e}"
            )
            return True  # transit feed loaded; road-to-transit check skipped
        try:
            t.road_net = r
        except TransitRoadwayConsistencyError as e:
            WranglerLogger.error(
                "!!! [Transit Network inconsistent] Error in road to transit "
                f"network consistency.\n{e}"
            )
            return False
            return False

    return True
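The standard-library sketch below restates the control flow of validate_transit_in_dir with injected, hypothetical loader callables, so that each failure path can be exercised without the library. ValueError stands in for pandera's SchemaErrors and for TransitRoadwayConsistencyError. Whether a roadway that fails to load for reasons other than a missing directory should yield True (check skipped) or False is a policy choice; this sketch returns True to match the "Skipping validation" log message.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("validate_sketch")


def validate_in_dir(
    load_transit: Callable[[], object],
    load_roadway: Optional[Callable[[], object]] = None,
    check_consistency: Optional[Callable[[object, object], None]] = None,
) -> bool:
    """Hypothetical sketch of the validation control flow; True if validation passes."""
    try:
        transit = load_transit()
    except ValueError as e:  # stand-in for pandera's SchemaErrors
        logger.error("Transit network invalid: %s", e)
        return False
    if load_roadway is None:
        return True  # no roadway given: only the transit feed is validated
    try:
        roadway = load_roadway()
    except FileNotFoundError:
        logger.error("Roadway network not found")
        return False
    except Exception as e:  # policy choice: skip the road-transit check
        logger.error("Error loading roadway network; skipping road-transit check: %s", e)
        return True
    # From here on, `roadway` is guaranteed to be bound.
    if check_consistency is not None:
        try:
            check_consistency(transit, roadway)
        except ValueError as e:  # stand-in for TransitRoadwayConsistencyError
            logger.error("Transit network inconsistent with roadway: %s", e)
            return False
    return True
```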