Utilities ¶
Utility functions and helper classes used throughout Network Wrangler.
Core Utilities ¶
General utility functions used throughout package.
network_wrangler.utils.utils.DictionaryMergeError ¶
network_wrangler.utils.utils.check_one_or_one_superset_present ¶
Checks that exactly one of the fields in mixed_list is in fields_present or one superset.
Source code in network_wrangler/utils/utils.py
network_wrangler.utils.utils.combine_unique_unhashable_list ¶
Combines lists preserving order of first and removing duplicates.
Parameters:
- list1 (list) – The first list.
- list2 (list) – The second list.
Returns:
- list – A new list containing the elements from list1 followed by the unique elements from list2.
Example:
list1 = [1, 2, 3]
list2 = [2, 3, 4, 5]
combine_unique_unhashable_list(list1, list2)
[1, 2, 3, 4, 5]
Source code in network_wrangler/utils/utils.py
network_wrangler.utils.utils.delete_keys_from_dict ¶
Removes list of keys from potentially nested dictionary.
SOURCE: https://stackoverflow.com/questions/3405715/ User: @mseifert
Parameters:
- dictionary (dict) – dictionary to remove keys from
- keys (list) – list of keys to remove
Source code in network_wrangler/utils/utils.py
network_wrangler.utils.utils.dict_to_hexkey ¶
Converts a dictionary to a hexdigest of the sha1 hash of the dictionary.
Parameters:
- d (dict) – dictionary to convert to string
Returns:
- str – hexdigest of the sha1 hash of dictionary
Source code in network_wrangler/utils/utils.py
network_wrangler.utils.utils.findkeys ¶
Returns values of all keys in various objects.
Adapted from arainchi on Stack Overflow: https://stackoverflow.com/questions/9807634/find-all-occurrences-of-a-key-in-nested-dictionaries-and-lists
Source code in network_wrangler/utils/utils.py
network_wrangler.utils.utils.get_overlapping_range ¶
Returns the overlapping range for a list of ranges or tuples defining ranges.
Parameters:
- ranges (list[Union[tuple[int], range]]) – A list of ranges or tuples defining ranges.
Returns:
- Union[None, range] – The overlapping range if found, otherwise None.
Example:
ranges = [(1, 5), (3, 7), (6, 10)]
get_overlapping_range(ranges)
range(3, 5)
Source code in network_wrangler/utils/utils.py
network_wrangler.utils.utils.list_elements_subset_of_single_element ¶
Find the first list in the mixed_list.
Source code in network_wrangler/utils/utils.py
network_wrangler.utils.utils.make_slug ¶
network_wrangler.utils.utils.merge_dicts ¶
Merges the contents of nested dict left into nested dict right.
Raises errors in case of namespace conflicts.
Parameters:
- right – dict, modified in place
- left – dict to be merged into right
- path – default None; sequence of keys to be reported in case of error in merging nested dictionaries
Source code in network_wrangler/utils/utils.py
network_wrangler.utils.utils.normalize_to_lists ¶
Turn a mixed list of scalars and lists into a list of lists.
Source code in network_wrangler/utils/utils.py
network_wrangler.utils.utils.split_string_prefix_suffix_from_num ¶
Split a string prefix and suffix from last number.
Parameters:
- input_string (str) – The input string to be processed.
Returns:
- tuple – A tuple containing the prefix (including preceding numbers), the last numeric part as an integer, and the suffix.
Notes
This function uses regular expressions to split a string into three parts: the prefix, the last numeric part, and the suffix. The prefix includes any preceding numbers, the last numeric part is converted to an integer, and the suffix includes any non-digit characters after the last numeric part.
Source code in network_wrangler/utils/utils.py
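As a rough illustration of the splitting behavior described in the Notes, a regex-based sketch might look like the following; the helper name and pattern are illustrative only and are not the library's implementation:

```python
import re


def split_prefix_num_suffix(value: str) -> tuple[str, int, str]:
    """Illustrative only: split on the LAST run of digits (assumes at least one digit)."""
    # Lazy prefix + greedy digits + trailing non-digits forces the match onto
    # the final numeric run, so earlier numbers stay in the prefix.
    match = re.match(r"^(.*?)(\d+)(\D*)$", value)
    prefix, num, suffix = match.groups()
    return prefix, int(num), suffix


print(split_prefix_num_suffix("abc123"))    # ('abc', 123, '')
print(split_prefix_num_suffix("sta1_25a"))  # ('sta1_', 25, 'a')
```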
network_wrangler.utils.utils.topological_sort ¶
Topological sorting for Acyclic Directed Graph.
Parameters: - adjacency_list (dict): A dictionary representing the adjacency list of the graph. - visited_list (list): A list representing the visited status of each vertex in the graph.
Returns: - output_stack (list): A list containing the vertices in topological order.
This function performs a topological sort on an acyclic directed graph. It takes an adjacency list and a visited list as input. The adjacency list represents the connections between vertices in the graph, and the visited list keeps track of the visited status of each vertex.
The function uses a recursive helper function to perform the topological sort. It starts by iterating over each vertex in the visited list. For each unvisited vertex, it calls the helper function, which recursively visits all the neighbors of the vertex and adds them to the output stack in reverse order. Finally, it returns the output stack, which contains the vertices in topological order.
Source code in network_wrangler/utils/utils.py
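A usage sketch, assuming visited_list is a mapping of every vertex to an initial False visited status (consistent with how the description above tracks visited vertices):

```python
from network_wrangler.utils.utils import topological_sort

# Directed acyclic graph: 1 -> 2, 1 -> 3, 2 -> 3.
adjacency_list = {1: [2, 3], 2: [3], 3: []}

# Assumed shape: every vertex mapped to an initial "not visited" status.
visited_list = {vertex: False for vertex in adjacency_list}

order = topological_sort(adjacency_list, visited_list)
print(order)  # expected: an ordering such as [1, 2, 3] with every edge pointing forward
```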
Package Constants ¶
Parameters for Network Wrangler which should not be changed by the user.
Parameters that are here are used throughout the codebase and are stated here for easy reference. Additional parameters that are more narrowly scoped are defined in the appropriate modules.
Changing these parameters may have unintended consequences and should only be done by developers who understand the codebase.
network_wrangler.params.SMALL_RECS module-attribute ¶
Number of records to display in a dataframe summary.
I/O Utilities ¶
Helper functions for reading and writing files to reduce boilerplate.
network_wrangler.utils.io_table.FileReadError ¶
network_wrangler.utils.io_table.FileWriteError ¶
network_wrangler.utils.io_table.convert_file_serialization ¶
convert_file_serialization(input_file, output_file, overwrite=True, boundary_gdf=None, boundary_geocode=None, boundary_file=None, node_filter_s=None, chunk_size=None)
Convert a file serialization format to another and optionally filter to a boundary.
If the input file is a JSON file that is larger than a reasonable portion of available memory, and the output file is a Parquet file, the JSON file will be read in chunks.
If the input file is a geographic data type (shp, geojson, geoparquet) and a boundary is provided, the data will be filtered to the boundary.
Parameters:
- input_file (Path) – Path to the input JSON or GEOJSON file.
- output_file (Path) – Path to the output Parquet file.
- overwrite (bool, default: True) – If True, overwrite the output file if it exists.
- boundary_gdf (Optional[GeoDataFrame], default: None) – GeoDataFrame to filter the input data to. Only used for geographic data. Defaults to None.
- boundary_geocode (Optional[str], default: None) – Geocode to filter the input data to. Only used for geographic data. Defaults to None.
- boundary_file (Optional[Path], default: None) – File to load as a boundary to filter the input data to. Only used for geographic data. Defaults to None.
- node_filter_s (Optional[Series], default: None) – If provided, will filter links in the .json file to only those that connect to nodes. Defaults to None.
- chunk_size (Optional[int], default: None) – Number of JSON objects to process in each chunk. Only works for JSON to Parquet. If None, will determine if chunking is needed and what size to use.
Source code in network_wrangler/utils/io_table.py
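A hypothetical invocation, converting a GeoJSON links file to GeoParquet while clipping to a boundary file (all paths are made up for illustration):

```python
from pathlib import Path

from network_wrangler.utils.io_table import convert_file_serialization

# All paths below are hypothetical.
convert_file_serialization(
    input_file=Path("data/links.geojson"),
    output_file=Path("data/links.parquet"),
    boundary_file=Path("data/county_boundary.geojson"),
    overwrite=True,
)
```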
network_wrangler.utils.io_table.prep_dir ¶
Prepare a directory for writing files.
Source code in network_wrangler/utils/io_table.py
network_wrangler.utils.io_table.read_table ¶
read_table(filename, sub_filename=None, boundary_gdf=None, boundary_geocode=None, boundary_file=None, read_speed=DefaultConfig.CPU.EST_PD_READ_SPEED)
Read file and return a dataframe or geodataframe.
If filename is a zip file, will unzip to a temporary directory.
If filename is a geojson or shapefile, will filter the data to the boundary_gdf, boundary_geocode, or boundary_file if provided. Note that you can only provide one of these boundary filters.
If filename is a geoparquet file, will filter the data to the bounding box of the boundary_gdf, boundary_geocode, or boundary_file if provided. Note that you can only provide one of these boundary filters.
NOTE: if you are accessing multiple files from this zip file you will want to unzip it first and THEN access the table files so you don’t create multiple duplicate unzipped tmp dirs.
Parameters:
- filename (Path) – filename to load.
- sub_filename (Optional[str], default: None) – if the file is a zip, the sub_filename to load.
- boundary_gdf (Optional[GeoDataFrame], default: None) – GeoDataFrame to filter the input data to. Only used for geographic data. Defaults to None.
- boundary_geocode (Optional[str], default: None) – Geocode to filter the input data to. Only used for geographic data. Defaults to None.
- boundary_file (Optional[Path], default: None) – File to load as a boundary to filter the input data to. Only used for geographic data. Defaults to None.
- read_speed (dict, default: EST_PD_READ_SPEED) – dictionary of read speeds for different file types. Defaults to DefaultConfig.CPU.EST_PD_READ_SPEED.
Source code in network_wrangler/utils/io_table.py
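A usage sketch with made-up paths and a made-up geocode, showing the boundary filter and the zip sub-file behavior described above:

```python
from pathlib import Path

from network_wrangler.utils.io_table import read_table

# Geographic file filtered to a geocoded boundary (hypothetical path and place name).
links_df = read_table(
    Path("data/links.geojson"),
    boundary_geocode="Alameda County, California, USA",
)

# Pulling one table out of a zip archive (hypothetical names).
nodes_df = read_table(Path("data/network.zip"), sub_filename="nodes.geojson")
```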
network_wrangler.utils.io_table.unzip_file ¶
Unzips a file to a temporary directory and returns the directory path.
Source code in network_wrangler/utils/io_table.py
network_wrangler.utils.io_table.write_table ¶
Write a dataframe or geodataframe to a file.
Parameters:
- df (DataFrame) – dataframe to write.
- filename (Path) – filename to write to.
- overwrite (bool, default: False) – whether to overwrite the file if it exists. Defaults to False.
- kwargs – additional arguments to pass to the writer.
Source code in network_wrangler/utils/io_table.py
Utility functions for loading dictionaries from files.
network_wrangler.utils.io_dict.load_dict ¶
Load a dictionary from a file.
Source code in network_wrangler/utils/io_dict.py
network_wrangler.utils.io_dict.load_merge_dict ¶
Load and merge multiple dictionaries from files.
Source code in network_wrangler/utils/io_dict.py
Data Manipulation ¶
Helper functions for data models.
network_wrangler.utils.models.DatamodelDataframeIncompatableError ¶
network_wrangler.utils.models.TableValidationError ¶
network_wrangler.utils.models.coerce_extra_fields_to_type_in_df ¶
Coerce extra fields in data that aren’t specified in Pydantic model to the type in the df.
Note: will not coerce lists of submodels, etc.
Parameters:
- data (dict) – The data to coerce.
- model (BaseModel) – The Pydantic model to validate the data against.
- df (DataFrame) – The DataFrame to coerce the data to.
Source code in network_wrangler/utils/models.py
network_wrangler.utils.models.default_from_datamodel ¶
Returns default value from pandera data model for a given field name.
Source code in network_wrangler/utils/models.py
network_wrangler.utils.models.empty_df_from_datamodel ¶
Create an empty DataFrame or GeoDataFrame with the specified columns.
Parameters:
- model (BaseModel) – A pandera data model to create empty [Geo]DataFrame from.
- crs (int, default: LAT_LON_CRS) – if schema has geometry, will use this as the geometry's crs. Defaults to LAT_LON_CRS.
Source code in network_wrangler/utils/models.py
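A minimal sketch, assuming a recent pandera version and a tiny illustrative DataFrameModel (not a model that ships with Network Wrangler):

```python
import pandera as pa
from pandera.typing import Series

from network_wrangler.utils.models import empty_df_from_datamodel


class TinyNodesTable(pa.DataFrameModel):
    """Hypothetical pandera model used only for this illustration."""

    model_node_id: Series[int]
    X: Series[float]
    Y: Series[float]


empty_nodes = empty_df_from_datamodel(TinyNodesTable)
print(list(empty_nodes.columns))  # expected: ['model_node_id', 'X', 'Y']
print(len(empty_nodes))           # expected: 0
```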
network_wrangler.utils.models.extra_attributes_undefined_in_model ¶
Find the extra attributes in a pydantic model that are not defined in the model.
Source code in network_wrangler/utils/models.py
network_wrangler.utils.models.fill_df_with_defaults_from_model ¶
Fill a DataFrame with default values from a Pandera DataFrameModel.
Parameters:
- df – DataFrame to fill with default values.
- model – Pandera DataFrameModel to get default values from.
Source code in network_wrangler/utils/models.py
network_wrangler.utils.models.identify_model ¶
Identify the model that the input data conforms to.
Parameters:
- data (Union[DataFrame, dict]) – The input data to identify.
- models (list[DataFrameModel, BaseModel]) – A list of models to validate the input data against.
Source code in network_wrangler/utils/models.py
network_wrangler.utils.models.order_fields_from_data_model ¶
Order the fields in a DataFrame to match the order in a Pandera DataFrameModel.
Will add any fields that are not in the model to the end of the DataFrame. Will not add any fields that are in the model but not in the DataFrame.
Parameters:
- df (DataFrame) – DataFrame to order.
- model (DataFrameModel) – Pandera DataFrameModel to order the DataFrame to.
Source code in network_wrangler/utils/models.py
network_wrangler.utils.models.submodel_fields_in_model ¶
Find the fields in a pydantic model that are submodels.
Source code in network_wrangler/utils/models.py
network_wrangler.utils.models.validate_call_pyd ¶
Decorator to validate the function i/o using Pydantic models without Pandera.
Source code in network_wrangler/utils/models.py
network_wrangler.utils.models.validate_df_to_model ¶
Wrapper to validate a DataFrame against a Pandera DataFrameModel with better logging.
Also copies the attrs from the input DataFrame to the validated DataFrame.
Parameters:
- df (DataFrame) – DataFrame to validate.
- model (type) – Pandera DataFrameModel to validate against.
- output_file (Path, default: Path('validation_failure_cases.csv')) – Optional file to write validation errors to. Defaults to validation_failure_cases.csv.
Source code in network_wrangler/utils/models.py
Utility functions for pandas data manipulation.
network_wrangler.utils.data.DataSegmentationError ¶
network_wrangler.utils.data.InvalidJoinFieldError ¶
network_wrangler.utils.data.MissingPropertiesError ¶
network_wrangler.utils.data.coerce_dict_to_df_types ¶
Coerce dictionary values to match the type of a dataframe columns matching dict keys.
Will also coerce a list of values.
Parameters:
- d (dict) – dictionary to coerce, with singleton or list values
- df (DataFrame) – dataframe to get types from
- skip_keys (Optional[list], default: None) – list of dict keys to skip. Defaults to [].
- return_skipped (bool, default: False) – keep the uncoerced, skipped keys/vals in the resulting dict. Defaults to False.
Returns:
- dict (dict[str, CoerceTypes]) – dict with coerced types
Source code in network_wrangler/utils/data.py
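An illustrative call with a small made-up dataframe; the printed result is the expected behavior per the description above, not verified output:

```python
import pandas as pd

from network_wrangler.utils.data import coerce_dict_to_df_types

links_df = pd.DataFrame({"model_link_id": [1, 2, 3], "lanes": [2, 2, 3]})

# String values should be coerced to the dtypes of the matching dataframe columns.
selection = {"model_link_id": ["1", "2"], "lanes": "3"}
print(coerce_dict_to_df_types(selection, links_df))
# expected: {'model_link_id': [1, 2], 'lanes': 3}
```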
network_wrangler.utils.data.coerce_gdf ¶
Coerce a DataFrame to a GeoDataFrame, optionally with a new geometry.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.coerce_val_to_df_types ¶
Coerce field value to match the type of a matching dataframe columns.
Parameters:
- field (str) – field to look up
- val (CoerceTypes) – value or list of values to coerce
- df (DataFrame) – dataframe to get types from
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.coerce_val_to_series_type ¶
Coerces a value to match type of pandas series.
Tries not to fail: if given a value that can't be converted to a number, it will be returned as a string.
Parameters:
- val – Any type of singleton value
- s (Series) – series to match the type to
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.compare_df_values ¶
Compare overlapping part of dataframes and returns where there are differences.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.compare_lists ¶
network_wrangler.utils.data.concat_with_attr ¶
Concatenate a list of dataframes and retain the attributes of the first dataframe.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.convert_numpy_to_list ¶
Function to recursively convert numpy arrays to lists.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.dict_fields_in_df ¶
Check if all fields in dict are in dataframe.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.dict_to_query ¶
Generates a query string from selection_dict.
Parameters:
- selection_dict (Mapping[str, Any]) – selection dictionary
Returns:
- str – Query value
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.diff_dfs ¶
Returns True if two dataframes are different and log differences.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.diff_list_like_series ¶
Compare two series that contain list-like items as strings.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.fk_in_pk ¶
Check if all foreign keys are in the primary keys, optionally ignoring NaN.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.isin_dict ¶
Filter the dataframe using a dictionary - faster than using isin.
Uses merge to filter the dataframe by the dictionary keys and values.
Parameters:
- df (DataFrame) – dataframe to filter
- d (dict) – dictionary with keys as column names and values as values to filter by
- ignore_missing (bool, default: True) – if True, will ignore missing values in the selection dict.
- strict_str (bool, default: False) – if True, will not allow partial string matches and will force case-matching. Defaults to False. If False, will be overridden if the key is in STRICT_MATCH_FIELDS or if ignore_missing is False.
Source code in network_wrangler/utils/data.py
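A small usage sketch with made-up link records; the selection keys are assumed to combine as an AND filter, consistent with a merge on all keys, and the printed result is the expected outcome rather than verified output:

```python
import pandas as pd

from network_wrangler.utils.data import isin_dict

links_df = pd.DataFrame(
    {
        "model_link_id": [1, 2, 3, 4],
        "lanes": [1, 2, 3, 2],
        "name": ["Main St", "6th Ave", "Sixth Ave", "Broadway"],
    }
)

selected_df = isin_dict(links_df, {"lanes": [2, 3], "name": ["6th Ave", "Sixth Ave"]})
print(selected_df["model_link_id"].tolist())  # expected: [2, 3]
```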
network_wrangler.utils.data.list_like_columns ¶
Find columns in a dataframe that contain list-like items that can’t be json-serialized.
Parameters:
- df – dataframe to check
- item_type (Optional[type], default: None) – if not None, will only return columns where all items are of this type, by checking only the first item in the column. Defaults to None.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.segment_data_by_selection ¶
Segment a dataframe or series into before, middle, and end segments based on item_list.
The selected segment is everything from the first to the last item in item_list, inclusive of the first and last items. The before segment is everything before the selected segment; the after segment is everything after it.
Parameters:
- item_list (list) – List of items to segment data by. If longer than two, will only use the first and last items.
- data (Union[Series, DataFrame]) – Data to segment into before, middle, and after.
- field (str, default: None) – If a dataframe, specifies which field to reference. Defaults to None.
- end_val (int, default: 0) – Notation for "until the end" or "from the beginning". Defaults to 0.
Raises:
- DataSegmentationError – If the item list isn't found in the data in the correct order.
Returns:
- tuple (tuple[Union[Series, list, DataFrame], Union[Series, list, DataFrame], Union[Series, list, DataFrame]]) – data broken out by before, selected segment, and after.
Source code in network_wrangler/utils/data.py
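A usage sketch mirroring the example data used by segment_data_by_selection_min_overlap below; the printed segments are the expected outcome per the description, not verified output:

```python
import pandas as pd

from network_wrangler.utils.data import segment_data_by_selection

data = pd.DataFrame({"i": [1, 2, 3, 4, 5, 6]})

# Segment on the stretch bounded by the values 2 and 5 in field "i".
before, segment, after = segment_data_by_selection([2, 5], data, field="i")

print(before["i"].tolist())   # expected: [1]
print(segment["i"].tolist())  # expected: [2, 3, 4, 5]
print(after["i"].tolist())    # expected: [6]
```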
network_wrangler.utils.data.segment_data_by_selection_min_overlap ¶
Segments data based on item_list reducing overlap with replacement list.
The selected segment is everything from the first to the last item in item_list, inclusive of the first and last items, but not if the first and last items overlap with the replacement list. The before segment is everything before the selected segment; the after segment is everything after it.
Example:
selection_list = [2, 5]
data = pd.DataFrame({"i": [1, 2, 3, 4, 5, 6]})
field = "i"
replacements_list = [2, 22, 33]
Returns: updated replacement list [22, 33] and before/segment/after data [1], [2, 3, 4, 5], [6].
Parameters:
- selection_list (list) – List of items to segment data by. If longer than two, will only use the first and last items.
- data (Union[Series, DataFrame]) – Data to segment into before, middle, and after.
- field (str) – Specifies which field to reference.
- replacements_list (list) – List of items to eventually replace the selected segment with.
- end_val (int, default: 0) – Notation for "until the end" or "from the beginning". Defaults to 0.
Returns a tuple containing:
- list – updated replacement_list
- tuple[Union[Series, DataFrame], Union[Series, DataFrame], Union[Series, DataFrame]] – before, selected segment, and after data
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.update_df_by_col_value ¶
Updates destination_df with ALL values in source_df for specified props with same join_col.
Source_df can contain a subset of the IDs in destination_df. If fail_if_missing is True, destination_df must contain all the IDs in source_df, ensuring all source_df values are contained in the resulting df.
>> destination_df
trip_id property1 property2
1 10 100
2 20 200
3 30 300
4 40 400
>> source_df
trip_id property1 property2
2 25 250
3 35 350
>> updated_df
trip_id property1 property2
0 1 10 100
1 2 25 250
2 3 35 350
3 4 40 400
Parameters:
- destination_df (DataFrame) – Dataframe to modify.
- source_df (DataFrame) – Dataframe with updated columns.
- join_col (str) – column to join on.
- properties (list[str], default: None) – List of properties to use. If None, will default to all in source_df.
- fail_if_missing (bool, default: True) – If True, will raise an error if there are missing IDs in destination_df that exist in source_df.
Source code in network_wrangler/utils/data.py
network_wrangler.utils.data.validate_existing_value_in_df ¶
Validate if df[field]==expected_value for all indices in idx.
Source code in network_wrangler/utils/data.py
Dataframe accessors that allow functions to be called directly on the dataframe.
network_wrangler.utils.df_accessors.DictQueryAccessor ¶
Query link, node and shape dataframes using project selection dictionary.
Will overlook any keys which are not columns in the dataframe.
Usage:
selection_dict = {
"lanes": [1, 2, 3],
"name": ["6th", "Sixth", "sixth"],
"drive_access": 1,
}
selected_links_df = links_df.dict_query(selection_dict)
Source code in network_wrangler/utils/df_accessors.py
network_wrangler.utils.df_accessors.DictQueryAccessor.__call__ ¶
Queries the dataframe using the selection dictionary.
Parameters:
- selection_dict (dict) – selection dictionary to query the dataframe with.
- return_all_if_none (bool, default: False) – If True, will return the entire df if the dict has no values. Defaults to False.
Source code in network_wrangler/utils/df_accessors.py
network_wrangler.utils.df_accessors.DictQueryAccessor.__init__ ¶
network_wrangler.utils.df_accessors.Isin_dict ¶
Faster implementation of isin for querying dataframes with a dictionary.
Source code in network_wrangler/utils/df_accessors.py
network_wrangler.utils.df_accessors.Isin_dict.__call__ ¶
network_wrangler.utils.df_accessors.dfHash ¶
Creates a dataframe hash that is compatible with geopandas and various metadata.
Definitely not the fastest, but she seems to work where others have failed.
Source code in network_wrangler/utils/df_accessors.py
network_wrangler.utils.df_accessors.dfHash.__call__ ¶
Function to hash the dataframe with version-robust computation.
Source code in network_wrangler/utils/df_accessors.py
Network and Geographic Utilities ¶
Functions to help with network manipulations in dataframes.
network_wrangler.utils.net.point_seq_to_links ¶
Translates a df with tidy data representing a sequence of points into links.
Parameters:
- point_seq_df (DataFrame) – Dataframe with source breadcrumbs
- id_field (str) – Trace ID
- seq_field (str) – Order of breadcrumbs within id_field
- node_id_field (str) – field denoting the node ID
- from_field (str, default: 'A') – Field to export from_field to. Defaults to "A".
- to_field (str, default: 'B') – Field to export to_field to. Defaults to "B".
Returns:
- DataFrame – pd.DataFrame: Link records with id_field, from_field, to_field
Source code in network_wrangler/utils/net.py
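A usage sketch with a made-up breadcrumb dataframe; the column names and values are chosen only for illustration:

```python
import pandas as pd

from network_wrangler.utils.net import point_seq_to_links

# One row per node visited along each trace (hypothetical column names and values).
point_seq_df = pd.DataFrame(
    {
        "trace_id": [100, 100, 100, 200, 200],
        "seq": [1, 2, 3, 1, 2],
        "node_id": [1, 2, 3, 2, 4],
    }
)

links_df = point_seq_to_links(
    point_seq_df,
    id_field="trace_id",
    seq_field="seq",
    node_id_field="node_id",
)
# expected: A/B pairs (1, 2) and (2, 3) for trace 100, and (2, 4) for trace 200
print(links_df)
```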
Helper geographic manipulation functions.
network_wrangler.utils.geo.InvalidCRSError ¶
network_wrangler.utils.geo.check_point_valid_for_crs ¶
Check if a point is valid for a given coordinate reference system.
Parameters:
- point (Point) – Shapely Point
- crs (int) – coordinate reference system as an EPSG code
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.get_bearing ¶
Calculate the bearing (forward azimuth) between the two points.
Returns: bearing in radians.
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.get_bounding_polygon ¶
Get the bounding polygon for a given boundary.
Will return None if no arguments given. Will raise a ValueError if more than one given.
This function retrieves the bounding polygon for a given boundary. The boundary can be provided as a GeoDataFrame, a geocode string or dictionary, or a boundary file. The resulting polygon geometry is returned as a GeoSeries.
Parameters:
- boundary_geocode (Union[str, dict], default: None) – A geocode string or dictionary representing the boundary. Defaults to None.
- boundary_file (Union[str, Path], default: None) – A path to the boundary file. Only used if boundary_geocode is None. Defaults to None.
- boundary_gdf (GeoDataFrame, default: None) – A GeoDataFrame representing the boundary. Only used if boundary_geocode and boundary_file are None. Defaults to None.
- crs (int, default: LAT_LON_CRS) – The coordinate reference system (CRS) code. Defaults to 4326 (WGS84).
Returns:
- gpd.GeoSeries – The polygon geometry representing the bounding polygon.
Source code in network_wrangler/utils/geo.py
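A hedged usage sketch: the geocode is a made-up place name, geocoding generally requires internet access, and the printed results are expected rather than verified:

```python
from network_wrangler.utils.geo import get_bounding_polygon

# Exactly one boundary argument may be given; a geocode string is used here.
boundary = get_bounding_polygon(boundary_geocode="Berkeley, California, USA")

print(boundary.crs)                # expected: EPSG:4326 (the default)
print(boundary.geom_type.iloc[0])  # expected: 'Polygon' or 'MultiPolygon'
```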
network_wrangler.utils.geo.get_point_geometry_from_linestring ¶
Get a point geometry from a linestring geometry.
Parameters:
- polyline_geometry – shapely LineString instance
- pos (int, default: 0) – position in the linestring to get the point from. Defaults to 0.
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.length_of_linestring_miles ¶
Returns a Series with the linestring length in miles.
Parameters:
- gdf (Union[GeoSeries, GeoDataFrame]) – GeoDataFrame with linestring geometry. If given a GeoSeries, will attempt to convert it to a GeoDataFrame.
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.linestring_from_lats_lons ¶
Create a LineString geometry from a DataFrame with lon/lat fields.
Parameters:
- df – DataFrame with columns for lon/lat fields.
- lat_fields – list of column names for the lat fields.
- lon_fields – list of column names for the lon fields.
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.linestring_from_nodes ¶
Creates a LineString geometry GeoSeries from a DataFrame of links and a DataFrame of nodes.
Parameters:
- links_df (DataFrame) – DataFrame with columns for from_node and to_node.
- nodes_df (GeoDataFrame) – GeoDataFrame with geometry column.
- from_node (str, default: 'A') – column name in links_df for the from node. Defaults to "A".
- to_node (str, default: 'B') – column name in links_df for the to node. Defaults to "B".
- node_pk (str, default: 'model_node_id') – primary key column name in nodes_df. Defaults to "model_node_id".
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.location_ref_from_point ¶
Generates a shared street point location reference.
Parameters:
- geometry (Point) – Point shapely geometry
- sequence (int, default: 1) – Sequence if part of polyline. Defaults to None.
- bearing (float, default: None) – Direction of line if part of polyline. Defaults to None.
- distance_to_next_ref (float, default: None) – Distance to next point if part of polyline. Defaults to None.
Returns:
- LocationReference – As defined by the sharedStreets.io schema
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.location_refs_from_linestring ¶
Generates a shared street location reference from linestring.
Parameters:
- geometry (LineString) – Shapely LineString instance
Returns:
- LocationReferences (list[LocationReference]) – As defined by the sharedStreets.io schema
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.offset_geometry_meters ¶
Offset a GeoSeries of LineStrings by a given distance in meters.
Parameters:
- geo_s (GeoSeries) – GeoSeries of LineStrings to offset.
- offset_distance_meters (float) – distance in meters to offset the LineStrings.
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.offset_point_with_distance_and_bearing ¶
Get the new lon-lat (in degrees) given current point (lon-lat), distance and bearing.
Parameters:
- lon (float) – longitude of original point
- lat (float) – latitude of original point
- distance (float) – distance in meters to offset point by
- bearing (float) – direction to offset point to, in radians
Source code in network_wrangler/utils/geo.py
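For orientation, the standard great-circle destination formula is one way to compute such an offset; the sketch below uses it with an illustrative helper name and may differ in detail from the library's implementation:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def offset_lon_lat(lon: float, lat: float, distance_m: float, bearing_rad: float):
    """Illustrative great-circle destination point; not necessarily the library's exact math."""
    lat1, lon1 = math.radians(lat), math.radians(lon)
    delta = distance_m / EARTH_RADIUS_M  # angular distance

    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(bearing_rad)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing_rad) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lon2), math.degrees(lat2)


# 100 meters due east (bearing of pi/2 radians) of an arbitrary point.
print(offset_lon_lat(-93.09, 44.95, 100, math.pi / 2))
```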
network_wrangler.utils.geo.point_from_xy ¶
Creates point geometry from x and y coordinates.
Parameters:
- x – x coordinate, in xy_crs
- y – y coordinate, in xy_crs
- xy_crs (int, default: LAT_LON_CRS) – coordinate reference system as an EPSG code for x/y inputs. Defaults to 4326 (WGS84).
- point_crs (int, default: LAT_LON_CRS) – coordinate reference system as an EPSG code for point output. Defaults to 4326 (WGS84).
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.to_points_gdf ¶
Convert a table to a GeoDataFrame.
If the table is already a GeoDataFrame, return it as is. Otherwise, attempt to convert the table to a GeoDataFrame using the following methods:
1. If the table has a 'geometry' column, return a GeoDataFrame using that column.
2. If the table has 'lat' and 'lon' columns, return a GeoDataFrame using those columns.
3. If the table has a '*model_node_id' or 'stop_id' column, return a GeoDataFrame using that column and the nodes_df provided.
If none of the above, raise a ValueError.
Parameters:
- table (DataFrame) – DataFrame to convert to GeoDataFrame.
- ref_nodes_df (Optional[GeoDataFrame], default: None) – GeoDataFrame of nodes to use to convert model_node_id to geometry.
- ref_road_net (Optional[RoadwayNetwork], default: None) – RoadwayNetwork object to use to convert model_node_id to geometry.
Returns:
- GeoDataFrame – GeoDataFrame representation of the table.
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.update_nodes_in_linestring_geometry ¶
Updates the nodes in a linestring geometry and returns updated geometry.
Parameters:
- original_df (GeoDataFrame) – GeoDataFrame with the model_node_id and linestring geometry.
- updated_nodes_df (GeoDataFrame) – GeoDataFrame with updated node geometries.
- position (int) – position in the linestring to update with the node.
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.update_point_geometry ¶
update_point_geometry(df, ref_point_df, lon_field='X', lat_field='Y', id_field='model_node_id', ref_lon_field='X', ref_lat_field='Y', ref_id_field='model_node_id')
Returns copy of df with lat and long fields updated with geometry from ref_point_df.
NOTE: does not update “geometry” field if it exists.
Source code in network_wrangler/utils/geo.py
network_wrangler.utils.geo.update_points_in_linestring ¶
Replaces a point in a linestring with a new point.
Parameters:
- linestring (LineString) – original_linestring
- updated_coords (List[float]) – updated point coordinates
- position (int) – position in the linestring to update
Source code in network_wrangler/utils/geo.py
Time Utilities ¶
Functions related to parsing and comparing time objects and series.
Internal function terminology for timespan scopes:
- matching: a scope that could be applied for a given timespan combination. This includes the default timespan as well as scopes wholly contained within it.
- overlapping: a timespan that fully or partially overlaps a given timespan. This includes the default timespan, all matching timespans, and all timespans where at least one minute overlaps.
- conflicting: a timespan that is overlapping but not matching. By definition, default scope values are not conflicting.
- independent: a timespan that is not overlapping.
network_wrangler.utils.time.TimespanDfQueryError ¶
network_wrangler.utils.time.calc_overlap_duration_with_query ¶
Calculate the overlap series of start and end times and a query start and end times.
Parameters:
- start_time_s (Series[datetime]) – Series of start times to calculate overlap with.
- end_time_s (Series[datetime]) – Series of end times to calculate overlap with.
- start_time_q (datetime) – Query start time to calculate overlap with.
- end_time_q (datetime) – Query end time to calculate overlap with.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.convert_timespan_to_start_end_dt ¶
Convert a timespan string ['12:00', '14:00'] to start_time & end_time datetime cols in df.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.dt_contains ¶
Check timespan1 inclusively contains timespan2.
If the end time is less than the start time, it is assumed to be the next day.
Parameters:
- timespan1 (list[time]) – The first timespan, represented as a list containing the start time and end time.
- timespan2 (list[time]) – The second timespan, represented as a list containing the start time and end time.
Returns:
- bool – True if the first timespan contains the second timespan, False otherwise.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.dt_list_overlaps ¶
Check if any of the timespans overlap.
overlapping: a timespan that fully or partially overlaps a given timespan. This includes all timespans where at least one minute overlaps.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.dt_overlap_duration ¶
Check if two timespans overlap and return the amount of overlap.
If the end time is less than the start time, it is assumed to be the next day.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.dt_overlaps ¶
Check if two timespans overlap.
If the end time is less than the start time, it is assumed to be the next day.
overlapping: a timespan that fully or partially overlaps a given timespan. This includes all timespans where at least one minute overlaps.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.dt_to_seconds_from_midnight ¶
Convert a datetime object to the number of seconds since midnight.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.duration_dt ¶
Returns a datetime.timedelta object representing the duration of the timespan.
If end_time is less than start_time, the duration will assume that it crosses over midnight.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.filter_df_to_max_overlapping_timespans ¶
filter_df_to_max_overlapping_timespans(orig_df, query_timespan, strict_match=False, min_overlap_minutes=1, keep_max_of_cols=None)
Filters dataframe for entries that have maximum overlap with the given query timespan.
If the end time is less than the start time, it is assumed to be the next day.
Parameters:
- orig_df (DataFrame) – dataframe to query timespans for, with start_time and end_time fields.
- query_timespan (list[TimeString]) – TimespanString of format ['HH:MM', 'HH:MM'] to query orig_df for overlapping records.
- strict_match (bool, default: False) – boolean indicating if the returned df should only contain records that fully contain the query timespan. If set to True, min_overlap_minutes does not apply. Defaults to False.
- min_overlap_minutes (int, default: 1) – minimum number of minutes the timespans need to overlap to keep. Defaults to 1.
- keep_max_of_cols (Optional[list[str]], default: None) – list of fields to return the maximum value of overlap for. If None, will return all overlapping time periods. Defaults to ['model_link_id'].
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.filter_df_to_overlapping_timespans ¶
Filters dataframe for entries that have any overlap with ANY of the given query timespans.
If the end time is less than the start time, it is assumed to be the next day.
Parameters:
- orig_df (DataFrame) – dataframe to query timespans for, with start_time and end_time fields.
- query_timespans (list[TimespanString]) – List of TimespanStrings, each of format ['HH:MM', 'HH:MM'], to query orig_df for overlapping records.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.filter_dt_list_to_overlaps ¶
Filter a list of timespans to only include those that overlap.
overlapping: a timespan that fully or partially overlaps a given timespan. This includes all timespans where at least one minute overlaps.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.format_seconds_to_legible_str ¶
Formats seconds into a human-friendly string for log files.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.is_increasing ¶
Check if a list of datetime objects is increasing in time.
network_wrangler.utils.time.seconds_from_midnight_to_str ¶
Convert the number of seconds since midnight to a TimeString (HH:MM).
network_wrangler.utils.time.str_to_seconds_from_midnight ¶
Convert a TimeString (HH:MM<:SS>) to the number of seconds since midnight.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.str_to_time ¶
Convert TimeString (HH:MM<:SS>) to datetime object.
If HH > 24, will subtract 24 to be within 24 hours. Timespans will be treated as the next day.
Parameters:
- time_str (TimeString) – TimeString in HH:MM:SS or HH:MM format.
- base_date (Optional[date], default: None) – optional date to base the datetime on. Defaults to None. If not provided, will use today.
Source code in network_wrangler/utils/time.py
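A usage sketch with made-up times; the wrapped next-day result follows from the HH > 24 rule described above, and the printed values are expected rather than verified:

```python
from datetime import date

from network_wrangler.utils.time import str_to_time

dt = str_to_time("08:30", base_date=date(2024, 1, 1))
print(dt)  # expected: 2024-01-01 08:30:00

# Hours greater than 24 wrap into the next day, per the rule above.
dt_next_day = str_to_time("25:15", base_date=date(2024, 1, 1))
print(dt_next_day)  # expected: 2024-01-02 01:15:00
```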
network_wrangler.utils.time.str_to_time_list ¶
Convert list of TimeStrings (HH:MM<:SS>) to list of datetime.time objects.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.str_to_time_series ¶
Convert mixed panda series datetime and TimeString (HH:MM<:SS>) to datetime object.
If HH > 24, will subtract 24 to be within 24 hours. Timespans will be treated as the next day.
Parameters:
- time_str_s (Series) – Pandas Series of TimeStrings in HH:MM:SS or HH:MM format.
- base_date (Optional[Union[Series, date]], default: None) – optional date to base the datetime on. Defaults to None. If not provided, will use today. Can be either a single instance or a series of the same length as time_str_s.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.timespan_str_list_to_dt ¶
Convert list of TimespanStrings to list of datetime.time objects.
Source code in network_wrangler/utils/time.py
network_wrangler.utils.time.timespans_overlap ¶
Check if two timespan strings overlap.
overlapping: a timespan that fully or partially overlaps a given timespan. This includes all timespans where at least one minute overlaps.
Source code in network_wrangler/utils/time.py
Module for time and timespan objects.
network_wrangler.time.Time ¶
Represents a time object.
This class provides methods to initialize and manipulate time objects.
Attributes:
- datetime (datetime) – The underlying datetime object representing the time.
- time_str (str) – The time string representation in HH:MM:SS format.
- time_sec (int) – The time in seconds since midnight.
- _raw_time_in (TimeType) – The raw input value used to initialize the Time object.
Source code in network_wrangler/time.py
network_wrangler.time.Time.time_sec property ¶
Get the time in seconds since midnight.
Returns:
- int – The time in seconds since midnight.
network_wrangler.time.Time.time_str property ¶
Get the time string representation.
Returns:
- str – The time string representation in HH:MM:SS format.
network_wrangler.time.Time.__getitem__ ¶
Get the time string representation.
Parameters:
- item (Any) – Not used.
Returns:
- str – The time string representation in HH:MM:SS format.
network_wrangler.time.Time.__hash__ ¶
Get the hash value of the Time object.
Returns:
- int – The hash value of the Time object.
network_wrangler.time.Time.__init__ ¶
Initializes a Time object.
Parameters:
- value (TimeType) – A time object, a string in HH:MM[:SS] format, or seconds since midnight.
Raises:
- TimeFormatError – If the value is not a valid time format.
Source code in network_wrangler/time.py
network_wrangler.time.Time.__str__ ¶
Get the string representation of the Time object.
Returns:
- str – The time string representation in HH:MM:SS format.
network_wrangler.time.Timespan ¶
Timespan object.
This class provides methods to initialize and manipulate time objects.
If the end_time is less than the start_time, the duration will assume that it crosses over midnight.
Attributes:
- start_time (time) – The start time of the timespan.
- end_time (time) – The end time of the timespan.
- timespan_str_list (str) – A list of start time and end time in HH:MM:SS format.
- start_time_sec (int) – The start time in seconds since midnight.
- end_time_sec (int) – The end time in seconds since midnight.
- duration (timedelta) – The duration of the timespan.
- duration_sec (int) – The duration of the timespan in seconds.
- _raw_timespan_in (Any) – The raw input value used to initialize the Timespan object.
Source code in network_wrangler/time.py
network_wrangler.time.Timespan.duration property ¶
Duration of timespan as a timedelta object.
network_wrangler.time.Timespan.duration_sec property ¶
Duration of timespan in seconds.
If end_time is less than start_time, the duration will assume that it crosses over midnight.
network_wrangler.time.Timespan.end_time_sec property ¶
End time in seconds since midnight.
network_wrangler.time.Timespan.start_time_sec property ¶
Start time in seconds since midnight.
network_wrangler.time.Timespan.timespan_str_list property ¶
Get the timespan string representation.
Get the timespan string representation.
network_wrangler.time.Timespan.__hash__ ¶
network_wrangler.time.Timespan.__init__ ¶
Constructor for the Timespan object.
If the value is a list of two time strings, datetime objects, Time, or seconds from midnight, the start_time and end_time attributes will be set accordingly.
Parameters:
- value (time) – a list of two time strings, datetime objects, Time objects, or seconds from midnight.
Source code in network_wrangler/time.py
network_wrangler.time.Timespan.__str__ ¶
network_wrangler.time.Timespan.overlaps ¶
Check if two timespans overlap.
If the start time is greater than the end time, the timespan is assumed to cross over midnight.
Parameters:
- other (Timespan) – The other timespan to compare.
Returns:
- bool – True if the two timespans overlap, False otherwise.
Source code in network_wrangler/time.py
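A usage sketch of the Timespan class based on the attributes and methods documented above; the values are illustrative and the printed outputs are expected rather than verified:

```python
from network_wrangler.time import Timespan

am_peak = Timespan(["06:00", "09:00"])
midday = Timespan(["08:30", "15:00"])

print(am_peak.duration_sec)      # expected: 10800 (three hours)
print(am_peak.overlaps(midday))  # expected: True

# An end time earlier than the start time is treated as crossing midnight.
overnight = Timespan(["22:00", "02:00"])
print(overnight.duration)        # expected: 4:00:00
```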
Logging and Visualization ¶
Logging utilities for Network Wrangler.
network_wrangler.logger.setup_logging ¶
Sets up the WranglerLogger w.r.t. the debug file location and if logging to console.
Called by the test_logging fixture in conftest.py and can be called by the user to setup logging for their session. If called multiple times, the logger will be reset.
Parameters:
- info_log_filename (Optional[Path], default: None) – the location of the log file that will get created to add the INFO log. The INFO log is terse, just giving the bare minimum of details. Defaults to a file in cwd() named wrangler_[datetime].log. To turn off logging to a file, use log_filename = None.
- debug_log_filename (Optional[Path], default: None) – the location of the log file that will get created to add the DEBUG log. The DEBUG log is very noisy, for debugging. Defaults to a file in cwd() named wrangler_[datetime].log. To turn off logging to a file, use log_filename = None.
- std_out_level (str, default: 'info') – the level of logging to the console. One of "info", "warning", "debug". Defaults to "info" but will be set to ERROR if nothing provided matches.
- file_mode (str, default: 'a') – use 'a' to append, 'w' to write without appending.
Source code in network_wrangler/logger.py
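A hedged setup sketch with made-up log file names; the WranglerLogger import location is assumed to be the same logger module:

```python
from pathlib import Path

from network_wrangler.logger import WranglerLogger, setup_logging

# Hypothetical log file locations; console output stays at INFO level.
setup_logging(
    info_log_filename=Path("wrangler_info.log"),
    debug_log_filename=Path("wrangler_debug.log"),
    std_out_level="info",
)

WranglerLogger.info("Logging is configured.")
```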
Module for visualizing roadway and transit networks using Mapbox tiles.
This module provides a function net_to_mapbox that creates and serves Mapbox tiles on a local web server based on roadway and transit networks.
Example usage
net_to_mapbox(roadway, transit)
network_wrangler.viz.MissingMapboxTokenError ¶
network_wrangler.viz.net_to_mapbox ¶
net_to_mapbox(roadway=None, transit=None, roadway_geojson_out=Path('roadway_shapes.geojson'), transit_geojson_out=Path('transit_shapes.geojson'), mbtiles_out=Path('network.mbtiles'), overwrite=True, port='9000')
Creates and serves mapbox tiles on local web server based on roadway and transit networks.
Parameters:
- roadway (Optional[Union[RoadwayNetwork, GeoDataFrame, str, Path]], default: None) – a RoadwayNetwork instance, a geodataframe with roadway linestrings, or a path to a geojson file. Defaults to an empty GeoDataFrame.
- transit (Optional[Union[TransitNetwork, GeoDataFrame]], default: None) – a TransitNetwork instance, a geodataframe with transit linestrings, or a path to a geojson file. Defaults to an empty GeoDataFrame.
- roadway_geojson_out (Path, default: Path('roadway_shapes.geojson')) – file path for the roadway geojson, which gets created if roadway is not a path to a geojson file. Defaults to roadway_shapes.geojson.
- transit_geojson_out (Path, default: Path('transit_shapes.geojson')) – file path for the transit geojson, which gets created if transit is not a path to a geojson file. Defaults to transit_shapes.geojson.
- mbtiles_out (Path, default: Path('network.mbtiles')) – path to output mapbox tiles. Defaults to network.mbtiles.
- overwrite (bool, default: True) – boolean indicating if mbtiles_out, roadway_geojson_out, and transit_geojson_out can be overwritten. Defaults to True.
- port (str, default: '9000') – port to serve resulting tiles on. Defaults to 9000.
Source code in network_wrangler/viz.py
Error Handling ¶
All network wrangler errors.
network_wrangler.errors.DataframeSelectionError ¶
network_wrangler.errors.FeedReadError ¶
network_wrangler.errors.FeedValidationError ¶
network_wrangler.errors.InvalidScopedLinkValue ¶
network_wrangler.errors.LinkAddError ¶
network_wrangler.errors.LinkChangeError ¶
network_wrangler.errors.LinkCreationError ¶
network_wrangler.errors.LinkDeletionError ¶
network_wrangler.errors.LinkNotFoundError ¶
network_wrangler.errors.ManagedLaneAccessEgressError ¶
network_wrangler.errors.MissingNodesError ¶
network_wrangler.errors.NewRoadwayError ¶
network_wrangler.errors.NodeAddError ¶
network_wrangler.errors.NodeChangeError ¶
network_wrangler.errors.NodeDeletionError ¶
network_wrangler.errors.NodeNotFoundError ¶
network_wrangler.errors.NodesInLinksMissingError ¶
network_wrangler.errors.NotLinksError ¶
network_wrangler.errors.NotNodesError ¶
network_wrangler.errors.ProjectCardError ¶
network_wrangler.errors.RoadwayDeletionError ¶
network_wrangler.errors.RoadwayPropertyChangeError ¶
network_wrangler.errors.ScenarioConflictError ¶
network_wrangler.errors.ScenarioCorequisiteError ¶
network_wrangler.errors.ScenarioPrerequisiteError ¶
network_wrangler.errors.ScopeConflictError ¶
network_wrangler.errors.ScopeLinkValueError ¶
network_wrangler.errors.SegmentFormatError ¶
network_wrangler.errors.SegmentSelectionError ¶
network_wrangler.errors.SelectionError ¶
network_wrangler.errors.ShapeAddError ¶
network_wrangler.errors.ShapeDeletionError ¶
network_wrangler.errors.SubnetCreationError ¶
network_wrangler.errors.SubnetExpansionError ¶
network_wrangler.errors.TimeFormatError ¶
network_wrangler.errors.TimespanFormatError ¶
network_wrangler.errors.TransitPropertyChangeError ¶
network_wrangler.errors.TransitRoadwayConsistencyError ¶
network_wrangler.errors.TransitRouteAddError ¶
network_wrangler.errors.TransitRoutingChangeError ¶
network_wrangler.errors.TransitSelectionEmptyError ¶
network_wrangler.errors.TransitSelectionError ¶
network_wrangler.errors.TransitSelectionNetworkConsistencyError ¶
Bases: TransitSelectionError
Error for when transit selection dictionary is not consistent with transit network.