Diagnostics

processing.weighting.diagnostics

Diagnostics sub-package — HTML report generation for weighting results.

Produces a self-contained interactive HTML report (Plotly + Jinja2, no external dependencies) with the following sections:

Crosswalk Map — geographic crosswalk visualization.
Convergence & Weight Summary — per-zone convergence status, weight sums, ESS%, and CV.
Target Fit — per-zone fit metrics (HH/person targets, MAPE, P90, Max error).
Weight Distribution — violin / jitter plots of final_weight / base_weight per zone, with summary statistics.
Target Fit (% Error) — diverging bar charts per zone.
Unweighted Cell Counts (Data Sparsity) — seed counts per control category per zone.
Expansion Factor Calibration — MAPE vs CV across a grid of max_expansion_factor values. Enabled by setting expansion_factor_grid in the weighting config.

Configuration (YAML)

diagnostics:
  output_path: "{{ output_dir }}/weighting_diagnostics.html"

When output_path is omitted, the report is written to <cache_dir>/diagnostics.html.
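
The fallback can be pictured with a minimal sketch. resolve_output_path and its arguments are hypothetical names used only for illustration and are not part of the package; the default file name diagnostics.html is the only detail taken from the text above.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(output_path: Optional[str], cache_dir: str) -> Path:
    """Return the configured report path, or the cache-dir default (hypothetical helper)."""
    if output_path:
        return Path(output_path)
    # Fallback described above: <cache_dir>/diagnostics.html
    return Path(cache_dir) / "diagnostics.html"


print(resolve_output_path(None, "cache"))
print(resolve_output_path("out/weighting_diagnostics.html", "cache"))
```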

processing.weighting.diagnostics.report

Report orchestration: assemble sections and render the Jinja2 template.

Entry point is generate_report, which collects data from the balancer run, builds Plotly figures and HTML tables via the sibling modules (charts, data, tables), then renders everything into a single .html file using a bundled Jinja2 template.

generate_report

generate_report(
    seed: pl.DataFrame,
    weights: pl.DataFrame,
    control_totals: ControlTotals,
    target_names: list[str],
    statuses: list[ZoneStatus],
    output_path: Path,
    *,
    puma_gdf: GeoDataFrame | None = None,
    target_gdf: GeoDataFrame | None = None,
    crosswalk_df: pl.DataFrame | None = None,
    zone_groups: dict[str, list[str]] | None = None,
    merge_specs: list | None = None,
    grid_results: list[GridPoint] | None = None,
    selected_ef: float | None = None,
    control_moe: pl.DataFrame | None = None,
    imputation_summary: list[ImputationSummary] | None = None,
    pums_incidence: pl.DataFrame | None = None,
    pre_imputation_incidence: pl.DataFrame | None = None
) -> Path

Write the self-contained HTML diagnostics report to output_path.

processing.weighting.diagnostics.charts

Plotly chart builders for the diagnostics report.

fit_diverging_figure

fit_diverging_figure(fit: pl.DataFrame) -> go.Figure

Grid of horizontal diverging bar charts (% error, one panel per zone + overall).

Expects fit to contain a label column (added by apply_fit_merges). Null placeholder rows are rendered as invisible bars so the y-axis remains consistent across panels.

When moe_pct is present (from PUMS replicate weights), horizontal error bars show the sampling margin of error on each target.

violins_figure

violins_figure(weighted: pl.DataFrame) -> go.Figure

Violin plot of hh_weight by zone (log scale).

ef_tradeoff_figure

ef_tradeoff_figure(
    grid_results: list, selected_ef: float
) -> go.Figure

Small-multiples chart: four stacked subplots sharing the x-axis.

Hovering at an EF value on any subplot shows aligned tooltips on all four panels via hovermode="x unified" and spike lines.

Parameters:

Name	Type	Description	Default
`grid_results`	`list`	One entry per expansion-factor value with aggregate metrics.	required
`selected_ef`	`float`	The user's chosen `max_expansion_factor` — shown as a vertical dashed marker line on every panel.	required

crosswalk_figure

crosswalk_figure(
    puma_gdf: gpd.GeoDataFrame,
    target_gdf: gpd.GeoDataFrame,
    crosswalk_df: pl.DataFrame,
    households: pl.DataFrame | None = None,
    zone_groups: dict[str, list[str]] | None = None,
) -> go.Figure

Build an interactive Plotly map of the crosswalk.

Layers:

PUMA boundaries (dashed grey), full extent.
Study area outline (bold black).
Target zones (solid border, transparent fill), with a tooltip showing PUMA allocation weights from the crosswalk.

Parameters:

Name	Type	Description	Default
`puma_gdf`	`gpd.GeoDataFrame`	PUMA boundary polygons (must have `puma_id` column).	required
`target_gdf`	`gpd.GeoDataFrame`	Target zone polygons (must have `study_geoid` column).	required
`crosswalk_df`	`pl.DataFrame`	Crosswalk table with `puma_id`, `study_geoid`, `population`, `allocation_weight`.	required
`households`	`pl.DataFrame \| None`	Assigned households (must contain `study_geoid`). When provided, per-zone sample counts appear in the tooltip.	`None`
`zone_groups`	`dict[str, list[str]] \| None`	Optional zone group mapping. When provided, grouped zones share a fill colour and labels include the group name.	`None`

Returns:

Type	Description
`go.Figure`	Interactive Plotly map of the crosswalk.

processing.weighting.diagnostics.data

Data transformations for the diagnostics report.

category_label_map

category_label_map(
    target_names: list[str], merges: list[MergeSpec] | None = None
) -> dict[tuple[str, str], str]

Map (control_name, category_str) to a human-readable label.

Categories are string member names (e.g. "size_1"). Merged categories (e.g. "size_4_plus") get a title-cased label from their merge spec.
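
As an illustration of the title-casing only, the sketch below turns a category member name into a label. merged_label is a hypothetical helper; the real function derives the label from the merge spec, whose structure is not documented here, and how unmerged categories are labelled is likewise not shown.

```python
def merged_label(category: str) -> str:
    """Title-case a category member name (hypothetical helper, not the package API)."""
    return category.replace("_", " ").title()


# Keyed the same way as the documented return type: (control_name, category_str).
labels = {("h_size", "size_4_plus"): merged_label("size_4_plus")}
print(labels)
```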

apply_fit_merges

apply_fit_merges(
    fit: pl.DataFrame, merges: list | None, target_names: list[str]
) -> pl.DataFrame

Add a human-readable label column to the fit table.

With category merges already applied at the data level (both incidence tables and control totals), the fit table already reflects the correct merged/unmerged categories per zone. This function only adds labels and pads missing rows.
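
The padding step can be sketched in plain Python. This is an assumption-laden illustration, not the package's implementation: pad_fit_rows, the row layout, and the diff_pct placeholder column are chosen for the example. The idea is that every (zone, label) pair gets a row, and pairs absent from the fit table are filled with null values.

```python
from itertools import product


def pad_fit_rows(rows, zones, labels):
    """Ensure every (zone, label) pair has a row; missing pairs get a None placeholder."""
    by_key = {(r["geo_id"], r["label"]): r for r in rows}
    padded = []
    for zone, label in product(zones, labels):
        placeholder = {"geo_id": zone, "label": label, "diff_pct": None}
        padded.append(by_key.get((zone, label), placeholder))
    return padded


rows = [{"geo_id": "A", "label": "Size 1", "diff_pct": 1.5}]
for row in pad_fit_rows(rows, zones=["A", "B"], labels=["Size 1"]):
    print(row)
```

Placeholder rows of this kind are what let a chart keep the same category order in every panel.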

zone_fit_summary

zone_fit_summary(
    fit: pl.DataFrame, target_names: list[str]
) -> pl.DataFrame

Per-zone summary of household and person population targets, weighted totals, percent error, and MAPE.

Population totals are derived by summing categories of one representative control at each level (any control's categories partition the population).

Returns columns: geo_id, hh_target, hh_weighted, hh_pct_err, per_target, per_weighted, per_pct_err, mape.
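
A minimal sketch of the household half of this summary for a single zone follows. It assumes MAPE is the mean of absolute percent errors over all categories with a non-zero target, and that percent error is 100 * (weighted - target) / target; neither formula is confirmed by the source. zone_summary and the row layout are hypothetical.

```python
def zone_summary(fit_rows, representative_control):
    """Summarise one zone from dicts with control_name, target and weighted keys."""
    # Any single control's categories partition the population, so summing one
    # representative control gives the population total.
    rep = [r for r in fit_rows if r["control_name"] == representative_control]
    hh_target = sum(r["target"] for r in rep)
    hh_weighted = sum(r["weighted"] for r in rep)
    abs_pct = [
        abs(100.0 * (r["weighted"] - r["target"]) / r["target"])
        for r in fit_rows
        if r["target"]  # skip zero targets to avoid dividing by zero
    ]
    return {
        "hh_target": hh_target,
        "hh_weighted": hh_weighted,
        "hh_pct_err": 100.0 * (hh_weighted - hh_target) / hh_target,
        "mape": sum(abs_pct) / len(abs_pct) if abs_pct else None,
    }


zone_rows = [
    {"control_name": "h_size", "target": 100.0, "weighted": 110.0},
    {"control_name": "h_size", "target": 200.0, "weighted": 190.0},
    {"control_name": "h_size", "target": 100.0, "weighted": 100.0},
]
print(zone_summary(zone_rows, "h_size"))
```

In this made-up zone the population total matches exactly while individual categories are off by 10%, 5% and 0%, which is why the summary reports both the total percent error and MAPE.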

compute_weighted_totals

compute_weighted_totals(
    seed: pl.DataFrame,
    weights: pl.DataFrame,
    target_names: list[str],
) -> pl.DataFrame

Weighted totals per (geo_id, control_name, category).

Uses uniform column handling for all controls:

Structural controls: unpivoted column (e.g., h_total).
Non-structural controls: pivoted columns (e.g., h_size__size_1).
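
The aggregation can be sketched without Polars. The sketch assumes that pivoted column names split into control and category at a double underscore (inferred from the h_size__size_1 example), that households are keyed by a hypothetical hh_id, and that structural columns carry no category (represented here as None). None of these details is confirmed by the source.

```python
from collections import defaultdict


def weighted_totals(seed_rows, weights, columns):
    """Sum weight * incidence per (geo_id, control_name, category)."""
    totals = defaultdict(float)
    for row in seed_rows:
        w = weights[row["hh_id"]]
        for col in columns:
            # "h_size__size_1" -> ("h_size", "size_1"); "h_total" -> ("h_total", "")
            control, _, category = col.partition("__")
            totals[(row["geo_id"], control, category or None)] += w * row[col]
    return dict(totals)


seed_rows = [
    {"hh_id": 1, "geo_id": "A", "h_total": 1, "h_size__size_1": 1},
    {"hh_id": 2, "geo_id": "A", "h_total": 1, "h_size__size_1": 0},
]
weights = {1: 10.0, 2: 5.0}
print(weighted_totals(seed_rows, weights, ["h_total", "h_size__size_1"]))
```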

fit_table

fit_table(
    control_totals: ControlTotals, weighted_totals: pl.DataFrame
) -> pl.DataFrame

Join targets to weighted totals; add diff and diff_pct columns.
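
A plain-Python sketch of the join follows. It assumes diff is weighted minus target and diff_pct is that difference as a percentage of the target; the sign convention and the treatment of missing weighted totals (counted as 0 here) are assumptions, and fit_rows is a hypothetical name.

```python
def fit_rows(targets, weighted):
    """Join targets to weighted totals on a shared key and add diff columns."""
    rows = []
    for key, target in sorted(targets.items()):
        total = weighted.get(key, 0.0)  # assumption: a missing weighted total counts as 0
        diff = total - target
        diff_pct = 100.0 * diff / target if target else None
        rows.append(
            {"key": key, "target": target, "weighted": total,
             "diff": diff, "diff_pct": diff_pct}
        )
    return rows


targets = {("A", "h_size", "size_1"): 100.0, ("A", "h_size", "size_2"): 50.0}
weighted = {("A", "h_size", "size_1"): 110.0}
for row in fit_rows(targets, weighted):
    print(row)
```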

processing.weighting.diagnostics.tables

HTML table builders for the diagnostics report.

This module generates the four main tables displayed in the diagnostics HTML report:

Balancer Performance Table — Per-zone convergence status, target-fit metrics (MAPE, P90, Max), and weight quality metrics (CV, ESS%). Combines all key performance indicators into a single comprehensive table.
Weight Quality Table — Per-zone weight distribution statistics (mean, median, std, min, max) and expansion factor stats (min/max/mean/median EF ratio).
Unweighted Cell Counts — Data sparsity matrix showing unweighted sample counts per control category per zone, with optional PUMS-weighted percentages for context.
Crosswalk Summary Table — Zone → HH samples mapping with optional zone group aggregation.

All table builders return raw HTML strings suitable for Jinja2 template insertion. The _html_table() helper provides a consistent interface for simple tables and supports grouped/spanned headers for complex layouts.

balancer_performance_table

balancer_performance_table(
    statuses: list[ZoneStatus],
    weighted: pl.DataFrame,
    zone_fit: pl.DataFrame,
) -> str

Generate the main balancer performance table (Section 2 of diagnostics report).

Combines three categories of per-zone metrics into a single comprehensive table:

Convergence: Did the balancer converge? How many iterations?
Target Fit: How well do weighted totals match PUMS targets? (MAPE, P90, Max)
Weight Quality: How stable/dispersed are the weights? (CV, ESS%)
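
For orientation, the two weight-quality metrics are commonly computed as below. This is a sketch under stated assumptions: CV is taken as population standard deviation over mean, and ESS% uses Kish's effective sample size expressed as a share of the sample count. The source does not say which variants the report uses.

```python
import statistics


def cv(weights):
    """Coefficient of variation: population std dev divided by the mean."""
    return statistics.pstdev(weights) / statistics.fmean(weights)


def ess_pct(weights):
    """Kish effective sample size, (sum w)^2 / sum(w^2), as a percentage of n."""
    ess = sum(weights) ** 2 / sum(w * w for w in weights)
    return 100.0 * ess / len(weights)


equal = [2.0, 2.0, 2.0, 2.0]
uneven = [1.0, 3.0]
print(cv(equal), ess_pct(equal))
print(cv(uneven), ess_pct(uneven))
```

Equal weights give a CV of 0 and an ESS% of 100; the more dispersed the weights, the higher the CV and the lower the ESS%.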

Parameters:

Name	Type	Description	Default
`statuses`	`list[ZoneStatus]`	Per-zone convergence results from the balancer.	required
`weighted`	`pl.DataFrame`	Household seed joined with final weights (must include `ctrl_geoid`, `hh_weight`, `base_weight` columns).	required
`zone_fit`	`pl.DataFrame`	Zone-level target fit summary (output of `zone_fit_summary()`).	required

Returns:

Type	Description
`str`	HTML table with 13 columns: Zone, N, Conv?, Iter, Household (Target, % Error), Person (Target, % Error), MAPE, P90, Max, CV, ESS%.

Note

Uses _html_table() with a two-tier grouped header. The "Household" and "Person" columns span their respective Target/% Error sub-columns.
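
A two-tier header of this shape can be built with colspan and rowspan, as the sketch below shows. grouped_header is a hypothetical stand-in and does not reproduce the signature or markup of _html_table(), which are not documented here.

```python
from html import escape


def grouped_header(groups):
    """Build a two-row <thead>; groups is a list of (title, sub_columns) pairs.

    An empty sub_columns list means a plain column spanning both header rows.
    """
    top, bottom = [], []
    for title, subs in groups:
        if subs:
            top.append(f'<th colspan="{len(subs)}">{escape(title)}</th>')
            bottom.extend(f"<th>{escape(s)}</th>" for s in subs)
        else:
            top.append(f'<th rowspan="2">{escape(title)}</th>')
    return f"<thead><tr>{''.join(top)}</tr><tr>{''.join(bottom)}</tr></thead>"


columns = [
    ("Zone", []), ("N", []), ("Conv?", []), ("Iter", []),
    ("Household", ["Target", "% Error"]),
    ("Person", ["Target", "% Error"]),
    ("MAPE", []), ("P90", []), ("Max", []), ("CV", []), ("ESS%", []),
]
print(grouped_header(columns))
```

Counting the plain columns plus the sub-columns of the two groups gives the 13 leaf columns listed under Returns.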

weight_quality_table

weight_quality_table(weighted: pl.DataFrame) -> str

Generate the weight quality table (Section 3 of diagnostics report).

Shows per-zone and total weight distribution statistics (mean, median, std, min, max) and expansion factor ratios (min/max/mean/median EF). This table complements the violin plot that follows it in the report.
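
The expansion factor statistics can be sketched as follows, assuming the EF ratio is hh_weight divided by base_weight for each household (consistent with the final_weight / base_weight ratio mentioned for the weight distribution section, but not confirmed for this table). ef_stats is a hypothetical name.

```python
import statistics


def ef_stats(rows):
    """Min, max, mean and median of hh_weight / base_weight over a zone's households."""
    ratios = [r["hh_weight"] / r["base_weight"] for r in rows]
    return {
        "min_ef": min(ratios),
        "max_ef": max(ratios),
        "mean_ef": statistics.fmean(ratios),
        "median_ef": statistics.median(ratios),
    }


zone = [
    {"hh_weight": 10.0, "base_weight": 5.0},
    {"hh_weight": 20.0, "base_weight": 5.0},
    {"hh_weight": 30.0, "base_weight": 10.0},
]
print(ef_stats(zone))
```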

Parameters:

Name	Type	Description	Default
`weighted`	`pl.DataFrame`	Household seed joined with final weights (must include `ctrl_geoid`, `hh_weight`, `base_weight` columns).	required

Returns:

Type	Description
`str`	HTML table with 11 columns: Zone, N, Mean, Median, Std, Min, Max, Min EF, Max EF, Mean EF, Median EF, plus a TOTAL row aggregating across zones.

Note

CV and ESS% were removed from this table in March 2026 and moved to the balancer performance table for a unified view of all key metrics.

unweighted_cell_counts

unweighted_cell_counts(
    seed: pl.DataFrame,
    target_names: list[str],
    control_totals: ControlTotals | None = None,
    merge_specs: list | None = None,
) -> str

Single matrix table: categories (rows) x zones (columns).

Row headers are grouped by control name using <th rowspan>. A level separator row (Household / Person) divides the two groups.

When control_totals is provided, each cell also shows the PUMS-weighted percentage in italic parentheses so the reader can compare survey representation against the PUMS universe.

Uses uniform column handling for all controls (structural unpivoted, non-structural pivoted).
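
The counts behind the matrix can be sketched with a Counter. The sketch assumes a cell counts the seed households in a zone with non-zero incidence for that column; whether person-level controls are counted this way is not stated in the source, and cell_counts is a hypothetical name.

```python
from collections import Counter


def cell_counts(seed_rows, columns):
    """Count seed households with non-zero incidence per (column, geo_id)."""
    counts = Counter()
    for row in seed_rows:
        for col in columns:
            if row[col]:
                counts[(col, row["geo_id"])] += 1
    return counts


seed = [
    {"geo_id": "A", "h_size__size_1": 1, "h_size__size_2": 0},
    {"geo_id": "A", "h_size__size_1": 0, "h_size__size_2": 1},
    {"geo_id": "B", "h_size__size_1": 1, "h_size__size_2": 0},
]
counts = cell_counts(seed, ["h_size__size_1", "h_size__size_2"])
print(counts)
```

A category that never occurs in a zone has no entry at all, and a Counter reports it as 0, which is exactly the sparsity the table is meant to expose.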

crosswalk_summary_table

crosswalk_summary_table(
    crosswalk_df: pl.DataFrame, seed: pl.DataFrame
) -> str

Compact Zone → HH Samples table with optional Zone Group column.