
Diagnostics

processing.weighting.diagnostics

Diagnostics sub-package — HTML report generation for weighting results.

Produces a self-contained interactive HTML report (Plotly + Jinja2, no external dependencies) with the following sections:

  1. Crosswalk Map — geographic crosswalk visualization.
  2. Convergence & Weight Summary — per-zone convergence status, weight sums, ESS%, and CV.
  3. Target Fit — per-zone fit metrics (HH/person targets, MAPE, P90, Max error).
  4. Weight Distribution — violin / jitter plots of final_weight / base_weight per zone, with summary statistics.
  5. Target Fit (% Error) — diverging bar charts per zone.
  6. Unweighted Cell Counts (Data Sparsity) — seed counts per control category per zone.

  7. Expansion Factor Calibration — MAPE vs CV across a grid of max_expansion_factor values. Enabled by setting expansion_factor_grid in the weighting config.

Configuration (YAML)

diagnostics:
  output_path: "{{ output_dir }}/weighting_diagnostics.html"

When output_path is omitted, the report is written to <cache_dir>/diagnostics.html.
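The fallback rule above can be sketched in a few lines. This is only an illustration: resolve_report_path and the plain-dict config are hypothetical names, not part of the package, and the sketch assumes any template placeholders such as {{ output_dir }} have already been rendered before the path is read.

```python
from __future__ import annotations

from pathlib import Path


def resolve_report_path(diagnostics_cfg: dict | None, cache_dir: Path) -> Path:
    """Hypothetical helper: pick the report path from a parsed diagnostics block."""
    # A missing block or a missing/empty output_path both trigger the fallback.
    configured = (diagnostics_cfg or {}).get("output_path")
    if configured:
        return Path(configured)
    return cache_dir / "diagnostics.html"
```

For example, resolve_report_path(None, Path("cache")) yields cache/diagnostics.html, while an explicit output_path is returned unchanged.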

processing.weighting.diagnostics.report

Report orchestration: assemble sections and render the Jinja2 template.

Entry point is generate_report, which collects data from the balancer run, builds Plotly figures and HTML tables via the sibling modules (charts, data, tables), then renders everything into a single .html file using a bundled Jinja2 template.

generate_report

generate_report(
    seed: pl.DataFrame,
    weights: pl.DataFrame,
    control_totals: ControlTotals,
    target_names: list[str],
    statuses: list[ZoneStatus],
    output_path: Path,
    *,
    puma_gdf: GeoDataFrame | None = None,
    target_gdf: GeoDataFrame | None = None,
    crosswalk_df: pl.DataFrame | None = None,
    zone_groups: dict[str, list[str]] | None = None,
    merge_specs: list | None = None,
    grid_results: list[GridPoint] | None = None,
    selected_ef: float | None = None,
    control_moe: pl.DataFrame | None = None,
    imputation_summary: list[ImputationSummary] | None = None,
    pums_incidence: pl.DataFrame | None = None,
    pre_imputation_incidence: pl.DataFrame | None = None
) -> Path

Write the self-contained HTML diagnostics report to output_path.

processing.weighting.diagnostics.charts

Plotly chart builders for the diagnostics report.

fit_diverging_figure

fit_diverging_figure(fit: pl.DataFrame) -> go.Figure

Grid of horizontal diverging bar charts (% error, one panel per zone + overall).

Expects fit to contain a label column (added by apply_fit_merges). Null placeholder rows are rendered as invisible bars so the y-axis remains consistent across panels.

When moe_pct is present (from PUMS replicate weights), horizontal error bars show the sampling margin of error on each target.

violins_figure

violins_figure(weighted: pl.DataFrame) -> go.Figure

Violin plot of hh_weight by zone (log scale).

ef_tradeoff_figure

ef_tradeoff_figure(
    grid_results: list, selected_ef: float
) -> go.Figure

Small-multiples chart: four stacked subplots sharing the x-axis.

Hovering at an EF value on any subplot shows aligned tooltips on all four panels via hovermode="x unified" and spike lines.

Parameters:

Name Type Description Default
grid_results list

One entry per expansion-factor value with aggregate metrics.

required
selected_ef float

The user's chosen max_expansion_factor — shown as a vertical dashed marker line on every panel.

required

crosswalk_figure

crosswalk_figure(
    puma_gdf: gpd.GeoDataFrame,
    target_gdf: gpd.GeoDataFrame,
    crosswalk_df: pl.DataFrame,
    households: pl.DataFrame | None = None,
    zone_groups: dict[str, list[str]] | None = None,
) -> go.Figure

Build an interactive Plotly map of the crosswalk.

Layers:

  • PUMA boundaries (dashed grey), full extent
  • Study area outline (bold black)
  • Target zones (solid border, transparent fill), with a tooltip showing PUMA allocation weights from the crosswalk

Parameters:

Name Type Description Default
puma_gdf gpd.GeoDataFrame

PUMA boundary polygons (must have puma_id column).

required
target_gdf gpd.GeoDataFrame

Target zone polygons (must have study_geoid column).

required
crosswalk_df pl.DataFrame

Crosswalk table with puma_id, study_geoid, population, allocation_weight.

required
households pl.DataFrame | None

Assigned households (must contain study_geoid). When provided, per-zone sample counts appear in the tooltip.

None
zone_groups dict[str, list[str]] | None

Optional zone group mapping. When provided, grouped zones share a fill colour and labels include the group name.

None

Returns:

Type Description
go.Figure

The interactive Plotly map of the crosswalk.

processing.weighting.diagnostics.data

Data transformations for the diagnostics report.

category_label_map

category_label_map(
    target_names: list[str], merges: list[MergeSpec] | None = None
) -> dict[tuple[str, str], str]

Map (control_name, category_str) to a human-readable label.

Categories are string member names (e.g. "size_1"). Merged categories (e.g. "size_4_plus") get a title-cased label from their merge spec.

apply_fit_merges

apply_fit_merges(
    fit: pl.DataFrame, merges: list | None, target_names: list[str]
) -> pl.DataFrame

Add a human-readable label column to the fit table.

With category merges already applied at the data level (both incidence tables and control totals), the fit table already reflects the correct merged/unmerged categories per zone. This function only adds labels and pads missing rows.
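The padding step can be pictured with plain Python dicts. This is a rough sketch under assumptions, not the module's code: pad_missing_rows is a hypothetical name, rows are modelled as dicts rather than a pl.DataFrame, and the value column is assumed to be diff_pct.

```python
from __future__ import annotations


def pad_missing_rows(fit_rows: list[dict], zones: list[str], labels: list[str]) -> list[dict]:
    """Hypothetical helper: ensure every (zone, label) pair has a row."""
    present = {(r["geo_id"], r["label"]) for r in fit_rows}
    padded = list(fit_rows)
    for zone in zones:
        for label in labels:
            if (zone, label) not in present:
                # Null placeholder: keeps the category on the axis without a value.
                padded.append({"geo_id": zone, "label": label, "diff_pct": None})
    return padded
```

With placeholders in place, every zone panel carries the same set of labels, which is what lets a chart keep a consistent axis across panels.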

zone_fit_summary

zone_fit_summary(
    fit: pl.DataFrame, target_names: list[str]
) -> pl.DataFrame

Per-zone summary: HH/Person pop target & weighted, %Err, MAPE.

Population totals are derived by summing categories of one representative control at each level (any control's categories partition the population).

Returns columns: geo_id, hh_target, hh_weighted, hh_pct_err, per_target, per_weighted, per_pct_err, mape.
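A small sketch of the summary logic for a single zone, assuming MAPE means the mean absolute percent error over that zone's targets and that percent error is measured as weighted minus target. The function name, the dict-based rows, and the target and weighted keys are illustrative assumptions; only the household level is shown.

```python
from __future__ import annotations

from statistics import mean


def pct_err(target: float, weighted: float) -> float:
    """Signed percent error of the weighted total relative to the target."""
    return 100.0 * (weighted - target) / target


def summarize_zone(fit_rows: list[dict], hh_control: str) -> dict:
    """Hypothetical helper: household totals and MAPE for one zone."""
    # Categories of one control partition the households, so summing
    # them recovers the zone's household total.
    hh_rows = [r for r in fit_rows if r["control_name"] == hh_control]
    hh_target = sum(r["target"] for r in hh_rows)
    hh_weighted = sum(r["weighted"] for r in hh_rows)
    # Skip zero targets, where percent error is undefined.
    abs_errs = [abs(pct_err(r["target"], r["weighted"])) for r in fit_rows if r["target"]]
    return {
        "hh_target": hh_target,
        "hh_weighted": hh_weighted,
        "hh_pct_err": pct_err(hh_target, hh_weighted),
        "mape": mean(abs_errs),
    }
```

Note that the total can fit exactly while individual categories do not: two categories that miss by +10% and -10% give a household error of zero but a MAPE of 10.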

compute_weighted_totals

compute_weighted_totals(
    seed: pl.DataFrame,
    weights: pl.DataFrame,
    target_names: list[str],
) -> pl.DataFrame

Weighted totals per (geo_id, control_name, category).

Uses uniform column handling for all controls:

  • Structural controls: unpivoted column (e.g., h_total)
  • Non-structural controls: pivoted columns (e.g., h_size__size_1)
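To make the column convention concrete, here is a minimal pure-Python sketch of the aggregation. It assumes pivoted columns separate control and category with a double underscore, as in the h_size__size_1 example. The hh_id key, the "total" placeholder category for structural controls, and both function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def split_column(column: str) -> tuple[str, str]:
    """Split 'h_size__size_1' into ('h_size', 'size_1')."""
    control, sep, category = column.partition("__")
    # Structural columns such as 'h_total' have no category part;
    # 'total' is a placeholder chosen for this sketch.
    return (control, category) if sep else (column, "total")


def weighted_totals(rows: list[dict], weights: dict, columns: list[str]) -> dict:
    """Sum weight * incidence per (geo_id, control_name, category)."""
    totals: dict = defaultdict(float)
    for row in rows:
        w = weights[row["hh_id"]]
        for col in columns:
            control, category = split_column(col)
            totals[(row["geo_id"], control, category)] += w * row[col]
    return dict(totals)
```

Because structural and non-structural columns both reduce to a (control, category) pair, the same loop handles both kinds.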

fit_table

fit_table(
    control_totals: ControlTotals, weighted_totals: pl.DataFrame
) -> pl.DataFrame

Join targets to weighted totals; add diff and diff_pct columns.
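As an illustration of the two derived columns, the sketch below assumes diff is weighted minus target and diff_pct is that difference as a percentage of the target. The sign convention, the None result for zero targets, and the function name add_diffs are assumptions, not confirmed behaviour.

```python
from __future__ import annotations


def add_diffs(target: float, weighted: float) -> tuple[float, float | None]:
    """Hypothetical helper: return (diff, diff_pct) for one fit row."""
    diff = weighted - target
    # Percent error is undefined when the target is zero.
    diff_pct = 100.0 * diff / target if target else None
    return diff, diff_pct
```

A target of 200 with a weighted total of 210 gives a diff of 10 and a diff_pct of 5.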

processing.weighting.diagnostics.tables

HTML table builders for the diagnostics report.

This module generates the four main tables displayed in the diagnostics HTML report:

  1. Balancer Performance Table — Per-zone convergence status, target-fit metrics (MAPE, P90, Max), and weight quality metrics (CV, ESS%). Combines all key performance indicators into a single comprehensive table.

  2. Weight Quality Table — Per-zone weight distribution statistics (mean, median, std, min, max) and expansion factor stats (min/max/mean/median EF ratio).

  3. Unweighted Cell Counts — Data sparsity matrix showing unweighted sample counts per control category per zone, with optional PUMS-weighted percentages for context.

  4. Crosswalk Summary Table — Zone → HH samples mapping with optional zone group aggregation.

All table builders return raw HTML strings suitable for Jinja2 template insertion. The _html_table() helper provides a consistent interface for simple tables and supports grouped/spanned headers for complex layouts.

balancer_performance_table

balancer_performance_table(
    statuses: list[ZoneStatus],
    weighted: pl.DataFrame,
    zone_fit: pl.DataFrame,
) -> str

Generate the main balancer performance table (Section 2 of the diagnostics report).

Combines three categories of per-zone metrics into a single comprehensive table:

  • Convergence: Did the balancer converge? How many iterations?
  • Target Fit: How well do weighted totals match PUMS targets? (MAPE, P90, Max)
  • Weight Quality: How stable/dispersed are the weights? (CV, ESS%)

Parameters:

Name Type Description Default
statuses list[ZoneStatus]

Per-zone convergence results from the balancer.

required
weighted pl.DataFrame

Household seed joined with final weights (must include ctrl_geoid, hh_weight, base_weight columns).

required
zone_fit pl.DataFrame

Zone-level target fit summary (output of zone_fit_summary()).

required

Returns:

Type Description
str

HTML table with 13 columns: Zone, N, Conv?, Iter, Household (Target, % Error), Person (Target, % Error), MAPE, P90, Max, CV, ESS%.

Note

Uses _html_table() with a two-tier grouped header. The "Household" and "Person" columns span their respective Target/% Error sub-columns.
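The two weight quality metrics can be sketched as follows. The definitions are assumptions, since the page does not spell them out: CV is taken as population standard deviation over mean, and ESS% as Kish's effective sample size, (Σw)² / Σw², expressed as a percentage of the unweighted count. The function names are illustrative.

```python
from __future__ import annotations

from statistics import mean, pstdev


def weight_cv(weights: list[float]) -> float:
    """Coefficient of variation: population std divided by mean."""
    return pstdev(weights) / mean(weights)


def ess_pct(weights: list[float]) -> float:
    """Kish effective sample size as a percentage of the sample count."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    ess = total * total / sum_sq
    return 100.0 * ess / len(weights)
```

Under these definitions the two metrics are linked by ESS% = 100 / (1 + CV²), so equal weights give CV = 0 and ESS% = 100, and more dispersed weights lower ESS%.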

weight_quality_table

weight_quality_table(weighted: pl.DataFrame) -> str

Generate the weight quality table (Section 3 of the diagnostics report).

Shows per-zone and total weight distribution statistics (mean, median, std, min, max) and expansion factor ratios (min/max/mean/median EF). This table complements the violin plot that follows it in the report.

Parameters:

Name Type Description Default
weighted pl.DataFrame

Household seed joined with final weights (must include ctrl_geoid, hh_weight, base_weight columns).

required

Returns:

Type Description
str

HTML table with 11 columns: Zone, N, Mean, Median, Std, Min, Max, Min EF, Max EF, Mean EF, Median EF, plus a TOTAL row aggregating across zones.

Note

CV and ESS% were removed from this table in March 2026 and moved to the balancer performance table for a unified view of all key metrics.
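For the expansion factor columns, a minimal sketch might look like the following. It assumes the EF ratio of a household is hh_weight divided by base_weight, and it works on plain lists rather than a pl.DataFrame. The function name and the returned keys are hypothetical.

```python
from __future__ import annotations

from statistics import mean, median


def ef_stats(hh_weights: list[float], base_weights: list[float]) -> dict:
    """Hypothetical helper: summary statistics of the per-household EF ratio."""
    # Assumed definition: final weight relative to the starting weight.
    ratios = [w / b for w, b in zip(hh_weights, base_weights)]
    return {
        "min_ef": min(ratios),
        "max_ef": max(ratios),
        "mean_ef": mean(ratios),
        "median_ef": median(ratios),
    }
```

Applying the same function to one zone's rows or to all rows at once would give the per-zone values and the TOTAL row, respectively.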

unweighted_cell_counts

unweighted_cell_counts(
    seed: pl.DataFrame,
    target_names: list[str],
    control_totals: ControlTotals | None = None,
    merge_specs: list | None = None,
) -> str

Single matrix table: categories (rows) x zones (columns).

Row headers are grouped by control name using <th rowspan>. A level separator row (Household / Person) divides the two groups.

When control_totals is provided, each cell also shows the PUMS-weighted percentage in italic parentheses so the reader can compare survey representation against the PUMS universe.

Uses uniform column handling for all controls (structural unpivoted, non-structural pivoted).

crosswalk_summary_table

crosswalk_summary_table(
    crosswalk_df: pl.DataFrame, seed: pl.DataFrame
) -> str

Compact Zone -> HH Samples table with optional Zone Group column.