Diagnostics
processing.weighting.diagnostics
Diagnostics sub-package — HTML report generation for weighting results.
Produces a self-contained interactive HTML report (Plotly + Jinja2, no external dependencies) with the following sections:
- Crosswalk Map: geographic crosswalk visualization.
- Convergence & Weight Summary: per-zone convergence status, weight sums, ESS%, and CV.
- Target Fit: per-zone fit metrics (HH/person targets, MAPE, P90, Max error).
- Weight Distribution: violin / jitter plots of final_weight / base_weight per zone, with summary statistics.
- Target Fit (% Error): diverging bar charts per zone.
- Unweighted Cell Counts (Data Sparsity): seed counts per control category per zone.
- Expansion Factor Calibration: MAPE vs CV across a grid of max_expansion_factor values. Enabled by setting expansion_factor_grid in the weighting config.
Configuration (YAML)
diagnostics:
  output_path: "{{ output_dir }}/weighting_diagnostics.html"
When output_path is omitted, the report is written to <cache_dir>/diagnostics.html.
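The fallback rule above can be sketched in a few lines. `resolve_output_path` is a hypothetical helper written for illustration and is not part of this package; only the default file name `diagnostics.html` and its location under the cache directory come from the text.

```python
from __future__ import annotations

from pathlib import Path


def resolve_output_path(output_path: str | None, cache_dir: str) -> Path:
    """Return the report path, falling back to <cache_dir>/diagnostics.html.

    Hypothetical helper: the name and signature are assumptions.
    """
    if output_path:
        return Path(output_path)
    # No explicit path configured: use the documented default location.
    return Path(cache_dir) / "diagnostics.html"
```

Calling `resolve_output_path(None, "cache")` yields the default location, while an explicit path is returned unchanged.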
processing.weighting.diagnostics.report
Report orchestration: assemble sections and render the Jinja2 template.
The entry point is generate_report, which collects data from the balancer run, builds Plotly figures and HTML tables via the sibling modules (charts, data, tables), then renders everything into a single .html file using a bundled Jinja2 template.
generate_report
generate_report(
    seed: pl.DataFrame,
    weights: pl.DataFrame,
    control_totals: ControlTotals,
    target_names: list[str],
    statuses: list[ZoneStatus],
    output_path: Path,
    *,
    puma_gdf: GeoDataFrame | None = None,
    target_gdf: GeoDataFrame | None = None,
    crosswalk_df: pl.DataFrame | None = None,
    zone_groups: dict[str, list[str]] | None = None,
    merge_specs: list | None = None,
    grid_results: list[GridPoint] | None = None,
    selected_ef: float | None = None,
    control_moe: pl.DataFrame | None = None,
    imputation_summary: list[ImputationSummary] | None = None,
    pums_incidence: pl.DataFrame | None = None,
    pre_imputation_incidence: pl.DataFrame | None = None
) -> Path
Write the self-contained HTML diagnostics report to output_path.
processing.weighting.diagnostics.charts
Plotly chart builders for the diagnostics report.
fit_diverging_figure
fit_diverging_figure(fit: pl.DataFrame) -> go.Figure
Grid of horizontal diverging bar charts (% error, one panel per zone + overall).
Expects fit to contain a label column (added by
apply_fit_merges).
Null placeholder rows are rendered as invisible bars
so the y-axis remains consistent across panels.
When moe_pct is present (from PUMS replicate weights), horizontal
error bars show the sampling margin of error on each target.
violins_figure
violins_figure(weighted: pl.DataFrame) -> go.Figure
Violin plot of hh_weight by zone (log scale).
ef_tradeoff_figure
ef_tradeoff_figure(
    grid_results: list, selected_ef: float
) -> go.Figure
Small-multiples chart: four stacked subplots sharing the x-axis.
Hovering at an EF value on any subplot shows aligned tooltips on
all four panels via hovermode="x unified" and spike lines.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| grid_results | list | One entry per expansion-factor value with aggregate metrics. | required |
| selected_ef | float | The user's chosen | required |
crosswalk_figure
crosswalk_figure(
    puma_gdf: gpd.GeoDataFrame,
    target_gdf: gpd.GeoDataFrame,
    crosswalk_df: pl.DataFrame,
    households: pl.DataFrame | None = None,
    zone_groups: dict[str, list[str]] | None = None,
) -> go.Figure
Build an interactive Plotly map of the crosswalk.
Layers:
- PUMA boundaries (dashed grey), full extent
- Study area outline (bold black)
- Target zones (solid border, transparent fill) with tooltip showing PUMA allocation weights from the crosswalk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| puma_gdf | gpd.GeoDataFrame | PUMA boundary polygons (must have | required |
| target_gdf | gpd.GeoDataFrame | Target zone polygons (must have | required |
| crosswalk_df | pl.DataFrame | Crosswalk table with | required |
| households | pl.DataFrame \| None | Assigned households (must contain | None |
| zone_groups | dict[str, list[str]] \| None | Optional zone group mapping. When provided, grouped zones share a fill colour and labels include the group name. | None |
Returns:
| Type | Description |
|---|---|
| go.Figure | go.Figure |
processing.weighting.diagnostics.data
Data transformations for the diagnostics report.
category_label_map
category_label_map(
    target_names: list[str], merges: list[MergeSpec] | None = None
) -> dict[tuple[str, str], str]
Map (control_name, category_str) to a human-readable label.
Categories are string member names (e.g. "size_1"). Merged
categories (e.g. "size_4_plus") get a title-cased label from
their merge spec.
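A minimal sketch of the labelling idea follows. It is not the package's implementation: `label_for` and `sketch_label_map` are hypothetical names, the input is an explicit control-to-categories mapping rather than `target_names` and merge specs, and the control name `h_size` is assumed for illustration. It shows only the title-casing step described above.

```python
from __future__ import annotations


def label_for(category: str) -> str:
    """Turn a member name such as 'size_4_plus' into 'Size 4 Plus'."""
    return category.replace("_", " ").title()


def sketch_label_map(
    categories_by_control: dict[str, list[str]],
) -> dict[tuple[str, str], str]:
    """Map (control_name, category_str) to a title-cased label."""
    return {
        (control, category): label_for(category)
        for control, categories in categories_by_control.items()
        for category in categories
    }


# "h_size" is an assumed control name used only for this example.
labels = sketch_label_map({"h_size": ["size_1", "size_4_plus"]})
```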
apply_fit_merges
apply_fit_merges(
    fit: pl.DataFrame, merges: list | None, target_names: list[str]
) -> pl.DataFrame
Add human-readable label column to the fit table.
With category merges already applied at the data level (both incidence tables and control totals), the fit table already reflects the correct merged/unmerged categories per zone. This function only adds labels and pads missing rows.
zone_fit_summary
zone_fit_summary(
    fit: pl.DataFrame, target_names: list[str]
) -> pl.DataFrame
Per-zone summary: HH/Person pop target & weighted, %Err, MAPE.
Population totals are derived by summing categories of one representative control at each level (any control's categories partition the population).
Returns columns: geo_id, hh_target, hh_weighted, hh_pct_err, per_target, per_weighted, per_pct_err, mape.
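The summary logic can be illustrated for a single zone in plain Python. This is a sketch under stated assumptions, not the function itself: the real code operates on Polars frames, and the formulas here (percentage error as `100 * (weighted - target) / target`, MAPE as the mean absolute percentage error over categories with a non-zero target) are assumed conventions. `summarize_zone` and the row layout are hypothetical.

```python
def summarize_zone(fit_rows, representative_control):
    """Summarize one zone's fit rows (hypothetical row layout)."""
    # Any single control's categories partition the population, so summing
    # one representative control gives the zone total.
    rep = [r for r in fit_rows if r["control"] == representative_control]
    hh_target = sum(r["target"] for r in rep)
    hh_weighted = sum(r["weighted"] for r in rep)

    # Assumed convention: skip zero targets to avoid division by zero.
    abs_pct_errors = [
        abs(100.0 * (r["weighted"] - r["target"]) / r["target"])
        for r in fit_rows
        if r["target"] != 0
    ]
    return {
        "hh_target": hh_target,
        "hh_weighted": hh_weighted,
        "hh_pct_err": 100.0 * (hh_weighted - hh_target) / hh_target,
        "mape": sum(abs_pct_errors) / len(abs_pct_errors),
    }


fit_rows = [
    {"control": "h_size", "category": "size_1", "target": 100.0, "weighted": 110.0},
    {"control": "h_size", "category": "size_2", "target": 200.0, "weighted": 190.0},
    {"control": "h_size", "category": "size_3", "target": 50.0, "weighted": 50.0},
]
summary = summarize_zone(fit_rows, "h_size")
```

In this example the zone total matches exactly even though individual categories are off by 10% and 5%, which is why the report shows MAPE alongside the total error.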
compute_weighted_totals
compute_weighted_totals(
    seed: pl.DataFrame,
    weights: pl.DataFrame,
    target_names: list[str],
) -> pl.DataFrame
Weighted totals per (geo_id, control_name, category).
Uses uniform column handling for all controls:
- Structural controls: unpivoted column (e.g., h_total)
- Non-structural controls: pivoted columns (e.g., h_size__size_1)
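The column convention above can be sketched without Polars. The following is an illustration, not the package's code: `weighted_totals`, the `hh_id` key, the weights dictionary, and the `"total"` category name for structural columns are all assumptions. Only the column names `h_total` and `h_size__size_1` come from the text.

```python
from collections import defaultdict


def weighted_totals(seed_rows, weights, incidence_columns):
    """Sum weight * incidence per (geo_id, control_name, category)."""
    totals = defaultdict(float)
    for row in seed_rows:
        weight = weights[row["hh_id"]]  # hh_id is an assumed key name
        for column in incidence_columns:
            # Pivoted columns look like "h_size__size_1"; structural columns
            # such as "h_total" contain no "__", so category is empty there.
            control, _, category = column.partition("__")
            key = (row["geo_id"], control, category or "total")
            totals[key] += weight * row[column]
    return dict(totals)


seed = [
    {"hh_id": 1, "geo_id": "A", "h_total": 1, "h_size__size_1": 1, "h_size__size_2": 0},
    {"hh_id": 2, "geo_id": "A", "h_total": 1, "h_size__size_1": 0, "h_size__size_2": 1},
]
totals = weighted_totals(
    seed,
    weights={1: 10.0, 2: 30.0},
    incidence_columns=["h_total", "h_size__size_1", "h_size__size_2"],
)
```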
fit_table
fit_table(
    control_totals: ControlTotals, weighted_totals: pl.DataFrame
) -> pl.DataFrame
Join targets to weighted totals; add diff and diff_pct columns.
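As a rough illustration of the join, the sketch below uses dictionaries keyed by (geo_id, control_name, category) instead of Polars frames. The sign convention (`diff = weighted - target`), the percentage base (the target), and the `None` result for a zero target are assumptions, and `sketch_fit_rows` is a hypothetical name.

```python
def sketch_fit_rows(targets, weighted):
    """Left-join targets to weighted totals and add diff / diff_pct."""
    rows = []
    for key, target in targets.items():
        # Assumed: a key missing from the weighted totals counts as zero.
        value = weighted.get(key, 0.0)
        diff = value - target
        rows.append(
            {
                "key": key,
                "target": target,
                "weighted": value,
                "diff": diff,
                # Assumed: percentage is undefined when the target is zero.
                "diff_pct": 100.0 * diff / target if target else None,
            }
        )
    return rows


rows = sketch_fit_rows(
    targets={("A", "h_size", "size_1"): 20.0, ("A", "h_size", "size_2"): 0.0},
    weighted={("A", "h_size", "size_1"): 25.0},
)
```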
processing.weighting.diagnostics.tables
HTML table builders for the diagnostics report.
This module generates the four main tables displayed in the diagnostics HTML report:
- Balancer Performance Table: Per-zone convergence status, target-fit metrics (MAPE, P90, Max), and weight quality metrics (CV, ESS%). Combines all key performance indicators into a single comprehensive table.
- Weight Quality Table: Per-zone weight distribution statistics (mean, median, std, min, max) and expansion factor stats (min/max/mean/median EF ratio).
- Unweighted Cell Counts: Data sparsity matrix showing unweighted sample counts per control category per zone, with optional PUMS-weighted percentages for context.
- Crosswalk Summary Table: Zone → HH samples mapping with optional zone group aggregation.
All table builders return raw HTML strings suitable for Jinja2 template insertion.
The _html_table() helper provides a consistent interface for simple tables and
supports grouped/spanned headers for complex layouts.
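The signature of `_html_table()` is not documented here, so the following is only a sketch of how a two-tier grouped header can be produced with `colspan` and `rowspan`. `grouped_header_table` and its argument layout are hypothetical and are not the package's helper.

```python
from html import escape


def grouped_header_table(groups, rows):
    """Render a table with a two-tier header.

    groups: list of (label, sub_columns). An empty sub_columns list marks a
    plain column that spans both header rows.
    """
    top, bottom = [], []
    for label, subs in groups:
        if subs:
            # Group label spans its sub-columns in the first header row.
            top.append(f'<th colspan="{len(subs)}">{escape(label)}</th>')
            bottom.extend(f"<th>{escape(s)}</th>" for s in subs)
        else:
            top.append(f'<th rowspan="2">{escape(label)}</th>')
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        "<table><thead><tr>" + "".join(top) + "</tr><tr>" + "".join(bottom)
        + "</tr></thead><tbody>" + body + "</tbody></table>"
    )


table_html = grouped_header_table(
    [("Zone", []), ("Household", ["Target", "% Error"])],
    [("A", 350, "0.0")],
)
```

Escaping every cell keeps zone names or labels containing `<` or `&` from breaking the rendered report.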
balancer_performance_table
balancer_performance_table(
    statuses: list[ZoneStatus],
    weighted: pl.DataFrame,
    zone_fit: pl.DataFrame,
) -> str
Generate the main balancer performance table (Section 2 of the diagnostics report).
Combines three categories of per-zone metrics into a single comprehensive table:
- Convergence: Did the balancer converge? How many iterations?
- Target Fit: How well do weighted totals match PUMS targets? (MAPE, P90, Max)
- Weight Quality: How stable/dispersed are the weights? (CV, ESS%)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| statuses | list[ZoneStatus] | Per-zone convergence results from the balancer. | required |
| weighted | pl.DataFrame | Household seed joined with final weights (must include | required |
| zone_fit | pl.DataFrame | Zone-level target fit summary (output of | required |
Returns:
| Type | Description |
|---|---|
| str | HTML table with 13 columns: Zone, N, Conv?, Iter, Household (Target, % Error), Person (Target, % Error), MAPE, P90, Max, CV, ESS%. |
Note
Uses _html_table() with a two-tier grouped header. The "Household" and
"Person" columns span their respective Target/% Error sub-columns.
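The weight-quality metrics in this table (CV, ESS%) are not defined on this page. The sketch below uses common definitions as an assumption: CV as the population standard deviation divided by the mean, and ESS% as Kish's effective sample size, `(sum w)^2 / sum w^2`, expressed as a percentage of the number of households. The package may compute them differently, and `weight_quality` is a hypothetical name.

```python
from statistics import mean, pstdev


def weight_quality(weights):
    """Return (cv, ess_pct) for one zone's household weights."""
    n = len(weights)
    cv = pstdev(weights) / mean(weights)
    # Kish effective sample size: equals n when all weights are identical.
    ess = sum(weights) ** 2 / sum(w * w for w in weights)
    return cv, 100.0 * ess / n


uniform_cv, uniform_ess_pct = weight_quality([1.0, 1.0, 1.0, 1.0])
skewed_cv, skewed_ess_pct = weight_quality([1.0, 3.0])
```

Under these definitions, equal weights give a CV of 0 and an ESS% of 100, and both degrade as the weights become more dispersed.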
weight_quality_table
weight_quality_table(weighted: pl.DataFrame) -> str
Generate the weight quality table (Section 3 of the diagnostics report).
Shows per-zone and total weight distribution statistics (mean, median, std, min, max) and expansion factor ratios (min/max/mean/median EF). This table complements the violin plot that follows it in the report.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| weighted | pl.DataFrame | Household seed joined with final weights (must include | required |
Returns:
| Type | Description |
|---|---|
| str | HTML table with 11 columns: Zone, N, Mean, Median, Std, Min, Max, Min EF, Max EF, Mean EF, Median EF, plus a TOTAL row aggregating across zones. |
Note
CV and ESS% were removed from this table in March 2026 and moved to the balancer performance table for a unified view of all key metrics.
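The expansion factor columns of this table can be illustrated with a short sketch. It assumes the EF ratio is `final_weight / base_weight` per household, as in the Weight Distribution section of the report; `ef_stats` and the returned key names are hypothetical.

```python
from statistics import mean, median


def ef_stats(base_weights, final_weights):
    """Min / max / mean / median of the per-household EF ratio."""
    # Assumed definition: ratio of final weight to base weight.
    ratios = [f / b for b, f in zip(base_weights, final_weights)]
    return {
        "min_ef": min(ratios),
        "max_ef": max(ratios),
        "mean_ef": mean(ratios),
        "median_ef": median(ratios),
    }


stats = ef_stats(base_weights=[10.0, 10.0, 20.0], final_weights=[5.0, 20.0, 40.0])
```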
unweighted_cell_counts
unweighted_cell_counts(
    seed: pl.DataFrame,
    target_names: list[str],
    control_totals: ControlTotals | None = None,
    merge_specs: list | None = None,
) -> str
Single matrix table: categories (rows) x zones (columns).
Row headers are grouped by control name using <th rowspan>.
A level separator row (Household / Person) divides the two groups.
When control_totals is provided, each cell also shows the PUMS-weighted percentage in italic parentheses so the reader can compare survey representation against the PUMS universe.
Uses uniform column handling for all controls (structural unpivoted, non-structural pivoted).
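A plain-Python sketch of the counting and cell formatting follows. It is illustrative only: `unweighted_counts` and `format_cell` are hypothetical names, the seed is a list of dictionaries rather than a Polars frame, and the exact cell markup (one decimal place inside `<i>` tags) is an assumption about what "italic parentheses" looks like.

```python
from collections import Counter


def unweighted_counts(seed_rows, incidence_columns):
    """Count seed rows with non-zero incidence per (column, geo_id)."""
    counts = Counter()
    for row in seed_rows:
        for column in incidence_columns:
            if row[column] > 0:
                counts[(column, row["geo_id"])] += 1
    return counts


def format_cell(count, pums_pct=None):
    """Render one matrix cell, optionally with the PUMS-weighted share."""
    if pums_pct is None:
        return str(count)
    return f"{count} <i>({pums_pct:.1f}%)</i>"


seed = [
    {"geo_id": "A", "h_size__size_1": 1, "h_size__size_2": 0},
    {"geo_id": "A", "h_size__size_1": 1, "h_size__size_2": 0},
    {"geo_id": "B", "h_size__size_1": 0, "h_size__size_2": 1},
]
counts = unweighted_counts(seed, ["h_size__size_1", "h_size__size_2"])
cell = format_cell(counts[("h_size__size_1", "A")], pums_pct=8.3)
```

A zero count, such as `("h_size__size_2", "A")` here, is the sparsity signal this table is meant to surface.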
crosswalk_summary_table
crosswalk_summary_table(
    crosswalk_df: pl.DataFrame, seed: pl.DataFrame
) -> str
Compact Zone -> HH Samples table with optional Zone Group column.