Controls

processing.weighting.controls

Weighting control definitions and registry.

Provides the CONTROLS registry mapping control names to ControlTarget instances, plus household-level and person-level concrete control classes that define pums_expr / survey_expr recoding logic.

processing.weighting.controls.base

Base class and helpers for weighting controls.

Contains ControlLevel, ControlTarget base class, and shared expression helpers used by the household and person control subclasses.

ControlLevel

Whether a control is at the household or person level.

HOUSEHOLD class-attribute instance-attribute

HOUSEHOLD = 'household'

PERSON class-attribute instance-attribute

PERSON = 'person'

ControlTarget

Base class for a single weighting control.

Subclasses set class attributes and override survey_expr / pums_expr to return native Polars expressions.

Attributes:

name (str): Registry key, e.g. "h_size".

level (ControlLevel): HOUSEHOLD or PERSON.

description (str): Human-readable label.

categories (type[Enum]): IntEnum or LabeledEnum for output bins.

survey_fields (tuple[str, ...]): Canonical survey column names (metadata).

pums_fields (tuple[str, ...]): PUMS column names (metadata).
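The subclassing pattern described above might look roughly like the following sketch. It is not the package's code: to stay dependency-free, the hypothetical methods survey_value / pums_value recode a plain dict row, whereas the real survey_expr / pums_expr return Polars expressions. The class name HHTotalSketch and the description string are also assumptions; only the "h_total" key and the attribute names come from this page.

```python
from __future__ import annotations

from enum import Enum, IntEnum


class ControlLevel(Enum):
    HOUSEHOLD = "household"
    PERSON = "person"


class TotalCategory(IntEnum):
    TOTAL = 1


class ControlTarget:
    """Stand-in base class: subclasses set class attributes and override the recoders."""

    name: str
    level: ControlLevel
    description: str
    categories: type[Enum]
    survey_fields: tuple[str, ...] = ()
    pums_fields: tuple[str, ...] = ()

    # Hypothetical row-level stand-ins for survey_expr / pums_expr.
    def survey_value(self, row: dict) -> int:
        raise NotImplementedError

    def pums_value(self, row: dict) -> int:
        raise NotImplementedError


class HHTotalSketch(ControlTarget):
    """Structural control: every household contributes an incidence of 1."""

    name = "h_total"
    level = ControlLevel.HOUSEHOLD
    description = "Total households"  # assumed label
    categories = TotalCategory

    def survey_value(self, row: dict) -> int:
        return TotalCategory.TOTAL

    def pums_value(self, row: dict) -> int:
        return TotalCategory.TOTAL
```

Keeping the metadata as class attributes means a control can be inspected (name, level, required fields) without evaluating any recoding logic.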

processing.weighting.controls.registry

Control registry and resolution helpers.

The CONTROLS dict is the single lookup table mapping control names to ControlTarget instances. resolve_targets and pums_variables provide the main query API used by the rest of the weighting pipeline.

Dynamic cross-tab creation: register_crosstab creates a CrosstabControlTarget instance at runtime from dimension control names, allowing cross-tabs to be defined in YAML config without requiring Python class definitions.
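One way to picture a cross-tab built from dimension controls is as the Cartesian product of the dimensions' categories. The sketch below is only an illustration of that idea: crosstab_cells and cell_index are hypothetical helpers, and nothing here reflects the actual signature of register_crosstab or how CrosstabControlTarget encodes its cells.

```python
from enum import IntEnum
from itertools import product


class GenderCategory(IntEnum):
    MALE = 1
    FEMALE = 2


class StudentCategory(IntEnum):
    NOT_STUDENT = 0
    STUDENT_K12 = 1
    STUDENT_COLLEGE = 2


def crosstab_cells(*dimensions):
    """Hypothetical helper: one tuple per combination of dimension categories."""
    return list(product(*dimensions))


def cell_index(cells, combo):
    """Hypothetical helper: position of a category combination in the cell list."""
    return cells.index(combo)


cells = crosstab_cells(GenderCategory, StudentCategory)
```

With two gender bins and three student bins this yields six cells, ordered with the last dimension varying fastest.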

resolve_targets

resolve_targets(
    targets: list[str], level: ControlLevel | None = None
) -> list[ControlTarget]

Return the ControlTarget objects named in targets, optionally filtered by level.

pums_variables

pums_variables(level: ControlLevel) -> set[str]

PUMS variable names needed for all controls at level.
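A minimal sketch of how these two helpers could behave, assuming a plain dict registry. The ControlTarget stand-in is a simplified dataclass, and the registry contents are illustrative: "p_gender" is a made-up key, and pairing "h_size" with NP and gender with SEX is an assumption. Raising KeyError for unknown names is also an assumption about error handling.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ControlLevel(Enum):
    HOUSEHOLD = "household"
    PERSON = "person"


@dataclass(frozen=True)
class ControlTarget:
    """Simplified stand-in carrying only the fields the lookups need."""

    name: str
    level: ControlLevel
    pums_fields: tuple[str, ...] = ()


# Illustrative registry; keys other than h_size, h_total, p_total are invented.
CONTROLS = {
    t.name: t
    for t in (
        ControlTarget("h_size", ControlLevel.HOUSEHOLD, ("NP",)),
        ControlTarget("h_total", ControlLevel.HOUSEHOLD),
        ControlTarget("p_gender", ControlLevel.PERSON, ("SEX",)),
        ControlTarget("p_total", ControlLevel.PERSON),
    )
}


def resolve_targets(
    targets: list[str], level: ControlLevel | None = None
) -> list[ControlTarget]:
    """Look up each name, then keep only the requested level if one is given."""
    resolved = [CONTROLS[name] for name in targets]  # KeyError on unknown names
    if level is not None:
        resolved = [t for t in resolved if t.level is level]
    return resolved


def pums_variables(level: ControlLevel) -> set[str]:
    """Union of pums_fields over every registered control at the given level."""
    return {
        field
        for target in CONTROLS.values()
        if target.level is level
        for field in target.pums_fields
    }
```

For example, resolve_targets(["h_size", "p_gender"], ControlLevel.PERSON) would return only the person-level control, preserving the order of the input list.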

processing.weighting.controls.enums

Custom IntEnum categories for weighting controls.

These define the output bins for the 8 collapsing controls where the canonical survey granularity is reduced to fewer bins for weighting. The 5 identity controls (income, education, race, ethnicity, age) reuse canonical LabeledEnum directly and are not duplicated here.

HHSizeCategory

Household size bins (1-9, 10+).

SIZE_1 class-attribute instance-attribute

SIZE_1 = 1

SIZE_2 class-attribute instance-attribute

SIZE_2 = 2

SIZE_3 class-attribute instance-attribute

SIZE_3 = 3

SIZE_4 class-attribute instance-attribute

SIZE_4 = 4

SIZE_5 class-attribute instance-attribute

SIZE_5 = 5

SIZE_6 class-attribute instance-attribute

SIZE_6 = 6

SIZE_7 class-attribute instance-attribute

SIZE_7 = 7

SIZE_8 class-attribute instance-attribute

SIZE_8 = 8

SIZE_9 class-attribute instance-attribute

SIZE_9 = 9

SIZE_10_PLUS class-attribute instance-attribute

SIZE_10_PLUS = 10
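The open-ended top bin (10+) implies that raw counts are top-coded before being matched to a category. A minimal sketch of that recoding, assuming a hypothetical helper named top_code that clamps to the largest member of whichever IntEnum it is given; rejecting values below the smallest bin is likewise an assumption:

```python
from __future__ import annotations

from enum import IntEnum


class HHSizeCategory(IntEnum):
    SIZE_1 = 1
    SIZE_2 = 2
    SIZE_3 = 3
    SIZE_4 = 4
    SIZE_5 = 5
    SIZE_6 = 6
    SIZE_7 = 7
    SIZE_8 = 8
    SIZE_9 = 9
    SIZE_10_PLUS = 10


def top_code(value: int, categories: type[IntEnum]) -> IntEnum:
    """Hypothetical helper: clamp a raw count into the enum's highest bin."""
    lowest, highest = min(categories), max(categories)
    if value < lowest:
        raise ValueError(f"{value} is below the lowest bin {lowest.name}")
    return categories(min(value, highest))
```

The same helper would apply unchanged to the other count-style enums on this page, since each one ends in a single "plus" bin.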

HHWorkersCategory

Number of workers in household (0-4, 5+).

WORKERS_0 class-attribute instance-attribute

WORKERS_0 = 0

WORKERS_1 class-attribute instance-attribute

WORKERS_1 = 1

WORKERS_2 class-attribute instance-attribute

WORKERS_2 = 2

WORKERS_3 class-attribute instance-attribute

WORKERS_3 = 3

WORKERS_4 class-attribute instance-attribute

WORKERS_4 = 4

WORKERS_5_PLUS class-attribute instance-attribute

WORKERS_5_PLUS = 5

HHVehiclesCategory

Vehicles available to household (0-5, 6+).

VEH_0 class-attribute instance-attribute

VEH_0 = 0

VEH_1 class-attribute instance-attribute

VEH_1 = 1

VEH_2 class-attribute instance-attribute

VEH_2 = 2

VEH_3 class-attribute instance-attribute

VEH_3 = 3

VEH_4 class-attribute instance-attribute

VEH_4 = 4

VEH_5 class-attribute instance-attribute

VEH_5 = 5

VEH_6_PLUS class-attribute instance-attribute

VEH_6_PLUS = 6

HHChildrenCategory

Number of children in household (0-4, 5+).

CHILDREN_0 class-attribute instance-attribute

CHILDREN_0 = 0

CHILDREN_1 class-attribute instance-attribute

CHILDREN_1 = 1

CHILDREN_2 class-attribute instance-attribute

CHILDREN_2 = 2

CHILDREN_3 class-attribute instance-attribute

CHILDREN_3 = 3

CHILDREN_4 class-attribute instance-attribute

CHILDREN_4 = 4

CHILDREN_5_PLUS class-attribute instance-attribute

CHILDREN_5_PLUS = 5

GenderCategory

Gender bins for weighting (male / female).

PUMS only has binary SEX, so no third category is defined.

MALE class-attribute instance-attribute

MALE = 1

FEMALE class-attribute instance-attribute

FEMALE = 2

EmploymentCategory

Employment status for weighting (full / part / not employed).

EMPLOYED_FULL class-attribute instance-attribute

EMPLOYED_FULL = 1

EMPLOYED_PART class-attribute instance-attribute

EMPLOYED_PART = 2

NOT_EMPLOYED class-attribute instance-attribute

NOT_EMPLOYED = 3

CommuteModeCategory

Commute mode for weighting.

NA class-attribute instance-attribute

NA = 0

MOSTLY_REMOTE class-attribute instance-attribute

MOSTLY_REMOTE = 1

DRIVE_ALONE class-attribute instance-attribute

DRIVE_ALONE = 2

CARPOOL class-attribute instance-attribute

CARPOOL = 3

TRANSIT class-attribute instance-attribute

TRANSIT = 4

WALK class-attribute instance-attribute

WALK = 5

BIKE class-attribute instance-attribute

BIKE = 6

OTHER class-attribute instance-attribute

OTHER = 7

StudentCategory

Student status (not student / K-12 / college).

NOT_STUDENT class-attribute instance-attribute

NOT_STUDENT = 0

STUDENT_K12 class-attribute instance-attribute

STUDENT_K12 = 1

STUDENT_COLLEGE class-attribute instance-attribute

STUDENT_COLLEGE = 2

TotalCategory

Single-category enum for h_total / p_total structural controls.

TOTAL class-attribute instance-attribute

TOTAL = 1

processing.weighting.controls.household

Household-level weighting controls.

Each class maps raw survey / PUMS values into coarser category ints for household-level weighting targets.

All survey_expr / pums_expr overrides implement the interface documented in ControlTarget — individual method docstrings are omitted for brevity (ruff noqa: D102).

HHSizeControl

Household size (1-10+).

HHIncomeControl

Household income (canonical IncomeBroad bins).

HHWorkersControl

Number of workers in household (0-5+).

HHVehiclesControl

Vehicles in household (0-6+).

HHChildrenControl

Children in household (0-5+).

HHTotalControl

Structural control: total households (incidence = 1 per HH).

processing.weighting.controls.person

Person-level weighting controls.

Each class maps raw survey / PUMS values into coarser category ints for person-level weighting targets.

All survey_expr / pums_expr overrides implement the interface documented in ControlTarget; individual method docstrings are omitted for brevity (ruff noqa: D102).

GenderControl

Gender (male / female).

EmploymentControl

Employment status (full-time / part-time / not employed).

CommuteModeControl

Commute mode (drive alone, carpool, transit, walk, bike, mostly remote, other, N/A).

Survey side: Uses a combination of job_type, telework_freq, and commute_freq to identify mostly-remote workers — those whose telework frequency exceeds their commute frequency. For all other workers the observed work_mode determines the category.

PUMS side: JWTRNS=11 ("Worked at home") maps to MOSTLY_REMOTE. This is the closest analog — the PUMS question asks about the usual mode to work, so respondents who mostly remote-work select this.
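The two sides of this rule can be sketched in plain Python. Several details are assumptions made for illustration: frequencies are treated as days per week, a boolean is_worker stands in for whatever job_type contributes, non-workers are sent to NA, and work_mode is assumed to arrive already recoded to a CommuteModeCategory. Only the "telework exceeds commute" rule and the JWTRNS value 11 come from the description above.

```python
from enum import IntEnum


class CommuteModeCategory(IntEnum):
    NA = 0
    MOSTLY_REMOTE = 1
    DRIVE_ALONE = 2
    CARPOOL = 3
    TRANSIT = 4
    WALK = 5
    BIKE = 6
    OTHER = 7


JWTRNS_WORKED_AT_HOME = 11  # PUMS code cited in the description


def survey_commute_category(is_worker, telework_days, commute_days, work_mode):
    """Survey side: mostly-remote when telework outweighs commuting."""
    if not is_worker:
        return CommuteModeCategory.NA  # assumed handling of non-workers
    if telework_days > commute_days:
        return CommuteModeCategory.MOSTLY_REMOTE
    return work_mode  # observed mode decides for everyone else


def pums_is_mostly_remote(jwtrns):
    """PUMS side: only the worked-at-home code maps to MOSTLY_REMOTE."""
    return jwtrns == JWTRNS_WORKED_AT_HOME
```

Under these assumptions, a worker who teleworks three days and commutes two is MOSTLY_REMOTE regardless of the mode used on commute days.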

StudentControl

Student status (K-12 / college / not a student).

Classification priority:
  1. Explicit non-student (student == NONSTUDENT) → NOT_STUDENT
  2. Known K-12 school type (preschool through high school) → STUDENT_K12
  3. Known college school type → STUDENT_COLLEGE
  4. Childcare / at-home (not school in the Census sense) → NOT_STUDENT
  5. Age-based fallback when both student and school_type are missing: school-age children (5-17) → STUDENT_K12, everyone else → NOT_STUDENT
  6. Active student with missing school_type → STUDENT_COLLEGE (adult default)

The student field is only collected for persons age 16+ in the survey instrument; younger children have student = 995 (MISSING) but typically have a valid school_type, so school_type is checked before the age-based fallback is applied to records with a missing student value.
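The priority order can be read as a chain of early returns. The sketch below follows the six rules in sequence using plain Python values. The 995 missing code is taken from the text; the NONSTUDENT code, the school-type labels, and the use of None for a missing school_type are placeholders, not the survey's actual codebook.

```python
from enum import IntEnum


class StudentCategory(IntEnum):
    NOT_STUDENT = 0
    STUDENT_K12 = 1
    STUDENT_COLLEGE = 2


MISSING = 995        # from the text
NONSTUDENT = 0       # hypothetical code for "not a student"
K12_TYPES = {"preschool", "elementary", "middle", "high"}  # hypothetical labels
COLLEGE_TYPES = {"college", "graduate"}                    # hypothetical labels
NON_SCHOOL_TYPES = {"childcare", "at_home"}                # hypothetical labels


def classify_student(student, school_type, age):
    """Apply the classification rules in priority order."""
    if student == NONSTUDENT:                       # rule 1
        return StudentCategory.NOT_STUDENT
    if school_type in K12_TYPES:                    # rule 2
        return StudentCategory.STUDENT_K12
    if school_type in COLLEGE_TYPES:                # rule 3
        return StudentCategory.STUDENT_COLLEGE
    if school_type in NON_SCHOOL_TYPES:             # rule 4
        return StudentCategory.NOT_STUDENT
    if student == MISSING and school_type is None:  # rule 5
        if 5 <= age <= 17:
            return StudentCategory.STUDENT_K12
        return StudentCategory.NOT_STUDENT
    return StudentCategory.STUDENT_COLLEGE          # rule 6 (adult default)
```

Because the school-type checks come before the age-based fallback, a young child with a missing student value but a known school_type is still classified from school_type.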

EducationControl

Education attainment (canonical Education enum).

RaceControl

Race (canonical Race enum).

EthnicityControl

Hispanic/Latino ethnicity (canonical Ethnicity enum).

AgeControl

Age (canonical AgeCategory breakpoints).

PersonTotalControl

Structural control: total persons (incidence = 1 per person).

When aggregated to the seed table (one row per household), the incidence column becomes the count of persons in the household — effectively the non-top-coded household size.
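To illustrate the aggregation, the sketch below sums a per-person incidence of 1 within each household. The field names hh_id and incidence and the helper persons_per_household are hypothetical; the real pipeline presumably performs this as a grouped sum on a data frame.

```python
from collections import Counter


def persons_per_household(persons):
    """Sum the per-person incidence (always 1) within each household."""
    counts = Counter()
    for person in persons:
        counts[person["hh_id"]] += person["incidence"]
    return dict(counts)


# Hypothetical person table: household 1 has 12 members, household 2 has 2.
persons = [{"hh_id": 1, "incidence": 1} for _ in range(12)]
persons += [{"hh_id": 2, "incidence": 1} for _ in range(2)]

seed_counts = persons_per_household(persons)
```

Household 1 keeps its full count of 12 here, whereas a household-size control with a 10+ top bin would place it in the top-coded category.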