
Standards & Benchmarks

The standards module (WP-STD-BE, modules/standards) turns a raw measurement score into a MeasureStatus verdict by comparing it against partner-owned, country-keyed, versioned cut scores. It also manages those cut scores without requiring code changes.

There is no single overall score or percentage anywhere in the system (V-3). Every status is per-measure or per-macro-domain. Not_Assessed is never treated as zero.

MeasureStatus: the 5-value verdict

A measurement score resolves to exactly one of five statuses. These are surfaced as descriptive Arabic labels, never as numbers or percentages.

| Status key | Meaning |
| --- | --- |
| meets | Score meets the target cut |
| approaching | Score is above the lower threshold but below the target |
| below | Score is below the lower threshold |
| severe | Score is zero, or the student was unable to read the first word (partner-defined zero-rule) |
| not_assessed | Score is null, no numeric cuts are configured, or the partner has not yet supplied cut scores for this dimension |

not_assessed is excluded from any rollup. It is never averaged, never treated as a low score.
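The mapping in the table above can be sketched as a small pure function. This is a minimal illustration, not the module's actual code: the names toMeasureStatus, CutScores, and zeroRuleTriggered are hypothetical, and both the order of the checks and the handling of a score exactly equal to a threshold (treated as inclusive here) are assumptions.

```typescript
// Illustrative sketch only; none of these names come from the codebase.
type MeasureStatus = "meets" | "approaching" | "below" | "severe" | "not_assessed";

interface CutScores {
  target: number; // at or above this is "meets" (inclusive boundary is an assumption)
  lower: number;  // at or above this, but below target, is "approaching" (assumption)
}

function toMeasureStatus(
  score: number | null,
  cuts: CutScores | null,
  zeroRuleTriggered = false, // e.g. the student could not read the first word
): MeasureStatus {
  // A missing score or missing cuts is never treated as a low score.
  if (score === null || cuts === null) return "not_assessed";
  if (zeroRuleTriggered || score === 0) return "severe";
  if (score >= cuts.target) return "meets";
  if (score >= cuts.lower) return "approaching";
  return "below";
}

// not_assessed is dropped before any rollup instead of being counted as zero.
function rollupInputs(statuses: MeasureStatus[]): MeasureStatus[] {
  return statuses.filter((s) => s !== "not_assessed");
}
```

Returning not_assessed before any numeric comparison keeps a null score from ever being compared as if it were zero.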

Three-step resolution chain

For a given (country, skillId, assessmentType, gradeBand) the engine walks three levels:

  1. Exact match: a profile for this exact country + skill + assessment type + grade band.
  2. Country default: if no exact match, a profile for the same country with skillId null.
  3. Global fallback: if still no match, a profile with country null (the __GLOBAL__ sentinel). This row is always present, so resolution never returns null.

Fallback steps 2 and 3 each write a resolution_fallback audit row. If all three steps fail (not possible with the shipped seed, but handled defensively for the future), a resolution_miss is logged and the status is not_assessed.
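The three steps and their audit side effects can be sketched over an in-memory list of profiles. The types and the resolveProfile function are hypothetical. The text does not say which other fields the fallback rows are matched on, so this sketch matches steps 2 and 3 only on the fields the steps name.

```typescript
// Illustrative sketch only; field and function names are assumptions.
interface BenchmarkProfile {
  id: string;
  country: string | null; // null stands for the __GLOBAL__ row
  skillId: string | null; // null marks a country default
  assessmentType: string;
  gradeBand: string;
}

interface ResolutionKey {
  country: string;
  skillId: string;
  assessmentType: string;
  gradeBand: string;
}

interface ResolveResult {
  profile: BenchmarkProfile | null;
  step: "exact" | "country_default" | "global_fallback" | null;
  audit: "resolution_fallback" | "resolution_miss" | null; // audit row to write, if any
}

function resolveProfile(profiles: BenchmarkProfile[], key: ResolutionKey): ResolveResult {
  // Step 1: exact country + skill + assessment type + grade band.
  const exact = profiles.find(
    (p) =>
      p.country === key.country &&
      p.skillId === key.skillId &&
      p.assessmentType === key.assessmentType &&
      p.gradeBand === key.gradeBand,
  );
  if (exact) return { profile: exact, step: "exact", audit: null };

  // Step 2: same country, skillId null.
  const countryDefault = profiles.find((p) => p.country === key.country && p.skillId === null);
  if (countryDefault) {
    return { profile: countryDefault, step: "country_default", audit: "resolution_fallback" };
  }

  // Step 3: the global row (country null).
  const globalRow = profiles.find((p) => p.country === null);
  if (globalRow) {
    return { profile: globalRow, step: "global_fallback", audit: "resolution_fallback" };
  }

  // Defensive path: the caller reports not_assessed.
  return { profile: null, step: null, audit: "resolution_miss" };
}
```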

R12 (Q8): the assessmentWindow dimension for windowed ORF

The diacritized-connected-MSA ORF benchmarks (JO ORF_CBM, PS ORF_WITH_DIACRITICS) are now windowed across BOY / MOY / EOY (commit 2ea532bd). assessmentWindow (AssessmentWindow {BOY, MOY, EOY}) is an optional fourth resolution dimension — it is null (window-agnostic) for every non-ORF row and for __GLOBAL__. Both the active-uniqueness key and the version key add the window, so several windowed cuts for the same grade can be active at once.

Resolution uses an EOY-representative window-fallback (benchmark-reader.ts findWithWindowFallback):

  • A caller passing a window prefers the exact-window row, then falls back to the window=null row.
  • A caller passing no window prefers the window=null row, then falls back to the representative windowed cut, EOY-first (deterministic EOY > MOY > BOY).

This keeps every window-agnostic ORF/CBM consumer working unchanged (e.g. a JOR G2 ORF probe with no window resolves to the G2 EOY cut, 40 WCPM). The HTTP …/resolve route does not yet accept a window param (a Tier B follow-up); only internal session-pinning callers can pass a window today.
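The two fallback rules can be sketched as follows. This is not the body of findWithWindowFallback, which is not shown here. The function name pickWithWindowFallback and the WindowedCut shape are hypothetical, and the input is assumed to be the candidate rows already narrowed to one country, skill, assessment type, and grade.

```typescript
// Illustrative sketch only; names and row shape are assumptions.
type AssessmentWindow = "BOY" | "MOY" | "EOY";

interface WindowedCut {
  window: AssessmentWindow | null; // null = window-agnostic row
  wcpm: number;
}

// Deterministic representative order for callers that pass no window.
const REPRESENTATIVE_ORDER: AssessmentWindow[] = ["EOY", "MOY", "BOY"];

function pickWithWindowFallback(
  rows: WindowedCut[],
  requested: AssessmentWindow | null,
): WindowedCut | undefined {
  const agnostic = rows.find((r) => r.window === null);

  if (requested !== null) {
    // Exact window first, then the window-agnostic row.
    return rows.find((r) => r.window === requested) ?? agnostic;
  }

  // No window requested: window-agnostic row first, then EOY > MOY > BOY.
  if (agnostic) return agnostic;
  for (const w of REPRESENTATIVE_ORDER) {
    const hit = rows.find((r) => r.window === w);
    if (hit) return hit;
  }
  return undefined;
}
```

With a single G2 row of `{ window: "EOY", wcpm: 40 }` and no requested window, the sketch returns that EOY row, matching the example in the paragraph above.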

ORF-applicability sentinels (Q8)

ORF benchmarks carry an orfApplicability value (required | not_applicable | optional_baseline_no_cut). Where ORF is not yet applicable, resolution returns a typed sentinel — never a BenchmarkUnresolvable error and never a meets / below verdict:

  • G1 (all windows) and G2 BOY: not_applicable (typed sentinel).
  • G2 MOY: optional_baseline_no_cut. The probe value is recorded but no verdict is emitted.
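One way to model a typed sentinel is a discriminated union, so that callers must handle the no-verdict cases explicitly. The OrfResolution type, its kind tag, and orfOutcome are hypothetical names for illustration. Only the three orfApplicability values come from the text above.

```typescript
// Illustrative sketch only; the union shape is an assumption.
type OrfApplicability = "required" | "not_applicable" | "optional_baseline_no_cut";

type OrfResolution =
  | { kind: "cut"; applicability: "required"; targetWcpm: number }
  | { kind: "sentinel"; applicability: "not_applicable" }
  | { kind: "sentinel"; applicability: "optional_baseline_no_cut" };

function orfOutcome(res: OrfResolution): string {
  switch (res.applicability) {
    case "required":
      return `evaluate against cut ${res.targetWcpm}`;
    case "optional_baseline_no_cut":
      return "record probe value, emit no verdict";
    case "not_applicable":
      return "emit no verdict";
    default: {
      // Compile-time exhaustiveness check: a new applicability value fails here.
      const unreachable: never = res;
      return unreachable;
    }
  }
}
```

Because the sentinel variants carry no targetWcpm, code that tries to compute a meets / below verdict from them does not type-check.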

Session pinning (V-6)

When a session begins, freezeBenchmarkResolution writes the resolved { profileId, version } map once for that (org, session) pair — including the resolved window for windowed ORF cuts. All score evaluations within the session use that pinned version. A mid-session admin benchmark swap does not affect in-flight sessions. Each probe row permanently stamps benchmarkProfileId and benchmarkProfileVersion for replay.
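The write-once behaviour can be sketched with an in-memory store. This is not the implementation of freezeBenchmarkResolution. SessionPinStore, PinnedBenchmark, and the resolveNow callback are hypothetical, and a real implementation would persist the map instead of holding it in memory.

```typescript
// Illustrative sketch only; names and storage are assumptions.
interface PinnedBenchmark {
  profileId: string;
  version: number;
  window: "BOY" | "MOY" | "EOY" | null; // resolved window for windowed ORF cuts
}

type PinMap = Readonly<Record<string, PinnedBenchmark>>;

class SessionPinStore {
  private readonly pins = new Map<string, PinMap>();

  // Resolves and stores the map the first time; later calls return the same pins.
  freeze(
    orgId: string,
    sessionId: string,
    resolveNow: () => Record<string, PinnedBenchmark>,
  ): PinMap {
    const key = `${orgId}:${sessionId}`;
    const existing = this.pins.get(key);
    if (existing !== undefined) return existing;

    const frozen: PinMap = Object.freeze({ ...resolveNow() });
    this.pins.set(key, frozen);
    return frozen;
  }
}
```

If an admin activates a new version after the first call, a second freeze for the same (org, session) pair still returns the originally pinned version.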

Versioning and atomic swap

Benchmark profiles are versioned. The admin activate endpoint performs an atomic swap under a Postgres advisory lock: in a single transaction it deactivates the current active profile, activates the new one, and appends a changelog row. A partial-active uniqueness constraint at the database level provides a second safety belt.

The change-log records every create, activate, deactivate, resolution_fallback, and resolution_miss event, with the admin user ID and version transition.
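The invariants of the swap can be modelled in memory: exactly one active profile per scope afterwards, one changelog row appended, and no partial state on failure. This sketch does not reproduce the Postgres transaction or advisory lock. The swapActivate function, the scopeKey field, and the row shapes are hypothetical.

```typescript
// Illustrative in-memory model only; the real swap runs inside a database transaction.
interface ProfileRow {
  id: string;
  scopeKey: string; // stands in for the active-uniqueness key
  version: number;
  active: boolean;
}

interface ChangeLogRow {
  event: "activate";
  profileId: string;
  adminUserId: string;
  fromVersion: number | null;
  toVersion: number;
}

interface BenchmarkStore {
  profiles: ProfileRow[];
  changelog: ChangeLogRow[];
}

function swapActivate(store: BenchmarkStore, profileId: string, adminUserId: string): BenchmarkStore {
  const target = store.profiles.find((p) => p.id === profileId);
  if (!target) throw new Error(`unknown profile: ${profileId}`);

  const previous = store.profiles.find(
    (p) => p.scopeKey === target.scopeKey && p.active && p.id !== target.id,
  );

  // Build the next state in full before returning it, so a thrown error
  // leaves the input store untouched (all-or-nothing).
  const profiles = store.profiles.map((p) =>
    p.scopeKey === target.scopeKey ? { ...p, active: p.id === target.id } : p,
  );

  // Mirrors the database-level uniqueness constraint as a second check.
  const activeCount = profiles.filter((p) => p.scopeKey === target.scopeKey && p.active).length;
  if (activeCount !== 1) throw new Error("active-uniqueness violated");

  const entry: ChangeLogRow = {
    event: "activate",
    profileId: target.id,
    adminUserId,
    fromVersion: previous ? previous.version : null,
    toVersion: target.version,
  };
  return { profiles, changelog: [...store.changelog, entry] };
}
```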

Seeded cut scores

Real partner cut scores for Jordan (JO) and Palestine (PS) are seeded at setup time. These numbers come from the partner workbook and are never hand-edited in documentation or code. If the partner updates the workbook, numbers are re-extracted from it and a new benchmark profile is created and activated through the admin API. No code change is needed.

__GLOBAL__ rows carry working-default values (valuesPending: true) until the partner supplies final numbers for that dimension.

No framework or clinical labels on any surface

The internal computation layer uses source identifiers (such as JOR_RAMP, plus a Palestine equivalent) drawn from the partner workbook. A serialization-layer guard (strip-yardstick.ts) strips these identifiers before any teacher or admin response leaves the server. The resolve endpoint omits the source field entirely. These labels never appear in any user-facing surface.
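A serialization-layer guard of this kind can be sketched as a recursive key filter over plain JSON data. This is not the contents of strip-yardstick.ts. The function name stripInternalLabels and the key list are assumptions, with source taken from the field the resolve endpoint is said to omit.

```typescript
// Illustrative sketch only; the real guard's rules are not shown in this document.
const INTERNAL_KEYS = new Set<string>(["source"]); // assumed list of internal-only keys

// Returns a deep copy of plain JSON data with internal-only keys removed.
function stripInternalLabels(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(stripInternalLabels);
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      if (!INTERNAL_KEYS.has(k)) out[k] = stripInternalLabels(v);
    }
    return out;
  }
  return value; // strings, numbers, booleans, null pass through unchanged
}
```

Applying the filter at the last step before serialization means a handler that forgets to drop the field still cannot leak it.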

Principal school overview

The principal dashboard reads from the standards module for three surfaces:

  • School Literacy Health overview: Curriculum-KPI gauges displayed with Arabic labels and status categories. not_assessed is excluded from the school health index; it is shown separately, never folded into the status distribution.
  • Class x KPI heatmap: for each class and each Curriculum-KPI tile, the worst (dominant) status among students in that class. One cell per (class, KPI) pair; no raw counts or percentages.
  • Longitudinal growth view: per macro-domain tile, status counts for each assessment window (BOY/MOY/EOY) that students have graded snapshots in, plus window-over-window movement counts (toward Meets / away from Meets / held) for students graded in both windows. Aggregate-only, count-based — never a percentage.

None of these surfaces display a number or percentage representing overall literacy. See Managers & Principals for how principals and managers read these views.
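The heatmap cell and the growth movement counts can be sketched as two small aggregations. The function names and the severity ranking (severe worst, meets best) are assumptions, as is returning not_assessed for a cell in which no student has a graded status.

```typescript
// Illustrative sketch only; names and ranking are assumptions.
type GradedStatus = "meets" | "approaching" | "below" | "severe";
type CellStatus = GradedStatus | "not_assessed";

// Higher rank = further from Meets.
const SEVERITY_RANK: Record<GradedStatus, number> = {
  meets: 0,
  approaching: 1,
  below: 2,
  severe: 3,
};

// One heatmap cell: the worst graded status among the students in a class.
function dominantStatus(statuses: CellStatus[]): CellStatus {
  let worst: GradedStatus | null = null;
  for (const s of statuses) {
    if (s === "not_assessed") continue; // never treated as a low score
    if (worst === null || SEVERITY_RANK[s] > SEVERITY_RANK[worst]) worst = s;
  }
  return worst ?? "not_assessed";
}

interface Movement {
  towardMeets: number;
  awayFromMeets: number;
  held: number;
}

// Window-over-window movement, counting only students graded in both windows.
function countMovement(pairs: Array<[CellStatus, CellStatus]>): Movement {
  const m: Movement = { towardMeets: 0, awayFromMeets: 0, held: 0 };
  for (const [earlier, later] of pairs) {
    if (earlier === "not_assessed" || later === "not_assessed") continue;
    const delta = SEVERITY_RANK[later] - SEVERITY_RANK[earlier];
    if (delta < 0) m.towardMeets++;
    else if (delta > 0) m.awayFromMeets++;
    else m.held++;
  }
  return m;
}
```

Both functions return statuses or counts only, so neither can produce a percentage or an overall score.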

R12 (Q9): the per-grade Curriculum-KPI dictionary (alignment-only)

R12 added a separate layer from the 15 domain/macro gauge rows above (commit 5df45c3b): a per-grade Curriculum-KPI dictionary of 51 indicators (G1:12, G2:13, G3:13, G4:13) across 3 new GLOBAL tables — curriculum_kpi_dictionary, curriculum_kpi_to_skill_map, and curriculum_kpi_country_map.

Alignment / reporting only — never a decision input. Every dictionary row carries decisionWeight = 0 (a DB CHECK constraint) and runtimeRole = ALIGNMENT_ONLY as stored columns. These KPIs NEVER influence skill_status / domain_status / profile / bundle / rti_tier / severity / benchmark. The lint:kpi-alignment-only import-graph guard fails CI if any decision-engine module (modules/{sse,profile,bundle,cluster,monitoring,rti,standards/engine}) reads the 3 KPI tables — only reporting / principal-dashboard / API modules may read them.
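The rule the guard enforces can be sketched as a check over a map of which tables each file reads. This is not how lint:kpi-alignment-only is implemented, which is described above as an import-graph guard. The input shape, the function name findKpiReadViolations, and the trailing-slash path prefixes are assumptions.

```typescript
// Illustrative sketch only; the real guard walks the import graph.
const DECISION_MODULE_PREFIXES = [
  "modules/sse/",
  "modules/profile/",
  "modules/bundle/",
  "modules/cluster/",
  "modules/monitoring/",
  "modules/rti/",
  "modules/standards/engine/",
];

const KPI_TABLES = [
  "curriculum_kpi_dictionary",
  "curriculum_kpi_to_skill_map",
  "curriculum_kpi_country_map",
];

// readsByFile maps a file path to the table names it reads (hypothetical input).
function findKpiReadViolations(readsByFile: Record<string, string[]>): string[] {
  const violations: string[] = [];
  for (const [file, tables] of Object.entries(readsByFile)) {
    const isDecisionModule = DECISION_MODULE_PREFIXES.some((p) => file.startsWith(p));
    if (!isDecisionModule) continue; // reporting / dashboard / API modules may read KPI tables
    for (const t of tables) {
      if (KPI_TABLES.indexOf(t) !== -1) violations.push(`${file} reads ${t}`);
    }
  }
  return violations;
}
```

A CI step built on such a check would fail whenever the returned list is non-empty.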

API reference

| Method | Path | Permission | Notes |
| --- | --- | --- | --- |
| GET | /api/standards/benchmark-profiles/resolve | Teacher / admin | Resolve the active profile for a given context |
| GET | /api/admin/benchmark-profiles | SUPER_ADMIN | List all profiles with version and activation state |
| POST | /api/admin/benchmark-profiles | SUPER_ADMIN | Create a new profile (optionally activate immediately) |
| POST | /api/admin/benchmark-profiles/:id/activate | SUPER_ADMIN | Atomic swap-activate |
| GET | /api/admin/benchmark-profile-changelog | SUPER_ADMIN | Full audit log of all profile events |
| GET | /api/principal/schools/:id/overview | VIEW_SCHOOL | School health gauges |
| GET | /api/principal/schools/:id/heatmap | VIEW_SCHOOL | Class x KPI dominant-status heatmap |
| GET | /api/principal/schools/:id/growth | VIEW_SCHOOL | Longitudinal window-over-window growth by macro-domain tile |
| POST | /api/principal/schools/:id/ministry-report | VIEW_SCHOOL | Wave 1 stub (returns 501; Wave 2 feature) |

No endpoint grants a student permission. Benchmark data is never served directly to students or parents.