Standards & Benchmarks

Every measurement Amal produces is compared against a set of reference levels to produce a growth status. These reference levels are called benchmarks. They are country-specific, skill-specific, and grade-specific, and for connected-text oral reading fluency they are also window-specific, meaning the target changes across the beginning, middle, and end of the year. Together they determine whether a child’s score places them in the “meets,” “approaching,” “below,” “severe,” or “not assessed” category for each skill.

This page explains how the comparison works, what the five statuses mean, and how the platform handles uncertainty, provisional values, and benchmark updates over time.

The role of benchmarks

After a child completes an assessment and the platform produces an ability estimate for a skill, the measurement is not yet complete. A number by itself does not tell a teacher what to do next. What matters is whether that number is where it should be for a child of this grade, in this country, at this point in the year.

Benchmarks answer that question. They are the reference levels that convert an ability estimate into a growth status that a teacher can interpret: this child is meeting expectations, approaching them, below them, or at a level that suggests urgent attention.

The benchmark values are supplied by the project’s educational specialist partners. They are not invented by the platform; they are expert judgments about what level of reading achievement to expect from children at each grade in Arabic-speaking school systems.

The five growth statuses

Status	What it means
Meets	The child’s score is at or above the target level for this skill, grade, and country
Approaching	The child’s score is above the lower threshold but has not yet reached the target level
Below	The child’s score is below the lower threshold
Severe	The child scored zero, or was unable to read the first word (a floor rule configured per skill and country)
Not assessed	The score is missing, or no numeric benchmark has been configured for this skill and country yet
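The table can be read as a small decision procedure. The sketch below is illustrative only: it assumes each benchmark supplies two numeric cut scores (a target and a lower threshold) plus a flag saying whether a severe rule is configured. The names `GrowthStatus` and `derive_status` are hypothetical, not the platform's own. The source does not say which status a score exactly equal to the lower threshold receives; this sketch treats it as Below.

```python
from enum import Enum
from typing import Optional


class GrowthStatus(Enum):
    MEETS = "meets"
    APPROACHING = "approaching"
    BELOW = "below"
    SEVERE = "severe"
    NOT_ASSESSED = "not_assessed"


def derive_status(
    score: Optional[float],
    target: Optional[float],
    lower: Optional[float],
    severe_rule_configured: bool = False,
    unable_to_read_first_word: bool = False,
) -> GrowthStatus:
    # A missing score or missing numeric cut scores is "not assessed", never zero.
    if score is None or target is None or lower is None:
        return GrowthStatus.NOT_ASSESSED
    # Severe is only reachable when a severe rule is configured for the skill.
    if severe_rule_configured and (score == 0 or unable_to_read_first_word):
        return GrowthStatus.SEVERE
    if score >= target:
        return GrowthStatus.MEETS
    # Assumption: a score exactly at the lower threshold falls through to Below.
    if score > lower:
        return GrowthStatus.APPROACHING
    return GrowthStatus.BELOW
```

Note that without a severe rule, a zero score lands on Below, which matches the note on “Severe” that follows.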

A note on “Severe”

The severe status applies when a child’s score hits a specific floor condition, typically a zero score or an inability to produce the first word of a reading passage. Not every skill has a severe threshold configured; this depends on what the partner has specified for each skill and country. When no severe rule is configured for a skill, the lowest reachable status is “Below.”

“Not assessed” is never zero

When a skill has not been measured in a session, or when the partner has not yet supplied numeric benchmarks for that skill-country-grade combination, the platform reports “Not assessed.” This status is excluded from any aggregation, rollup, or summary count. It is never treated as a low score, a zero, or evidence of difficulty.
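A minimal sketch of that exclusion, assuming statuses arrive as plain strings (the string values and function names here are hypothetical). “Not assessed” is dropped before anything is counted, so it never enters a denominator.

```python
from collections import Counter
from typing import Iterable, Optional

NOT_ASSESSED = "not_assessed"  # hypothetical string value for the status


def status_counts(statuses: Iterable[str]) -> Counter:
    """Count statuses for one skill, leaving 'not assessed' out entirely."""
    return Counter(s for s in statuses if s != NOT_ASSESSED)


def share_meeting(statuses: Iterable[str]) -> Optional[float]:
    """Share of assessed children meeting the target; None if nobody was assessed."""
    counts = status_counts(statuses)
    assessed = sum(counts.values())
    return counts["meets"] / assessed if assessed else None
```

For the statuses meets, below, not_assessed, meets, the share meeting the target is 2/3, not 2/4: the unassessed child neither lowers the share nor counts as a low score.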

A child who has not yet been assessed on a skill is simply not yet assessed. The platform does not fill in an assumption about their level.

Reading fluency benchmarks change across the year

A child’s reading naturally develops over a school year. A fluency level that is right on target at the start of the year would be behind expectations by the end of it. For this reason, connected-text oral reading fluency uses a different target at each point in the year: one for the beginning of the year, one for the middle, and one for the end. The same child in the same grade is therefore compared against three different fluency targets over the year, depending on when they read.

This windowed approach applies specifically to connected-text oral reading fluency. Most other skills use a single grade-level target. The windowed fluency targets let the platform judge a child’s reading against where children are expected to be at that exact moment in the year, not against a single fixed line.
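To make the windowed idea concrete, here is a sketch with placeholder fluency targets. The numbers are invented for illustration and are not partner benchmarks; `Window`, `FLUENCY_TARGETS`, and `meets_fluency_target` are hypothetical names.

```python
from enum import Enum
from typing import Dict


class Window(Enum):
    BEGINNING = "beginning"
    MIDDLE = "middle"
    END = "end"


# Placeholder values only; real targets come from the partner workbook.
FLUENCY_TARGETS: Dict[Window, float] = {
    Window.BEGINNING: 20.0,
    Window.MIDDLE: 30.0,
    Window.END: 40.0,
}


def meets_fluency_target(
    score: float, window: Window, targets: Dict[Window, float] = FLUENCY_TARGETS
) -> bool:
    # The same score is judged against the target for the window it was read in.
    return score >= targets[window]
```

With these placeholder targets, a score of 32 meets the beginning- and middle-of-year targets but not the end-of-year one, which is the behavior the windowed approach is meant to capture.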

How the benchmark is resolved

For any given measurement, the platform needs to find the right benchmark for that specific combination of country, skill, assessment type, and grade band, plus the assessment window for fluency. It works through these steps in order:

Exact match: The platform first looks for a benchmark configured for this exact country, skill, assessment type, and grade band. For connected-text oral reading fluency, this match also considers the assessment window (beginning, middle, or end of year), so the right target for the time of year is used.
Window fallback: If a fluency measurement has no benchmark configured for its specific window, the platform falls back to a window-agnostic target for that skill and, if that is also missing, to a target representative of the end of the year. A fluency reading can therefore always be interpreted, even when a window-specific value has not yet been supplied.
Country default: If no skill-specific benchmark is found (including after any window fallback), the platform looks for a benchmark configured for the same country but not specific to this skill. Country-level defaults provide a fallback when a skill-specific benchmark has not yet been configured.
Global default: If no country-level default is found, the platform uses a shared global default that applies across all countries. This ensures that a benchmark is always available, so a measurement can always be interpreted rather than left unclassified.

Whenever the resolution falls back to a more general step, this is recorded in the audit trail. The benchmark record used for the session is permanently stored alongside the score.
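The resolution order can be sketched as an ordered list of lookup keys tried one after another. This is a sketch under stated assumptions, not the platform's code: benchmarks are held in a dictionary keyed by a hypothetical 5-tuple `(country, skill, assessment_type, grade_band, window)` with `None` meaning "not specific to this field"; the end-of-year-representative target is modeled as the record whose window is `"end"`; a country default is modeled as a record specific to the country only; and `"*"` is a hypothetical marker for the global default. The audit list stands in for the audit trail.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical key: (country, skill, assessment_type, grade_band, window).
Key = Tuple[Optional[str], Optional[str], Optional[str], Optional[str], Optional[str]]
GLOBAL = "*"  # hypothetical country marker for the shared global default


@dataclass(frozen=True)
class Benchmark:
    benchmark_id: str
    target: Optional[float]  # None: no numeric value configured yet
    lower: Optional[float]
    provisional: bool = False


def resolve_benchmark(
    store: Dict[Key, Benchmark],
    country: str,
    skill: str,
    assessment_type: str,
    grade_band: str,
    window: Optional[str],  # set only for connected-text oral reading fluency
    audit: List[dict],
) -> Benchmark:
    specific = (country, skill, assessment_type, grade_band)
    steps: List[Tuple[str, Key]] = [("exact", specific + (window,))]
    if window is not None:
        # Window fallback: window-agnostic target, then the end-of-year one.
        steps.append(("window_agnostic", specific + (None,)))
        steps.append(("end_of_year", specific + ("end",)))
    steps.append(("country_default", (country, None, None, None, None)))
    steps.append(("global_default", (GLOBAL, None, None, None, None)))

    for step, key in steps:
        benchmark = store.get(key)
        if benchmark is None:
            continue
        if step != "exact":
            # Every fallback to a more general step is recorded.
            audit.append({"step": step, "requested": specific + (window,),
                          "used": benchmark.benchmark_id})
        return benchmark
    raise LookupError("no global default benchmark configured")
```

The returned `benchmark_id` is what a caller would store alongside the score. A resolved record may still carry `None` cut scores, in which case status derivation yields “Not assessed” rather than a guessed level.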

Real benchmarks for Jordan and Palestine

Partner-supplied benchmark values for Jordan and Palestine are configured in the platform at setup. These values come directly from the partner workbook. They are never hand-edited in the platform’s documentation or code; if the partner updates the workbook, new values are extracted from it and a new benchmark configuration is created through the administrative interface.

Global defaults, which apply when no country-specific value is available for a given skill, are clearly marked as provisional until the partner supplies final values.

Benchmark versions are pinned per session

Benchmarks can be updated over time as partners refine their standards or as new data becomes available. The platform is designed so that updating benchmarks never retroactively changes the interpretation of assessments that have already been completed.

At the start of every assessment session, the platform records which benchmark version is active for each skill relevant to the session. All scoring and status derivation within that session uses those pinned versions. If an administrator updates a benchmark profile in the middle of the school year, those changes apply to new sessions going forward but do not touch any session that has already started or completed.

Every scored result permanently records the benchmark profile version it was compared against. This means old results can always be examined against the same standard that was in effect when they were produced.
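A minimal sketch of per-session pinning, assuming a hypothetical in-memory registry that maps each skill to its currently active version number. The session copies those versions when it starts, and every scored result carries the pinned version rather than the live one. All class and function names here are illustrative.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Iterable, Mapping


class BenchmarkRegistry:
    """Hypothetical registry of the currently active benchmark version per skill."""

    def __init__(self) -> None:
        self._active: Dict[str, int] = {}

    def activate(self, skill: str, version: int) -> None:
        self._active[skill] = version

    def active_version(self, skill: str) -> int:
        return self._active[skill]


@dataclass(frozen=True)
class Session:
    session_id: str
    pinned_versions: Mapping[str, int]  # read-only snapshot taken at session start


@dataclass(frozen=True)
class ScoredResult:
    session_id: str
    skill: str
    score: float
    benchmark_version: int  # stored permanently with the result


def start_session(session_id: str, skills: Iterable[str], registry: BenchmarkRegistry) -> Session:
    # Copy the active versions now; later activations do not reach this dict.
    pinned = {skill: registry.active_version(skill) for skill in skills}
    return Session(session_id, MappingProxyType(pinned))


def score_result(session: Session, skill: str, score: float) -> ScoredResult:
    # Scoring uses the version pinned at session start, never the live one.
    return ScoredResult(session.session_id, skill, score, session.pinned_versions[skill])
```

If an administrator activates version 2 while a session pinned to version 1 is still running, that session's results keep recording version 1; only sessions started afterwards pick up version 2.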

No overall number

The platform deliberately does not produce a single overall number or percentage representing a child’s Arabic literacy. Growth statuses are always per-skill or per-area.

When teachers view a child’s results, they see a structured map of statuses across the seven reading areas and their constituent skills. When school leaders view school-level data, they see distributions of statuses across skills and classrooms. These views show where different children and groups stand on specific aspects of reading, which is what instructional decisions require.

School-leader and ministry-facing views can also present curriculum-aligned indicators: a defined set of reading-skill labels for each grade, drawn from the national curriculum and used to show how the school’s reading work lines up with curriculum expectations. They are alignment and reporting labels only: they describe coverage against the curriculum and never influence any child’s status, profile, support plan, or any decision the platform makes. They exist to help leaders and the ministry see the curriculum picture, not to grade children.

A single combined number would hide the pattern differences that matter most for deciding what kind of support to offer. Keeping the statuses disaggregated is not a limitation of the platform; it is a deliberate measurement principle.

How benchmark updates work in practice

When the partner supplies updated cut scores, a school administrator or platform administrator creates a new benchmark profile through the administrative interface. The new profile can be activated immediately or held in draft until a planned date. Activation is atomic: in a single operation, the old active profile is deactivated and the new one becomes active. No partial states are possible.

The complete history of benchmark profile activations, deactivations, and changes is recorded in a permanent log with the administrator’s identity and the timestamp of each action.

No code changes are required to update benchmarks. The partner-supplied values live in the platform’s configuration, not in the software itself.
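One way to get the all-or-nothing activation described above is to perform both status changes and the log entries inside a single database transaction. The sketch below uses Python's built-in `sqlite3` module. The table and column names, and the assumption that there is one active profile per country, are hypothetical. A partial unique index makes a second active profile for the same country impossible even if application code has a bug, and because the log rows are written in the same transaction, an activation cannot be committed without its log entry.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE benchmark_profile (
    id      INTEGER PRIMARY KEY,
    country TEXT NOT NULL,
    status  TEXT NOT NULL CHECK (status IN ('draft', 'active', 'inactive'))
);
-- The database itself refuses two active profiles for one country.
CREATE UNIQUE INDEX one_active_per_country
    ON benchmark_profile (country) WHERE status = 'active';
CREATE TABLE profile_log (
    id         INTEGER PRIMARY KEY,
    profile_id INTEGER NOT NULL,
    action     TEXT NOT NULL,
    admin_id   TEXT NOT NULL,
    at_utc     TEXT NOT NULL
);
"""


def activate_profile(conn: sqlite3.Connection, profile_id: int, admin_id: str) -> None:
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: commit if every statement succeeds, else roll back
        row = conn.execute(
            "SELECT country FROM benchmark_profile WHERE id = ?", (profile_id,)
        ).fetchone()
        if row is None:
            raise LookupError(f"unknown profile {profile_id}")
        old_ids = [r[0] for r in conn.execute(
            "SELECT id FROM benchmark_profile "
            "WHERE country = ? AND status = 'active' AND id != ?",
            (row[0], profile_id),
        )]
        for old_id in old_ids:
            conn.execute("UPDATE benchmark_profile SET status = 'inactive' WHERE id = ?", (old_id,))
            conn.execute(
                "INSERT INTO profile_log (profile_id, action, admin_id, at_utc) "
                "VALUES (?, 'deactivate', ?, ?)", (old_id, admin_id, now))
        conn.execute("UPDATE benchmark_profile SET status = 'active' WHERE id = ?", (profile_id,))
        conn.execute(
            "INSERT INTO profile_log (profile_id, action, admin_id, at_utc) "
            "VALUES (?, 'activate', ?, ?)", (profile_id, admin_id, now))
```

The old profile is deactivated before the new one is activated, so the unique index is never violated mid-transaction. Any exception inside the `with` block rolls back every change made so far.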

What teachers see

Teachers never see the internal identifiers the platform uses to label benchmark sources. The labels that identify where benchmark values came from are stripped from all teacher and school-leader responses before they leave the server. Teachers see growth status labels only in Arabic; in English these correspond to Meets, Approaching, Below, Severe, and Not assessed. The behind-the-scenes benchmark reference is an administrative matter, not a teacher-facing one.
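A sketch of the stripping step, assuming responses are built as nested dictionaries and lists before serialization. The field names in `INTERNAL_FIELDS` are hypothetical because the source does not name the platform's internal identifiers. The stored result keeps its benchmark reference for auditing; only the outgoing copy is filtered.

```python
from typing import Any, FrozenSet

# Hypothetical internal field names that label where a benchmark came from.
INTERNAL_FIELDS: FrozenSet[str] = frozenset({"benchmark_source", "benchmark_id"})


def strip_internal(payload: Any) -> Any:
    """Return a copy of the payload with internal benchmark-source fields removed at any depth."""
    if isinstance(payload, dict):
        return {k: strip_internal(v) for k, v in payload.items() if k not in INTERNAL_FIELDS}
    if isinstance(payload, list):
        return [strip_internal(item) for item in payload]
    return payload
```

Because the function builds new containers instead of mutating the input, the server-side record used for auditing is left untouched.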