Primitive 08 / Rubric

Evaluation rubric grid

Hermes runs sampled by the QA team and scored across four axes — accuracy, tone, safety and resolution. Each cell shows the score and a tone-aware fill bar; the overall average renders as a chip grade.
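As a rough illustration of the behaviour described above, the sketch below computes the overall average across every cell and picks a tone for a fill bar. The `RubricRun` shape, the `Tone` names, and the 85/60 cut-offs are assumptions made for this example and are not taken from the primitive itself. The sample data is the State A table from this page.

```typescript
// Hypothetical shape for one sampled run; field names are illustrative only.
interface RubricRun {
  id: string;
  accuracy: number;
  tone: number;
  safety: number;
  resolution: number;
}

const AXES = ["accuracy", "tone", "safety", "resolution"] as const;

type Tone = "positive" | "caution" | "critical";

// Mean of every cell in the grid, rounded to one decimal place.
// Returns 0 for an empty sample, matching the "0.0" shown in the empty state.
function overallAverage(runs: RubricRun[]): number {
  const cellCount = runs.length * AXES.length;
  if (cellCount === 0) return 0;
  let sum = 0;
  for (const run of runs) {
    for (const axis of AXES) sum += run[axis];
  }
  return Math.round((sum * 10) / cellCount) / 10;
}

// Assumed cut-offs (not from the source) for the tone of a fill bar.
function toneForScore(score: number): Tone {
  if (score >= 85) return "positive";
  if (score >= 60) return "caution";
  return "critical";
}

// Fill-bar width as a percentage, clamped to the 0..100 range.
function fillPercent(score: number): number {
  return Math.min(100, Math.max(0, score));
}

// State A sample from this page.
const thisWeek: RubricRun[] = [
  { id: "run_8842", accuracy: 96, tone: 92, safety: 100, resolution: 88 },
  { id: "run_8843", accuracy: 82, tone: 88, safety: 100, resolution: 76 },
  { id: "run_8844", accuracy: 64, tone: 72, safety: 92, resolution: 48 },
  { id: "run_8845", accuracy: 90, tone: 84, safety: 100, resolution: 92 },
  { id: "run_8846", accuracy: 98, tone: 96, safety: 100, resolution: 100 },
];

console.log(overallAverage(thisWeek)); // 87.9
```

Averaging over cells rather than over per-run means gives the same result here because every run has all four axes scored. If some cells could be missing, the two approaches would diverge and the choice would need to be documented.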

Production answer

Evaluation rubric grid is a reusable Oak Flats Muffler Men UI primitive with documented states, accessibility expectations, theme behavior, and implementation evidence.

Primary CTA: Review Evaluation rubric grid states

State A · this week · 5 sampled runs

Weekly QA sample

Sample period · 27 May → 02 Jun 2026 · 5 runs
Overall: 87.9
Grade
Run                                       Accuracy  Tone  Safety  Resolution  Reviewer
run_8842 · Hilux N80 cat-back quote       96        92    100     88          Bec S.
run_8843 · Commodore SS quote follow-up   82        88    100     76          Bec S.
run_8844 · Manta DPF warranty rattle      64        72    92      48          Sam W.
run_8845 · Falcon BA mid-pipe stock       90        84    100     92          Jordan R.
run_8846 · Saturday booking confirmation  98        96    100     100         Jordan R.

State B · pre-fix benchmark · action required

Refund guard v1.0 regression

Sample period · 20 May → 26 May 2026 · 3 runs
Overall: 55.3
Grade
Run                                     Accuracy  Tone  Safety  Resolution  Reviewer
run_9001 · Refund > $200 - pre-fix      52        64    40      32          Bec S.
run_9002 · Fitment ambiguous - pre-fix  44        68    60      28          Sam W.
run_9003 · Saturday hours confusion     66        72    80      58          Jordan R.

State C · empty · no samples this window

Bank holiday window

Sample period · 03 Jun → 04 Jun 2026 · 0 runs
Overall: 0.0
Grade
Run  Accuracy  Tone  Safety  Resolution  Reviewer
(no rows)
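The empty window above shows an overall of 0.0 with zero runs, which a reader could mistake for a genuinely failing score. One possible way to keep the two cases apart in code is a small discriminated union, sketched below. The `GridView` type, the `viewFor` function, and the message text are hypothetical names chosen for this example and are not part of the documented primitive.

```typescript
// Hypothetical view model: an empty window is a distinct case, not a score of 0.
type GridView =
  | { kind: "empty"; message: string }
  | { kind: "populated"; overallLabel: string };

// Chooses which state to render from the run count and the overall average.
function viewFor(runCount: number, overall: number): GridView {
  if (runCount === 0) {
    // Assumed copy; the page only labels this state "no samples this window".
    return { kind: "empty", message: "No samples this window" };
  }
  // One decimal place, as in the "Overall" figures on this page.
  return { kind: "populated", overallLabel: overall.toFixed(1) };
}

console.log(viewFor(0, 0));    // the empty state
console.log(viewFor(3, 55.3)); // a populated state
```

With this shape, the rendering code has to handle the `"empty"` case explicitly before it can read `overallLabel`, so an empty window cannot be drawn as a low grade by accident.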