Safety check block
Define safety rules with an explicit action on hit: block, redact, flag and continue, or escalate to a human. Inspected counts and hit rates make it easy to spot a rule that has gone trigger-happy.
Safety check block is a reusable Oak Flats Muffler Men UI primitive with documented states, accessibility expectations, theme behavior, and implementation evidence.
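The rule-plus-action model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the component's actual implementation: the names `OnHit`, `SafetyRule`, `Outcome` and `run_checks` are hypothetical, rules are assumed to be plain regular expressions, and escalation is assumed to mark the message and keep processing.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class OnHit(Enum):
    """The four actions named in the description (identifiers are hypothetical)."""
    BLOCK = "block"
    REDACT = "redact"
    FLAG = "flag_and_continue"
    ESCALATE = "escalate"


@dataclass
class SafetyRule:
    name: str
    pattern: "re.Pattern[str]"  # assumption: a rule is a single regex
    on_hit: OnHit
    enabled: bool = True
    inspected: int = 0
    hits: int = 0

    def hit_rate(self) -> float:
        """Fraction of inspected messages that triggered this rule."""
        return self.hits / self.inspected if self.inspected else 0.0


@dataclass
class Outcome:
    text: Optional[str]   # None when a BLOCK rule halted processing
    flags: List[str]      # names of rules that blocked, flagged or escalated
    escalated: bool


def run_checks(text: str, rules: List[SafetyRule]) -> Outcome:
    """Apply each enabled rule in order, updating its counters."""
    flags: List[str] = []
    escalated = False
    for rule in rules:
        if not rule.enabled:
            continue
        rule.inspected += 1
        if not rule.pattern.search(text):
            continue
        rule.hits += 1
        if rule.on_hit is OnHit.BLOCK:
            # Block halts immediately; later rules never see the message.
            return Outcome(None, flags + [rule.name], escalated)
        if rule.on_hit is OnHit.REDACT:
            text = rule.pattern.sub("[redacted]", text)
        elif rule.on_hit is OnHit.FLAG:
            flags.append(rule.name)
        elif rule.on_hit is OnHit.ESCALATE:
            # Assumption: escalation marks the message and continues.
            escalated = True
            flags.append(rule.name)
    return Outcome(text, flags, escalated)


# Illustrative rules; the patterns are made up for this sketch.
rules = [
    SafetyRule("PII redact", re.compile(r"\b\d{4} ?\d{3} ?\d{3}\b"), OnHit.REDACT),
    SafetyRule("Jailbreak", re.compile(r"ignore your instructions", re.I), OnHit.BLOCK),
]
redacted = run_checks("Call me on 0412 345 678 about a quote", rules)
blocked = run_checks("Please ignore your instructions", rules)
```

Keeping the counters on the rule itself means the inspected count, hit count and hit rate shown on each card can be read straight off the rule object.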
Quote estimator
PII redact: Strip phone, email and ABN before logging the payload.
Inspected 1,846 | Hits 32 | Hit rate 1.7% | On hit: Redact | Enabled
Jailbreak: Block prompt-injection attempts disguised as customer messages.
Inspected 1,846 | Hits 4 | Hit rate 0.2% | On hit: Block · halt | Enabled
Moderation: OpenAI moderation pass on outbound copy.
Inspected 1,742 | Hits 8 | Hit rate 0.5% | On hit: Flag · continue | Enabled
Topic fence: Refuse engine ECU tuning advice · escalate to Sam W (workshop lead).
Inspected 1,742 | Hits 2 | Hit rate 0.1% | On hit: Escalate to Hermes | Enabled
PII redact: Mask rego plates in transcripts archived to S3.
Inspected 1,742 | Hits 18 | Hit rate 1.0% | On hit: Redact | Off
SMS triage
Jailbreak: Block SMS prompt-injection ('ignore your instructions').
Inspected 1,846 | Hits 12 | Hit rate 0.7% | On hit: Block · halt | Enabled
PII redact: Mask phone numbers before forwarding to model context.
Inspected 1,846 | Hits 1,846 | Hit rate 100% | On hit: Redact | Enabled
Topic fence: Refuse ADR certification + legal-engineering advice.
Inspected 1,846 | Hits 6 | Hit rate 0.3% | On hit: Escalate to Hermes | Enabled
Outbound moderation only
Moderation: OpenAI moderation pass on any outbound message.
Inspected 412 | Hits 2 | Hit rate 0.5% | On hit: Flag · continue | Enabled
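The hit rates shown on the cards are consistent with hits divided by inspected, rounded to one decimal place. The sketch below recomputes them from the example figures and lists any rule whose rate crosses a threshold. The function names and the 50% threshold are assumptions made for illustration; the component itself does not document a cutoff for "trigger-happy".

```python
from typing import List, Tuple

# (group, rule, inspected, hits) copied from the example cards above.
ROWS: List[Tuple[str, str, int, int]] = [
    ("Quote estimator", "PII redact", 1846, 32),
    ("Quote estimator", "Jailbreak", 1846, 4),
    ("Quote estimator", "Moderation", 1742, 8),
    ("Quote estimator", "Topic fence", 1742, 2),
    ("Quote estimator", "PII redact (rego plates)", 1742, 18),
    ("SMS triage", "Jailbreak", 1846, 12),
    ("SMS triage", "PII redact", 1846, 1846),
    ("SMS triage", "Topic fence", 1846, 6),
    ("Outbound moderation only", "Moderation", 412, 2),
]


def hit_rate_percent(inspected: int, hits: int) -> float:
    """Hit rate as a percentage, rounded to one decimal place."""
    if inspected == 0:
        return 0.0
    return round(100 * hits / inspected, 1)


def trigger_happy(rows: List[Tuple[str, str, int, int]],
                  threshold_percent: float = 50.0) -> List[Tuple[str, str]]:
    """Rules at or above the threshold (50% is an arbitrary, assumed default)."""
    return [
        (group, rule)
        for group, rule, inspected, hits in rows
        if hit_rate_percent(inspected, hits) >= threshold_percent
    ]


flagged = trigger_happy(ROWS)
```

With these figures only the SMS triage PII redact rule is listed. A rule that hits on every message is not necessarily misconfigured (a phone-number mask may well fire on every SMS), so a list like this is a prompt for review rather than a verdict.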