Primitive 13 / Safety

Safety check block

Define safety rules with an explicit action on each hit: block, redact, flag and continue, or escalate to a human. Per-rule inspected counts and hit rates make it easy to spot a rule that's gone trigger-happy.
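
The rule-plus-action model described above can be sketched as follows. This is a minimal illustration, not the primitive's actual implementation: the names `Action`, `Rule`, `run_checks`, `redact_email` and `looks_like_injection` are hypothetical, the email pattern is deliberately crude, and the sketch assumes that both "block" and "escalate" stop automated processing while "redact" and "flag" let it continue.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    BLOCK = "block"        # halt the run
    REDACT = "redact"      # replace the matched content, then continue
    FLAG = "flag"          # count the hit, then continue
    ESCALATE = "escalate"  # hand off to a human (assumed to stop the run)


@dataclass
class Rule:
    name: str
    action: Action
    # Returns the (possibly rewritten) text on a hit, or None when there is no hit.
    check: Callable[[str], Optional[str]]
    enabled: bool = True
    inspected: int = 0
    hits: int = 0

    @property
    def hit_rate(self) -> float:
        return self.hits / self.inspected if self.inspected else 0.0


def run_checks(text: str, rules: list[Rule]) -> tuple[str, Optional[str]]:
    """Apply enabled rules in order; return (text, name of the rule that stopped the run)."""
    for rule in rules:
        if not rule.enabled:
            continue
        rule.inspected += 1
        result = rule.check(text)
        if result is None:
            continue
        rule.hits += 1
        if rule.action is Action.REDACT:
            text = result
        elif rule.action in (Action.BLOCK, Action.ESCALATE):
            return text, rule.name
        # Action.FLAG falls through: the hit is counted and processing continues.
    return text, None


EMAIL = re.compile(r"\S+@\S+")  # illustrative pattern only


def redact_email(text: str) -> Optional[str]:
    return EMAIL.sub("[email]", text) if EMAIL.search(text) else None


def looks_like_injection(text: str) -> Optional[str]:
    return text if "ignore your instructions" in text.lower() else None


rules = [
    Rule("PII redact", Action.REDACT, redact_email),
    Rule("Jailbreak", Action.BLOCK, looks_like_injection),
]
```

Calling `run_checks(message, rules)` once per message updates each rule's `inspected` and `hits` counters, which is all the hit-rate readout needs.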

Production answer

Safety check block is a reusable Oak Flats Muffler Men UI primitive with documented states, accessibility expectations, theme behavior, and implementation evidence.

Primary CTA: Review Safety check block states

State A · quote estimator guardrails · payload logging off
Safety checks

Quote estimator

64 hits / 8,918 checked
  • PII redact

    Strip phone, email and ABN before logging the payload.

    Inspected: 1,846 | Hits: 32 | Hit rate: 1.7%
    On hit: Redact | Enabled
  • Jailbreak

    Block prompt-injection attempts disguised as customer messages.

    Inspected: 1,846 | Hits: 4 | Hit rate: 0.2%
    On hit: Block · halt | Enabled
  • Moderation

    OpenAI moderation pass on outbound copy.

    Inspected: 1,742 | Hits: 8 | Hit rate: 0.5%
    On hit: Flag · continue | Enabled
  • Topic fence

    Refuse engine ECU tuning advice · escalate to Sam W (workshop lead).

    Inspected: 1,742 | Hits: 2 | Hit rate: 0.1%
    On hit: Escalate to Hermes | Enabled
  • PII redact

    Mask rego plates in transcripts archived to S3.

    Inspected: 1,742 | Hits: 18 | Hit rate: 1.0%
    On hit: Redact | Off
Payload logging off
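
The header line for State A appears to be a straight sum of the per-rule counters, including the rule that is switched off. A small sketch of that roll-up, using the State A figures shown above (the `RuleCounts` type and the function names are hypothetical, and the parenthetical labels on the two PII rules are added here only to tell them apart):

```python
from typing import NamedTuple


class RuleCounts(NamedTuple):
    name: str
    inspected: int
    hits: int
    enabled: bool


# Figures copied from the State A panel.
STATE_A = [
    RuleCounts("PII redact (payload)", 1846, 32, True),
    RuleCounts("Jailbreak", 1846, 4, True),
    RuleCounts("Moderation", 1742, 8, True),
    RuleCounts("Topic fence", 1742, 2, True),
    RuleCounts("PII redact (rego plates)", 1742, 18, False),
]


def header_totals(rules: list[RuleCounts]) -> tuple[int, int]:
    """Sum hits and inspected counts across all rules, enabled or not."""
    return sum(r.hits for r in rules), sum(r.inspected for r in rules)


def format_header(rules: list[RuleCounts]) -> str:
    hits, checked = header_totals(rules)
    return f"{hits:,} hits / {checked:,} checked"


print(format_header(STATE_A))  # 64 hits / 8,918 checked
```

Note that "checked" counts rule evaluations rather than unique messages under this reading: a message inspected by five rules contributes five to the total.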
State B · SMS triage · payload logging on (debug)
Safety checks

SMS triage

1,864 hits / 5,538 checked
  • Jailbreak

    Block SMS prompt-injection ('ignore your instructions').

    Inspected: 1,846 | Hits: 12 | Hit rate: 0.7%
    On hit: Block · halt | Enabled
  • PII redact

    Mask phone numbers before forwarding to model context.

    Inspected: 1,846 | Hits: 1,846 | Hit rate: 100%
    On hit: Redact | Enabled
  • Topic fence

    Refuse ADR certification + legal-engineering advice.

    Inspected: 1,846 | Hits: 6 | Hit rate: 0.3%
    On hit: Escalate to Hermes | Enabled
Payload logging on
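
State B shows why the hit-rate column matters: one rule fires on every message. A rough way to surface such rules is to compare each rate against a threshold. The sketch below uses the State B counters; the helper names and the 50% default threshold are assumptions chosen for illustration, not values defined by the primitive.

```python
def hit_rate_pct(hits: int, inspected: int) -> float:
    """Hit rate as a percentage, rounded to one decimal place."""
    return round(100 * hits / inspected, 1) if inspected else 0.0


# rule name -> (inspected, hits), copied from the State B panel
STATE_B = {
    "Jailbreak": (1846, 12),
    "PII redact": (1846, 1846),
    "Topic fence": (1846, 6),
}


def noisy_rules(counts: dict[str, tuple[int, int]], threshold_pct: float = 50.0) -> list[str]:
    """Return the names of rules whose hit rate meets or exceeds the threshold."""
    return [
        name
        for name, (inspected, hits) in counts.items()
        if hit_rate_pct(hits, inspected) >= threshold_pct
    ]


print(noisy_rules(STATE_B))  # ['PII redact']
```

A high rate is not automatically a fault. Masking phone numbers on an SMS channel may legitimately hit every message, so a flagged rule is a prompt to review it, not proof that it is misconfigured.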
State C · single moderation rule · catch-all flag
Safety checks

Outbound moderation only

2 hits / 412 checked
  • Moderation

    OpenAI moderation pass on any outbound message.

    Inspected: 412 | Hits: 2 | Hit rate: 0.5%
    On hit: Flag · continue | Enabled
Payload logging off