Hard Witness — verifiable AI inference audit

The proposition

When a deployer of a high-risk AI system installs Hard Witness, they lose the ability to falsify the audit trail of that system.

An evaluator — at UK AISI, the EU AI Office, NIST AISI, or any notified body — pulls live state with a signed request, receives a cryptographically signed response, and verifies it offline against a public witness set.

The verification process requires no trust in the deployer, no trust in the deployer's logging infrastructure, and no trust in the vendor.

Today, an inference deployment's audit trail is operator-attested. Hard Witness makes it verifier-pullable.

The question we're asking

If this primitive existed in a deployable state today, would having it in your evidence stack change how you'd handle the next high-risk AI deployment under your remit?

Any of three answers is useful: yes; no, because [X]; or a description of what it would need to be. Each one redirects our next quarter. Silence is the only outcome we want to avoid.

How it works

   evaluator (UK-AISI / NIST-AISI / EU AI Office)
        |
        | verifier   single static binary, runs on evaluator laptop
        |              reads exactly two files (bundle + witness set)
        |              zero network calls
        v
   +-------------------------------------------------------+
   |  Evidence bundle                                      |
   |    signed events  (Ed25519, per-event chain link)     |
   |    M-of-N witness anchor                              |
   |    prev_anchor_chain_head_hash  (cross-bundle linkage)|
   |    operator + agent directory snapshots               |
   +-------------------------------------------------------+
            ^                          ^
            | anchor sigs              | per-event signing
            |                          |
   +--------+---------+       +--------+----------------+
   | witness daemons  |       | agent (Linux eBPF)      |
   | independent IDs  |       |   WAL fsync per event   |
   | per-jurisdiction |       |   bounded in-mem chain  |
   +------------------+       +-----------+-------------+
                                          | outbound-only
                                          v
                              +-----------------------+
                              | home control plane    |
                              +-----------------------+

The agent never accepts inbound. Witnesses are independent processes whose identity keys are held by separate parties — in production, each AISI or regulator runs its own. The evaluator's side is one binary, no daemon, no network — two files in, one verdict out.
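
The per-event chain link shown in the bundle diagram can be illustrated with a minimal Python sketch. This is not the Hard Witness implementation. SHA-256 stands in for BLAKE3 (which is not in Python's standard library), `json.dumps` with sorted keys stands in for RFC 8785 canonicalisation, and Ed25519 signatures are omitted. The field names `seq`, `prev_hash`, and `payload` and the `GENESIS` marker are hypothetical.

```python
import hashlib
import json
from typing import List, Optional

# Hypothetical marker used as the predecessor hash of the first event.
GENESIS = "0" * 64


def event_hash(event: dict) -> str:
    # Assumption: SHA-256 as a stand-in for BLAKE3; sorted-key JSON as a
    # stand-in for RFC 8785 canonical JSON.
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(body).hexdigest()


def append_event(chain: List[dict], payload: dict) -> None:
    # Each new event commits to the hash of the event before it.
    prev = event_hash(chain[-1]) if chain else GENESIS
    chain.append({"seq": len(chain), "prev_hash": prev, "payload": payload})


def first_broken_link(chain: List[dict]) -> Optional[int]:
    """Return the index of the first event whose link fails, or None if intact."""
    expected = GENESIS
    for i, event in enumerate(chain):
        if event["seq"] != i or event["prev_hash"] != expected:
            return i
        expected = event_hash(event)
    return None
```

Editing event 1 in a three-event chain makes `first_broken_link` return 2, because event 2 still commits to the original hash of event 1. The link check alone cannot detect a change to the final event; in the design described above, that role falls to the witness anchor over the chain head.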

What's actually built

This is not a slide deck. Every row below corresponds to a commit on main and is independently verifiable in under an hour by anyone with a Linux box and the listed toolchain.

Layer	What ships today	Status
Verifier	Rust static binary (528 KB), TypeScript reference, byte-identical across 29 conformance cases	production-shape
Protocol	Tamarin Prover — 9 lemmas: cross-bundle continuity, observation-agent authenticity, M-of-N threshold, replay resistance	mechanically checked
Verifier impl.	Verus — 3 structural invariants discharged (chain link, witness threshold, anchor head)	mechanically checked
Ed25519 primitive	libcrux-ed25519 (HACL* / F*) — same code in Mozilla Firefox NSS	formally verified
BLAKE3 primitive	Audited reference impl; verification status documented honestly	known gap
Live agent	Linux eBPF — kprobes + tracepoints + WAL with per-event fsync + bounded in-memory chain	Hetzner soak-validated
Witness daemons	Independent processes, Ed25519 identity per jurisdiction, M-of-N anchor signing over QUIC mTLS	code-complete
Transport	QUIC + mTLS, outbound-only, ack-driven WAL compaction, backpressure under home unavailability	deployed
Cross-bundle continuity	`prev_anchor_chain_head_hash`; 3-bundle end-to-end test verifies	shipped

The Tamarin proofs caught two real protocol bugs in development. Both fixes shipped into the spec and the implementation before the proofs discharged.
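
The cross-bundle continuity row in the table can be illustrated with a short Python sketch of the linkage check across a sequence of bundles. Only `prev_anchor_chain_head_hash` comes from the schema named above. The `anchor` and `chain_head_hash` field names and the bundle layout are assumptions made for illustration.

```python
from typing import List, Optional


def first_discontinuity(bundles: List[dict]) -> Optional[int]:
    """Return the index of the first bundle that does not link to its
    predecessor, or None if every bundle links correctly.

    Assumption: each bundle carries its own anchored chain head under
    bundle["anchor"]["chain_head_hash"] (hypothetical field names).
    """
    for i in range(1, len(bundles)):
        expected = bundles[i - 1]["anchor"]["chain_head_hash"]
        if bundles[i]["prev_anchor_chain_head_hash"] != expected:
            return i
    return None


def make_bundle(head: str, prev_head: Optional[str]) -> dict:
    # Hypothetical helper that builds a bundle stub for the example.
    return {
        "anchor": {"chain_head_hash": head},
        "prev_anchor_chain_head_hash": prev_head,
    }
```

With three bundles whose heads are `"h1"`, `"h2"`, and `"h3"`, the check returns `None`. If the middle bundle is withheld, the third bundle no longer links to the first and the check returns index 1, which is how a dropped bundle would surface to an evaluator under these assumptions.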

Why this matters for AI governance

The audit trail under EU AI Act Article 12 (record-keeping and traceability), Article 72 (post-market monitoring), and Article 73 (serious incident reporting) must be trustworthy enough that a notified body can act on it.

Today that trustworthiness is operator-attested. Existing tooling (dashboards, manual audits, vendor self-attestation) either requires trust in the dashboard provider or produces no cryptographic record.

Hard Witness replaces "the deployer says these are the logs" with "the runtime signed these events, M-of-N independent witnesses signed the anchor, and the evaluator's binary verifies it offline."

The deployer cannot retroactively edit the trail. The vendor cannot retroactively re-sign it. Even if the deployer compromises its own agent, the witnesses are independent, so the audit story doesn't collapse to the deployer's honesty.
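
The M-of-N witness check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the verifier's code: the function names, the `(witness_id, signature)` pair layout, and the witness-set mapping are hypothetical, and `toy_verify` is a placeholder that must be replaced by real Ed25519 verification in any actual use.

```python
from typing import Callable, Dict, List, Tuple

# (public_key, message, signature) -> bool
VerifyFn = Callable[[bytes, bytes, bytes], bool]


def threshold_met(
    anchor_digest: bytes,
    signatures: List[Tuple[str, bytes]],
    witness_set: Dict[str, bytes],
    m: int,
    verify: VerifyFn,
) -> bool:
    """True if at least m distinct, known witnesses validly signed the anchor."""
    valid_ids = set()
    for witness_id, sig in signatures:
        public_key = witness_set.get(witness_id)
        if public_key is None:
            continue  # signer is not in the public witness set
        if verify(public_key, anchor_digest, sig):
            valid_ids.add(witness_id)  # set membership counts each witness once
    return len(valid_ids) >= m


def toy_verify(public_key: bytes, message: bytes, signature: bytes) -> bool:
    # Placeholder only: NOT a signature scheme. Stands in for Ed25519
    # verification so the example runs with the standard library alone.
    return signature == b"sig:" + public_key + b":" + message
```

In a 3-of-5 configuration, three valid signatures from three different witnesses pass, while three copies of one witness's signature do not, because the check counts distinct witness identities rather than signatures.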

What's distinctive

Analogy

Certificate Transparency, applied to inference

CT didn't replace certificate issuance; it removed the need to trust certificate issuers' word. Hard Witness applies the same shape to inference audit.

The wedge isn't dashboards (Credo AI, Holistic AI, Fiddler, Arize). It's the primitive underneath — bytes pulled from runtime, signed at production, with cryptographic non-repudiation that composes underneath every dashboard above it.

Cryptography

No new cryptography

Every primitive is something a deployment-team cryptographer can name:

Ed25519 signatures
BLAKE3 hashing
Canonical JSON (RFC 8785)
X.509 + QUIC

The novel work is the composition and the witness-anchored evidence schema. The components are off-the-shelf.
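
To show why canonical JSON matters for signing, here is a rough Python approximation. It is an assumption-laden sketch, not an RFC 8785 implementation: Python's `json.dumps` sorts keys by code point rather than UTF-16 code unit and does not follow the ECMAScript number serialisation rules that RFC 8785 requires, so it agrees with the RFC only for simple inputs such as the one below. The function name `canonical_bytes` is hypothetical.

```python
import json


def canonical_bytes(obj) -> bytes:
    # Approximation of RFC 8785 for simple inputs: sorted keys, no
    # insignificant whitespace, UTF-8 output, NaN and Infinity rejected.
    text = json.dumps(
        obj,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )
    return text.encode("utf-8")


a = canonical_bytes({"b": 1, "a": [1, 2]})
b = canonical_bytes({"a": [1, 2], "b": 1})
assert a == b == b'{"a":[1,2],"b":1}'
```

Because both key orders serialise to the same bytes, a signature or hash computed over the canonical form does not depend on how the producer happened to order the fields.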

A named-deployment trial — the shape we propose

Pass / fail at each phase. The technical lift is on us; we need your regulatory lens and the political weight to name the deployment.

Phase 1 · 1–2 weeks

Evaluator dry-run

Your team takes the verifier binary, the 29-case conformance corpus, and the brief. We answer questions about trust layers, Tamarin lemmas, and primitive choice.

~2 evaluator-days

Phase 2 · 4–6 weeks

Co-deployed trial

One inference deployment runs agent + transport + 3-of-5 witnesses, with at least one witness identity held by your institute. Real inference_trace queries.

~1 engineer-week

Phase 3 · 8–12 weeks

Sign-off + production

A named regulated deployment — e.g. a frontier lab's EU-hosted cluster — with your institute holding the verifier and a witness identity. Legal & governance co-developed.

operational + legal

Engagement — two paths in

Either is fine. Each filters for a different kind of first conversation.

Technical first

"Send me the repo URL"

We share a temporary access token for the working tree. Your cryptographer runs:

the Tamarin proofs
the Verus invariants
the 1000-iteration differential corpus
the 29 conformance cases

~1 hour on one laptop

Email for repo access

Leadership first

"We should talk"

30-minute call. We bring the technical detail.

You bring the regulatory shape we should target — the regime you're stood up under and the named deployment where this primitive would be load-bearing first.

Book a 30-min call