When a deployer of a high-risk AI system installs Hard Witness, they lose the ability to falsify the audit trail of that system.
An evaluator — at UK AISI, the EU AI Office, NIST AISI, or any notified body — pulls live state with a signed request, receives a cryptographically signed response, and verifies it offline against a public witness set.
Today, an inference deployment's audit trail is operator-attested. Hard Witness makes it verifier-pullable.
Three useful answer shapes — yes, no because [X], or what it would need to be — all redirect the next quarter usefully. Silence is the only outcome we want to avoid.
evaluator (UK-AISI / NIST-AISI / EU AI Office)
|
| verifier single static binary, runs on evaluator laptop
| reads exactly two files (bundle + witness set)
| zero network calls
v
+-------------------------------------------------------+
| Evidence bundle |
| signed events (Ed25519, per-event chain link) |
| M-of-N witness anchor |
| prev_anchor_chain_head_hash (cross-bundle linkage)|
| operator + agent directory snapshots |
+-------------------------------------------------------+
^ ^
| anchor sigs | per-event signing
| |
+--------+---------+ +--------+----------------+
| witness daemons | | agent (Linux eBPF) |
| independent IDs | | WAL fsync per event |
| per-jurisdiction | | bounded in-mem chain |
+------------------+ +-----------+-------------+
| outbound-only
v
+-----------------------+
| home control plane |
+-----------------------+The agent never accepts inbound. Witnesses are independent processes whose identity keys are held by separate parties — in production, each AISI or regulator runs its own. The evaluator's side is one binary, no daemon, no network — two files in, one verdict out.
This is not a slide deck. Every row below corresponds to a commit on main. Independently verifiable in under an hour by anyone with a Linux box and the listed toolchain.
| Layer | What ships today | Status |
|---|---|---|
| Verifier | Rust static binary (528 KB), TypeScript reference, byte-identical across 29 conformance cases | production-shape |
| Protocol | Tamarin Prover — 9 lemmas: cross-bundle continuity, observation-agent authenticity, M-of-N threshold, replay resistance | mechanically checked |
| Verifier impl. | Verus — 3 structural invariants discharged (chain link, witness threshold, anchor head) | mechanically checked |
| Ed25519 primitive | libcrux-ed25519 (HACL* / F*) — same code in Mozilla Firefox NSS | formally verified |
| BLAKE3 primitive | Audited reference impl; verification status documented honestly | known gap |
| Live agent | Linux eBPF — kprobes + tracepoints + WAL with per-event fsync + bounded in-memory chain | Hetzner soak-validated |
| Witness daemons | Independent processes, Ed25519 identity per jurisdiction, M-of-N anchor signing over QUIC mTLS | code-complete |
| Transport | QUIC + mTLS, outbound-only, ack-driven WAL compaction, backpressure under home unavailability | deployed |
| Cross-bundle continuity | prev_anchor_chain_head_hash; 3-bundle end-to-end test verifies | shipped |
The Tamarin proofs caught two real protocol bugs in development. Both fixes shipped into the spec and the implementation before the proofs discharged.
The audit trail under EU AI Act Article 12 (traceability), Article 61 (post-market monitoring), and Article 79 (serious incident reporting) must be trustworthy enough that a notified body can act on it.
Today that trustworthiness is operator-attested. Existing tooling — dashboards, manual audits, vendor self-attestation — either trusts the dashboard provider or doesn't produce a cryptographic record.
The deployer cannot retroactively edit. The vendor cannot retroactively re-sign. Even if the deployer compromises its own agent, the witnesses are independent — the audit story doesn't collapse to the deployer's honesty.
CT didn't replace certificate issuance; it removed the need to trust certificate issuers' word. Hard Witness applies the same shape to inference audit.
The wedge isn't dashboards (Credo AI, Holistic AI, Fiddler, Arize). It's the primitive underneath — bytes pulled from runtime, signed at production, with cryptographic non-repudiation that composes underneath every dashboard above it.
Every primitive is something a deployment-team cryptographer can name:
The novel work is the composition and the witness-anchored evidence schema. The components are off-the-shelf.
Pass / fail at each phase. The technical lift is on us; we need your regulatory lens and the political weight to name the deployment.
Your team takes the verifier binary, the 29-case conformance corpus, and the brief. We answer questions about trust layers, Tamarin lemmas, primitive choice.
One inference deployment runs agent + transport + 3-of-5 witnesses, with at least one witness identity held by your institute. Real inference_trace queries.
A named regulated deployment — e.g. a frontier lab's EU-hosted cluster — with your institute holding the verifier and a witness identity. Legal & governance co-developed.
Either is fine. Each filters for a different kind of first conversation.
We share a temporary access token for the working tree. Your cryptographer runs:
~1 hour on one laptop
30-minute call. We bring the technical detail.
You bring the regulatory shape we should target — the regime you're stood up under and the named deployment where this primitive would be load-bearing first.