The Record Behind the Answer

When I built the pipeline behind Archetype Core, the decision that took the most thought was not the model or the transformation logic. It was deciding what each record needed to remember.

That sounds small. It is not. It is the difference between a system you can defend later and one you can only hope was right.

I kept returning to three questions: where did this come from, what was allowed to touch it, and what happens when the trail breaks? Data, access, failure. They are the same three checks I run before I trust any AI output, turned inward on the system itself.

Most of that work is invisible when the system runs. Content hashing, source lineage, audit records, prompt versions, dead-letter handling. None of it makes the pipeline look more impressive from the outside. But when something breaks, that invisible layer becomes the only part that matters.

This is what I mean by provenance. Not where something came from in the abstract. The record that lets you reconstruct and defend what happened, after the answer has already gone out the door.

The first check is data

Where did this answer come from?

In a pipeline, the same output can arrive from two different sources and look identical once it has been transformed. A clean record tells you nothing about its origin unless you stored the origin alongside it. That is why each record carries a SHA-256 content hash and a per-record trail back to its source in S3.
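A minimal sketch of that idea, not the actual Archetype Core code. The `Lineage` type, the canonical-JSON hashing choice, and the `s3://example-raw/...` paths are all assumptions made for illustration. It shows two payloads with identical content, and therefore identical hashes, that can only be told apart because the source was stored next to the hash.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Lineage:
    content_hash: str  # SHA-256 hex digest of the canonicalized payload
    source_uri: str    # hypothetical S3 location of the raw input


def content_hash(payload: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so that equal content
    # produces an equal hash regardless of key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def with_lineage(payload: dict, source_uri: str) -> Lineage:
    return Lineage(content_hash=content_hash(payload), source_uri=source_uri)


# Same content, two different (made-up) origins.
a = with_lineage({"id": 7, "status": "active"},
                 "s3://example-raw/vendor-a/part-0001.json")
b = with_lineage({"status": "active", "id": 7},
                 "s3://example-raw/vendor-b/part-0042.json")

same_content = a.content_hash == b.content_hash  # the records look identical
same_origin = a.source_uri == b.source_uri       # but they did not come from the same place
```

The hash answers "is this the same content?" and the URI answers "where did it come from?". Neither field can stand in for the other.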

A broken dashboard looks suspicious because the numbers stop matching. A record with no lineage looks perfect right up until someone asks where it came from and the system has no answer. The polished version is the more dangerous one, because nothing about it signals the gap.

The point of provenance is not to store everything. It is to store enough that the path can be rebuilt when someone needs to defend it.

The second check is access

What was allowed to touch it?

Every step in a workflow makes an access decision, even when no one names it that way. A record that was read by a process scoped to read-only is a very different object from one that passed through a process allowed to write to a system of record. The final output can look the same. The risk is not the same.

Provenance is where that difference gets preserved. The record should show which version of the process acted on it, what scope that process had, and whether the access made sense for the task. Versioned prompts do part of this work: they tell you not just that a model ran, but which exact instructions shaped the result. Months later, that is the difference between explaining a decision and guessing at it.
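One way an audit entry could capture this, sketched under assumptions: the `AuditEntry` fields, the two-level `Scope` enum, and the prompt version strings are hypothetical, and a real system would likely have finer-grained scopes. The point is only that the process version and its scope are written down at the time the step runs, so the question "did the access make sense?" can be answered later from the record.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    READ_ONLY = "read_only"
    READ_WRITE = "read_write"


@dataclass(frozen=True)
class AuditEntry:
    record_id: str
    step: str
    prompt_version: str    # which exact instructions shaped the result
    granted_scope: Scope   # what the process was allowed to do
    required_scope: Scope  # what the task actually needed

    def scope_matches_task(self) -> bool:
        # In this sketch, access "made sense" only if the step held
        # exactly the scope the task required.
        return self.granted_scope is self.required_scope


entries = [
    AuditEntry("rec-001", "classify", "classify-v3",
               Scope.READ_ONLY, Scope.READ_ONLY),
    AuditEntry("rec-001", "enrich", "enrich-v1",
               Scope.READ_WRITE, Scope.READ_ONLY),
]

# Steps whose access did not match the task, found from the record alone.
mismatched = [e.step for e in entries if not e.scope_matches_task()]
```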

The third check is failure

What happens when the trail breaks?

A failed record is not just a bad row to drop. It is a signal, and the system has to decide whether to listen. The weak version hides the failure inside a retry loop or a log file no one reads. The record disappears, the pipeline keeps moving, and the failure quietly becomes part of something downstream.

The stronger version sends the failure somewhere it can be seen. That is what dead-letter handling is for. A failed record is preserved with enough context to review: what it was, where it came from, where it broke. Failure does not vanish. It has somewhere to go.
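A small sketch of the pattern, assuming an in-memory list stands in for whatever queue or table a real pipeline would use as its dead-letter destination. `DeadLetter`, `run_step`, and `parse_amount` are illustrative names, not taken from the pipeline described here. A record that fails is not dropped or silently retried. It is set aside with what it was, where it came from, and where it broke.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass(frozen=True)
class DeadLetter:
    payload: dict     # what it was
    source_uri: str   # where it came from
    failed_step: str  # where it broke
    error: str
    failed_at: str


def run_step(
    records: list[tuple[dict, str]],
    step_name: str,
    fn: Callable[[dict], dict],
    dead_letters: list[DeadLetter],
) -> list[tuple[dict, str]]:
    succeeded = []
    for payload, source_uri in records:
        try:
            succeeded.append((fn(payload), source_uri))
        except Exception as exc:
            # Preserve the failure with context instead of discarding the record.
            dead_letters.append(DeadLetter(
                payload=payload,
                source_uri=source_uri,
                failed_step=step_name,
                error=f"{type(exc).__name__}: {exc}",
                failed_at=datetime.now(timezone.utc).isoformat(),
            ))
    return succeeded


def parse_amount(payload: dict) -> dict:
    return {**payload, "amount": float(payload["amount"])}


records = [
    ({"id": 1, "amount": "19.99"}, "s3://example-raw/orders/part-0001.json"),
    ({"id": 2, "amount": "n/a"}, "s3://example-raw/orders/part-0002.json"),
]
dead_letters: list[DeadLetter] = []
succeeded = run_step(records, "parse_amount", parse_amount, dead_letters)
```

The pipeline keeps moving with the record that parsed, and the one that did not is waiting in `dead_letters` with its source and failing step attached.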

Why agents make this harder

A model answers. An agent acts.

That shift changes what is at stake. When AI only produces an answer, the worst case is a wrong sentence that a person can read and reject. When AI retrieves data, calls tools, and writes to systems, the worst case is a wrong action moving through a workflow with limited human review.

At that point provenance stops being documentation. It becomes part of the control layer. The system needs to know not only what happened, but what was authorized to make it happen, and it needs to be able to show that record on demand. You can write all the governance you want, but if you cannot prove what happened, the policy is mostly decoration.
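To make "part of the control layer" concrete, here is a deliberately small sketch. The agent name, the action strings, and the `grants` mapping are invented for illustration, and a real authorization system would be far richer. What it shows is the shape: the authorization check and the audit record are written in the same place, before the action runs, and denials are logged as well as approvals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    agent_id: str
    action: str
    allowed: bool


def authorize(
    agent_id: str,
    action: str,
    grants: dict[str, set[str]],
    audit_log: list[Decision],
) -> bool:
    allowed = action in grants.get(agent_id, set())
    # Record every decision, including denials, so it can be shown on demand.
    audit_log.append(Decision(agent_id, action, allowed))
    return allowed


# Hypothetical grant table: this agent may read tickets and nothing else.
grants = {"summarizer-agent": {"read:tickets"}}
audit_log: list[Decision] = []

write_allowed = authorize("summarizer-agent", "write:crm", grants, audit_log)
read_allowed = authorize("summarizer-agent", "read:tickets", grants, audit_log)
```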

NIST is now moving toward the same questions. Its Center for AI Standards and Innovation (CAISI) launched the AI Agent Standards Initiative in February, and its National Cybersecurity Center of Excellence (NCCoE) has a concept paper on agent identity and authorization that is past its comment window and under review. The field is converging on agent identity, authorization, and auditability. That did not surprise me. The same questions showed up in my own pipeline before they showed up in a standards discussion.

What I keep coming back to

A system does not become trustworthy because it produced the right answer once.

It becomes trustworthy when the answer can be traced, challenged, and reconstructed. When the data has a source, the access has a scope, and the failure has somewhere to go.

Provenance is not the paperwork you add after the system works. It is part of the design that decides whether the system can still be trusted on the day it stops working.