LLM Audit Trail: Provenance, Replayability, Evidence Bundles
Auditable LLM pipelines: provenance IDs, replay controls, evidence bundles, and Layered-CoT verdicts tied to spans for defensible outputs.
For any answer you ship, you must be able to prove four things: where it came from, how to replay it, what evidence supports it, and how it stays correct over time. This page shows the minimum audit schema and how to wire it into sandbox/promote and the Layered-CoT validation from Multi-Agent-COT-Prompting.
What to log (stage-level, not just prompts)
- run_id (pipeline), trace_id (end-to-end), span_id + parent_span_id (graph)
- stage (semantic step), agent_role, status
- timestamps, model/provider, token counts, retries
- inputs/outputs (URIs), retrieval set IDs, validation verdicts
- hashes/signatures for evidence bundles
Code: one span record
# run_id / trace_id come from the pipeline run context
span = {
    "run_id": run_id,
    "trace_id": trace_id,
    "span_id": "validate-rag-1",
    "parent_span_id": "answer-1",
    "stage": "layered_cot_validation",
    "agent_role": "governance_auditor",
    "model": "gpt-4o-mini",
    "status": "pass",
    "latency_ms": 842,
    "retries": 1,
    "retrieval_ids": ["doc:boj:2024-11"],
    "inputs_uri": "s3://evidence/runs/.../inputs.json",
    "outputs_uri": "s3://evidence/runs/.../answer.json",
    "verdict": {"ok": True, "reason": "facts match sources"},
}
write_audit(span)  # persist the record; one possible implementation below
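A minimal sketch of write_audit, assuming an append-only JSONL store; the file path and hashing scheme here are illustrative, not part of the schema. Hashing the canonical JSON makes each record tamper-evident, which is what the evidence-bundle signatures build on.

import hashlib
import json
import pathlib

AUDIT_LOG = pathlib.Path("audit/spans.jsonl")  # illustrative location

def write_audit(span: dict) -> str:
    """Append one span record to an append-only JSONL log.

    A SHA-256 over the canonical JSON form makes the record
    tamper-evident; the digest is stored beside the span and can
    be re-verified at review time."""
    body = json.dumps(span, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    AUDIT_LOG.parent.mkdir(parents=True, exist_ok=True)
    with AUDIT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"span": span, "sha256": digest}) + "\n")
    return digest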
Reference flow
- Ingest and normalize (text/tables/images).
- Retrieve with provenance IDs.
- Sandbox generation (explore) → Layered-CoT validation → promote.
- Export evidence bundle: sources, prompts, outputs, hashes, and validation verdicts.
- Keep replay knobs: model, temperature, prompt version, retrieval snapshot. (Both of the last two steps are sketched below.)
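A sketch of the bundle-export and replay-knob steps, assuming the span records above are already in memory; the bundle layout, field names, and ReplayConfig class are illustrative choices, not a fixed format. The manifest of hashes lets a reviewer verify the bundle was not altered after export.

import hashlib
import json
import pathlib
from dataclasses import dataclass, asdict

@dataclass
class ReplayConfig:
    """The knobs you must pin to reproduce a run."""
    model: str
    temperature: float
    prompt_version: str
    retrieval_snapshot: str  # e.g. an index snapshot ID

def export_evidence_bundle(run_id: str, spans: list[dict],
                           replay: ReplayConfig,
                           out_dir: str = "evidence") -> pathlib.Path:
    """Write spans and replay knobs into one folder per run, plus a
    manifest of SHA-256 hashes so reviewers can check integrity."""
    bundle = pathlib.Path(out_dir) / run_id
    bundle.mkdir(parents=True, exist_ok=True)
    files = {
        "spans.json": json.dumps(spans, indent=2),
        "replay.json": json.dumps(asdict(replay), indent=2),
    }
    manifest = {}
    for name, body in files.items():
        (bundle / name).write_text(body, encoding="utf-8")
        manifest[name] = hashlib.sha256(body.encode("utf-8")).hexdigest()
    (bundle / "manifest.json").write_text(
        json.dumps(manifest, indent=2), encoding="utf-8")
    return bundle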
Why this matters
- “It answered, but we can’t explain it” → check retrieval_ids + spans.
- “Regression this week” → diff spans, prompts, and model versions (see the diff sketch after this list).
- “Show reviewers sources” → evidence bundle linked to span outputs.
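A sketch of the regression scenario, assuming the spans of two runs have been loaded from the audit log (how you load them depends on your store). Matching spans on span_id and comparing a few key fields usually localizes a regression to one stage.

def diff_spans(old: list[dict], new: list[dict]) -> list[str]:
    """Compare two runs span-by-span and report what changed."""
    old_by_id = {s["span_id"]: s for s in old}
    changes = []
    for span in new:
        prev = old_by_id.get(span["span_id"])
        if prev is None:
            changes.append(f"{span['span_id']}: new span")
            continue
        # Fields whose drift most often explains a regression.
        for key in ("model", "status", "verdict", "retrieval_ids"):
            if prev.get(key) != span.get(key):
                changes.append(
                    f"{span['span_id']}: {key} "
                    f"{prev.get(key)!r} -> {span.get(key)!r}")
    return changes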
Download the starter schema (JSON): /tools/llm-audit-log-schema/
Related pages
- Observability: /ai-observability/
- Governance: /ai-governance/
- RAG provenance: /rag-provenance-citations/