LLM Workflow Reliability: Concurrency, Timeouts, Retries, Traceability

Reliability playbook for LLM and agent pipelines: bounded concurrency, hard timeouts, jittered retries, sandbox-to-promote gating, and traceable spans, grounded in the Multi-Agent-COT-Prompting repo.

Long-running agent pipelines (30–180 minutes) need hard controls, not hope. Reliability is a systems problem: concurrency caps, deterministic timeouts, retries with jitter, streaming-safe parsing, and traceable runs with run IDs and Layered-CoT checkpoints.

Reliability checklist

  • Bounded concurrency: cap in-flight calls to avoid 429 storms (a sketch covering the first three items follows this list).
  • Hard timeouts: connect, read, and total deadlines.
  • Retry + jitter: only for transient errors (429/5xx/network).
  • Streaming-safe: parse SSE (data: {...} and data: [DONE]); a parsing sketch follows below.
  • Sandbox → promote: run experiments in sandbox, promote only validated outputs.
  • Audit + traces: keep run_id + trace_id per span.
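
The first three items (bounded concurrency, hard timeouts, retry with jitter) can be sketched with stdlib asyncio plus httpx. This is a minimal sketch under stated assumptions, not the repo's client: the endpoint, model name, auth header, retryable status list, and backoff constants are illustrative choices.

import asyncio, os, random
import httpx

SEM = asyncio.Semaphore(12)            # bounded concurrency: at most 12 calls in flight
RETRYABLE = {429, 500, 502, 503, 504}  # transient statuses worth retrying

async def call_llm(prompt: str, *, max_retries: int = 3, total_deadline: float = 180.0) -> str:
    # hard per-phase deadlines: connect, read, write, and connection-pool acquisition
    timeout = httpx.Timeout(connect=5.0, read=60.0, write=10.0, pool=5.0)

    async def attempts() -> str:
        # the semaphore is held across retries, so backoff also throttles the pool
        async with SEM, httpx.AsyncClient(timeout=timeout) as client:
            for attempt in range(max_retries + 1):
                try:
                    resp = await client.post(
                        "https://api.openai.com/v1/chat/completions",
                        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
                        json={"model": "gpt-4o-mini",
                              "messages": [{"role": "user", "content": prompt}]},
                    )
                    resp.raise_for_status()          # any 4xx/5xx becomes HTTPStatusError
                    return resp.json()["choices"][0]["message"]["content"]
                except (httpx.TransportError, httpx.HTTPStatusError) as exc:
                    status = getattr(getattr(exc, "response", None), "status_code", None)
                    transient = isinstance(exc, httpx.TransportError) or status in RETRYABLE
                    if attempt == max_retries or not transient:
                        raise                        # out of retries, or non-transient (e.g. 400/401)
                    # full jitter: sleep a random slice of an exponentially growing window
                    await asyncio.sleep(random.uniform(0, min(2 ** attempt, 30)))

    # the total deadline caps the whole call, retries and backoff included
    return await asyncio.wait_for(attempts(), timeout=total_deadline)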
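
For the streaming-safe item, a defensive parser for OpenAI-style SSE frames is sketched below. It assumes an async iterator of raw text lines (for example httpx's response.aiter_lines()); it is not a full SSE implementation (no multi-line data: events, no event IDs).

import json
from typing import AsyncIterator

async def sse_deltas(lines: AsyncIterator[str]) -> AsyncIterator[str]:
    # Yield content deltas from "data: {...}" frames; stop cleanly at "data: [DONE]".
    async for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                      # skip blank keep-alives, comments, and event: lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return                        # explicit end-of-stream sentinel
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue                      # tolerate a malformed frame instead of aborting the stream
        delta = (event.get("choices") or [{}])[0].get("delta", {}).get("content")
        if delta:
            yield delta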

Code: reliable client call with gates

import asyncio, os
from nohang_client import NoHangClient

async def run_once(prompt: str):
    async with NoHangClient(
        base_url="https://api.openai.com/v1",
        api_key=os.environ["OPENAI_API_KEY"],
        default_model="gpt-4o-mini",
        max_concurrent=12,
        timeout_total=180,
        max_retries=3,
    ) as client:
        text = await client.chat(
            [{"role":"user","content":prompt}],
            stream=True,
            max_tokens=256,
        )
        return text

# Sandbox → promote guardrail
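# validate() and fallback_response() are user-supplied hooks (e.g. a Layered-CoT
# check and a safe canned reply); neither is defined in this snippet.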
async def guarded(prompt):
    draft = await run_once(prompt)
    verdict = validate(draft)  # your Layered-CoT or eval here
    return draft if verdict.ok else fallback_response()

asyncio.run(guarded("Summarize today’s incidents in 3 bullets."))

How it maps to the COT repo

  • Layered-CoT: use layered validation on sandbox outputs before promotion.
  • Orchestrator: route intents to the right agent and attach risk classes (see simple-orchestrator/orchestrator.py); a hedged routing sketch follows this list.
  • Sandbox-promote pattern: separate creative generation from validated delivery.
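
The routing idea can be sketched independently of simple-orchestrator/orchestrator.py. The register/dispatch helpers, intent names, and risk tiers below are illustrative assumptions, not the repo's actual interface; high-risk routes reuse the same sandbox → promote gate as guarded() above.

from dataclasses import dataclass
from typing import Awaitable, Callable

Agent = Callable[[str], Awaitable[str]]       # a coroutine that handles one prompt

@dataclass
class Route:
    agent: Agent
    risk: str                                 # e.g. "low" ships directly, "high" goes through sandbox → promote

ROUTES: dict[str, Route] = {}                 # intent name -> route

def register(intent: str, risk: str):
    # Decorator: register an agent coroutine under an intent with a risk class.
    def wrap(fn: Agent) -> Agent:
        ROUTES[intent] = Route(agent=fn, risk=risk)
        return fn
    return wrap

async def dispatch(intent: str, prompt: str) -> str:
    route = ROUTES[intent]                    # KeyError on unknown intents is intentional: fail loudly
    draft = await route.agent(prompt)
    if route.risk == "high":
        verdict = validate(draft)             # same user-supplied hooks as in guarded() above
        return draft if verdict.ok else fallback_response()
    return draft

A summarizer agent would then be registered with @register("summarize", risk="low") and called via dispatch("summarize", prompt).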

Observability hooks

  • Emit per-call metrics: latency_ms, retries, timeout_count.
  • Trace spans: role, prompt template version, model, token counts.
  • Log verdicts from validation (pass/fail + reason). See /ai-observability/ for schema ideas; a minimal logging sketch follows this list.
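
One lightweight way to carry these fields is a span context manager that stamps run_id and trace_id and emits a structured record per call. The JSON-lines sink, the uuid-based IDs, and any field names beyond those listed above are assumptions, not the /ai-observability/ schema.

import json, time, uuid
from contextlib import contextmanager

RUN_ID = uuid.uuid4().hex                      # one run_id per pipeline execution

@contextmanager
def span(role: str, model: str, template_version: str):
    # Time one call and emit a JSON-lines trace record when it finishes.
    record = {
        "run_id": RUN_ID,
        "trace_id": uuid.uuid4().hex,          # one trace_id per span
        "role": role,
        "model": model,
        "prompt_template_version": template_version,
        "retries": 0,
        "timeout_count": 0,
        "tokens": None,
    }
    start = time.monotonic()
    try:
        yield record                           # caller fills in retries, token counts, verdicts
    finally:
        record["latency_ms"] = round((time.monotonic() - start) * 1000)
        print(json.dumps(record))              # swap for your real metrics/log sink

# Usage inside an async caller:
#   with span(role="summarizer", model="gpt-4o-mini", template_version="v3") as rec:
#       text = await run_once(prompt)
#       rec["verdict"] = {"pass": True, "reason": "layered checks ok"}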
