NoHang Client — The Reliability Layer Under Agent Workflows
How to build an OpenAI-compatible async client that doesn’t hang: semaphores, timeouts, retries, and SSE streaming.
Most “agent failures” aren’t philosophical — they’re operational:
- hung HTTP calls
- unbounded concurrency
- rate-limit storms
- streaming treated like JSON
This post explains the design of a small wedge: an OpenAI-compatible async chat client that stays stable under long-running workloads.
The four failure modes (and the fixes)
1) Unbounded concurrency
If you gather() 200 tasks, you’ll spike connections and hit 429s.
Fix: a semaphore. One client instance owns a hard concurrency ceiling.
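A minimal sketch of that pattern, assuming httpx as the HTTP layer; the `BoundedClient` name, endpoint, and concurrency ceiling are illustrative placeholders, not NoHang’s actual API:

```python
import asyncio

import httpx

MAX_CONCURRENCY = 8  # assumed ceiling; tune to your provider's limits


class BoundedClient:
    """One client instance owns the semaphore, so every call shares the cap."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self._semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
        self._http = httpx.AsyncClient(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
        )

    async def chat(self, payload: dict) -> dict:
        # gather() can schedule hundreds of these coroutines; the semaphore
        # ensures only MAX_CONCURRENCY requests are in flight at once.
        async with self._semaphore:
            resp = await self._http.post("/chat/completions", json=payload)
            resp.raise_for_status()
            return resp.json()

    async def aclose(self) -> None:
        await self._http.aclose()
```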
2) No hard timeouts
Without timeouts, a single stuck call can stall a whole run.
Fix: set connect/read/total timeouts and handle failure deterministically.
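One way to express this, assuming httpx per-phase timeouts plus an outer `asyncio.timeout` deadline (Python 3.11+); the numbers are placeholders:

```python
import asyncio

import httpx

# Per-phase limits: connect, read, write, and connection-pool acquisition.
TIMEOUT = httpx.Timeout(connect=5.0, read=30.0, write=10.0, pool=5.0)
TOTAL_DEADLINE = 60.0  # hard ceiling for the whole call (assumed value)


async def chat_once(http: httpx.AsyncClient, payload: dict) -> dict:
    try:
        # The outer deadline sits on top of the per-phase timeouts, so a
        # slow-but-not-stuck response still cannot stall the whole run.
        async with asyncio.timeout(TOTAL_DEADLINE):
            resp = await http.post("/chat/completions", json=payload, timeout=TIMEOUT)
            resp.raise_for_status()
            return resp.json()
    except (httpx.TimeoutException, TimeoutError) as exc:
        # Deterministic failure: surface a typed error instead of hanging.
        raise RuntimeError(f"chat call timed out: {exc}") from exc
```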
3) Retry storms
Retries are necessary, but naive retries multiply traffic at the worst moment.
Fix: retry only transient failures, use exponential backoff + jitter, respect Retry-After.
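A hedged sketch of that retry policy, again assuming httpx; the status set, attempt count, and base delay are illustrative, and `Retry-After` is assumed to be given in seconds:

```python
import asyncio
import random

import httpx

RETRYABLE_STATUS = {429, 500, 502, 503, 504}
MAX_ATTEMPTS = 5   # assumed; tune per workload
BASE_DELAY = 0.5   # seconds


async def chat_with_retry(http: httpx.AsyncClient, payload: dict) -> dict:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        delay = None
        try:
            resp = await http.post("/chat/completions", json=payload)
            if resp.status_code not in RETRYABLE_STATUS:
                resp.raise_for_status()  # non-transient 4xx errors fail fast
                return resp.json()
            # Respect Retry-After when the provider sends one (assumed seconds).
            retry_after = resp.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else None
        except (httpx.ConnectError, httpx.ReadTimeout):
            pass  # transient network failure: fall through to backoff
        if attempt == MAX_ATTEMPTS:
            raise RuntimeError("chat call failed after retries")
        if delay is None:
            # Exponential backoff with full jitter spreads retries out
            # instead of synchronizing a retry storm.
            delay = random.uniform(0, BASE_DELAY * (2 ** (attempt - 1)))
        await asyncio.sleep(delay)
```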
4) Streaming isn’t JSON
Providers often stream via SSE:
data: {...}
data: [DONE]
Fix: parse SSE and aggregate deltas into a normal response shape.
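A sketch of that path, assuming httpx streaming and OpenAI-style chunks where incremental text lives under `choices[0].delta.content` (the endpoint and field names follow the OpenAI convention and are assumptions here):

```python
import json

import httpx


async def stream_chat(http: httpx.AsyncClient, payload: dict) -> str:
    """Parse `data: {...}` SSE events and aggregate content deltas into one string."""
    text_parts: list[str] = []
    async with http.stream(
        "POST", "/chat/completions", json={**payload, "stream": True}
    ) as resp:
        resp.raise_for_status()
        async for line in resp.aiter_lines():
            if not line.startswith("data:"):
                continue  # skip comments and blank keep-alive lines
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            choices = chunk.get("choices") or []
            delta = choices[0].get("delta", {}).get("content") if choices else None
            if delta:
                text_parts.append(delta)
    return "".join(text_parts)
```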
Why this is a wedge (not a framework)
NoHang doesn’t replace orchestration. It makes orchestration safe:
- you can run many stages concurrently without melting down
- you can close sessions cleanly and avoid leaked sockets
- you can stream reliably and still return consistent outputs
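As a rough usage sketch of those three points, reusing the hypothetical `BoundedClient` from above (model name, URL, and prompts are placeholders): gather many stages behind one shared ceiling, then close the session in a `finally` block so nothing leaks even when a stage fails.

```python
import asyncio


async def run_stages(prompts: list[str]) -> list[dict]:
    # One shared client = one shared semaphore and one connection pool.
    client = BoundedClient("https://api.example.com/v1", api_key="sk-placeholder")
    try:
        return await asyncio.gather(
            *(client.chat({
                "model": "placeholder-model",
                "messages": [{"role": "user", "content": p}],
            }) for p in prompts)
        )
    finally:
        # Clean shutdown: no leaked sockets even if a stage raises.
        await client.aclose()


asyncio.run(run_stages(["summarize A", "summarize B"]))
```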
Where to go next
- Reliability pillar: /llm-workflow-reliability/
- Audit log schema tool: /tools/llm-audit-log-schema/
- If you want this productionized: /request-blueprint/