
How to Keep Codex Background Jobs Running Without Stale Terminals (bgmon)

Solve terminal drop and stale-session failures in Codex workflows using bgmon detached execution, durable logs, and explicit RUNNING/DEAD/STALE state checks.

Published: 2026-02-10 · Last updated: 2026-02-10

Estimated read time: 5 min

The failure mode in long Codex sessions is not model quality but terminal fragility. If a PTY disconnects, you need job state that survives the shell, not shell history. bgmon provides this by persisting job metadata in ~/.bgmon/jobs/*.json and streaming stdout/stderr to durable logs in ~/.bgmon/logs/*.log.
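To make the idea of state that survives the shell concrete, here is a minimal sketch of writing and reading per-job JSON records. The helper names (write_job_record, read_job_records) and the record fields are hypothetical illustrations, not bgmon's actual schema or API; only the one-JSON-file-per-job layout comes from the description above.

```python
import json
import os
import tempfile
from pathlib import Path


def write_job_record(jobs_dir: Path, name: str, record: dict) -> Path:
    """Atomically write <jobs_dir>/<name>.json (hypothetical helper, not bgmon's API)."""
    jobs_dir.mkdir(parents=True, exist_ok=True)
    target = jobs_dir / f"{name}.json"
    # Write to a temp file in the same directory, then rename over the target,
    # so a reader never observes a half-written record.
    fd, tmp_path = tempfile.mkstemp(dir=jobs_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(record, tmp, indent=2)
    os.replace(tmp_path, target)
    return target


def read_job_records(jobs_dir: Path) -> dict:
    """Load every *.json record, keyed by job name (the file stem)."""
    return {p.stem: json.loads(p.read_text()) for p in sorted(jobs_dir.glob("*.json"))}
```

Because the records live on disk, any new shell can rebuild the job table by calling read_job_records on the jobs directory, regardless of which terminal launched the work.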

In our live MolLEO pipeline, this pattern keeps the control chain deterministic. At one point the optimizer and follow-on stage were both active and inspectable from state, not from a still-open terminal:

NAME                                      STATUS   PID
c180_tuned_01_molleo_window_stage_01      RUNNING  530760
c180_tuned_01_window_to_deep04_post_01    RUNNING  567133

Launch is intentionally detached and session-safe: bgmon uses a new process session, no controlling stdin, and explicit log routing so work continues across reconnects.

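proc = subprocess.Popen(
    cmd,
    stdin=subprocess.DEVNULL,   # no controlling stdin
    stdout=log_f,               # stream output to the job's log file
    stderr=subprocess.STDOUT,   # merge stderr into the same log
    cwd=args.cwd,
    start_new_session=True,     # setsid() in the child: detach from the launching session
    close_fds=True,             # do not leak inherited descriptors (the Python 3 default)
)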
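For readers who want to try the launch pattern in isolation, the following is a self-contained sketch built around the same Popen arguments. The function name launch_detached and its parameters are assumptions made for illustration; bgmon's real entry point may be structured differently.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def launch_detached(cmd: List[str], log_path: Path, cwd: Optional[str] = None) -> subprocess.Popen:
    """Start cmd in its own session with output appended to log_path (illustrative helper)."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # The child receives its own copy of the log descriptor, so the parent
    # can close its handle as soon as Popen returns.
    with open(log_path, "ab") as log_f:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log_f,
            stderr=subprocess.STDOUT,
            cwd=cwd,
            start_new_session=True,  # POSIX only: runs setsid() in the child
            close_fds=True,
        )
    return proc


if __name__ == "__main__":
    job = launch_detached([sys.executable, "-c", "print('hello from a detached job')"],
                          Path("demo_logs") / "demo.log")
    print("started pid", job.pid)
```

Opening the log in append mode is a design choice in this sketch: relaunching a job under the same name extends the existing log instead of truncating it.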

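It also defends against PID reuse. On status and stop operations, bgmon re-reads the starttime field of /proc/<pid>/stat (the process start time, in clock ticks since boot) and compares it with the value recorded at launch. If the two differ, the PID was recycled by another process and the job is marked STALE, which prevents accidental signaling of unrelated work.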

if cur != expected_starttime:
    return (False, True)  # stale pid reuse
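The starttime comparison can be sketched end to end as follows. This is an illustration under stated assumptions: the function names are hypothetical, the string states stand in for the (alive, stale) tuple shown in the excerpt, and treating a missing /proc entry as DEAD is an assumption of this sketch. The field position is not an assumption: starttime is field 22 of /proc/<pid>/stat on Linux.

```python
from typing import Optional


def parse_starttime(stat_line: str) -> int:
    """Return field 22 (starttime, in clock ticks since boot) from a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split after the LAST ')' before tokenizing the rest.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 22 sits at index 19.
    return int(after_comm[19])


def read_starttime(pid: int) -> Optional[int]:
    """Read the current starttime for pid, or None if the process no longer exists (Linux only)."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return parse_starttime(f.read())
    except (FileNotFoundError, ProcessLookupError):
        return None


def classify(current_starttime: Optional[int], expected_starttime: int) -> str:
    """Map a starttime comparison to a job state (hypothetical mapping)."""
    if current_starttime is None:
        return "DEAD"      # no /proc entry: the process has exited
    if current_starttime != expected_starttime:
        return "STALE"     # same PID, different process: the PID was reused
    return "RUNNING"
```

A status or stop path would call classify(read_starttime(pid), recorded_starttime) and send a signal only when the result is RUNNING.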

With this control loop, workflows become queueable and composable: start stage A, watch deterministic status, tail logs, then trigger stage B from a post-hook. That is why bgmon is a must-have Codex operator tool for multi-hour and multi-day pipelines.
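One way to picture the chaining step is a small polling loop that waits for stage A to leave RUNNING and then fires the follow-on stage. This is a sketch only: wait_then_trigger, get_status, and on_done are hypothetical names, and bgmon's own post-hook mechanism may work differently. Note that it does not inspect stage A's exit code, and it deliberately declines to trigger on STALE so that an operator can look first.

```python
import time
from typing import Callable


def wait_then_trigger(
    get_status: Callable[[], str],
    on_done: Callable[[], None],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves RUNNING; call on_done only if it ended as DEAD."""
    while True:
        status = get_status()
        if status != "RUNNING":
            break
        sleep(poll_seconds)
    if status == "DEAD":
        on_done()  # e.g. launch stage B
    return status


if __name__ == "__main__":
    # Simulated status sequence standing in for a real state check of stage A.
    fake = iter(["RUNNING", "RUNNING", "DEAD"])
    final = wait_then_trigger(lambda: next(fake),
                              lambda: print("starting stage B"),
                              poll_seconds=0.1)
    print("stage A ended as", final)
```

Passing the sleep function as a parameter keeps the loop testable without real delays.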

Scope boundary

This note covers: detached job control and workflow chaining for Codex and CLI pipelines.

This note excludes: strategy logic, crawler internals, and any execution policy beyond job lifecycle control.