Three agents, one idea

The /goal loop, three ways

I asked Claude to compare a feature that three coding agents now share. Small problem: one of the three is Claude itself, in the form of Claude Code, the AI that runs my server and writes this blog. So let me declare the conflict of interest right here, promise to be fair, and let the sources do the judging.

The feature is /goal. All three ship the same promise: hand the agent a finish line, walk away, and it keeps working until the goal is actually met, with no re-prompting every turn. OpenAI's Codex shipped /goal first³,⁵; Nous Research's Hermes and Anthropic's Claude Code followed, each a little differently. Here is what /goal is, where it came from, and how the three implementations compare.

The superscript numbers link to numbered sources at the foot of the page. They back the factual claims; the trade-off lists and analogies are Claude's own reading.

The one-sentence version

/goal lets you hand the agent a finish line instead of a single instruction — then it keeps working on its own until it actually crosses that line.

Normally an agent does one round of work and stops. With /goal, "stopping" is no longer allowed until your condition is true. Same agent, same memory — it just isn't permitted to walk away early.

Where it came from

Before /goal, there were two ways to push an agent toward a finish line. /goal is really the second one, grown up.

Way 1
By hand

You drive every round: prompt, wait, read the result, prompt again. You're the one deciding "good enough?" each time.

Full control — but nothing happens while you're away.

Way 2
The Ralph loop

A dead-simple shell loop restarts the agent over and over with the same prompt:

while :; do
  cat PROMPT.md | agent
done

Each run is a brand-new agent with empty memory: it only knows what past runs wrote to files. Unattended, but it forgets its own tries. (Named after Ralph Wiggum: dumb, but it works.)¹

Way 3
/goal

One run keeps working until a checker says the finish line is reached.

Unattended like Ralph, but it keeps its full memory the whole time and has an explicit success condition. Stops itself when done.

/goal is the Ralph loop, civilised: the loop moved inside the agent, the success check became a model, and the agent kept its memory.¹ Codex, Hermes, and Claude Code each build that loop a little differently.
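The move from Ralph to /goal can be sketched in a few lines of Python. This is a toy under stated assumptions, not any vendor's implementation: `Agent`, `run_until_goal`, `goal_met`, and the `max_rounds` safety cap are all names invented here for illustration. The point is only the shape: one agent object survives every round, and the success check sits inside the loop.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    """Toy stand-in for a coding agent: it only records what it has tried."""
    memory: List[str] = field(default_factory=list)

    def work(self, goal: str) -> str:
        attempt = f"attempt {len(self.memory) + 1} at: {goal}"
        self.memory.append(attempt)  # context carries over between rounds
        return attempt


def run_until_goal(agent: Agent, goal: str,
                   goal_met: Callable[[Agent], bool],
                   max_rounds: int = 50) -> bool:
    """Keep ONE agent working until the checker passes or the cap is hit.

    max_rounds is an assumption added for safety, not a documented default.
    """
    for _ in range(max_rounds):
        agent.work(goal)
        if goal_met(agent):  # the success check lives inside the loop
            return True
    return False  # cap reached without meeting the goal


agent = Agent()
# Hypothetical checker: pretend the goal is met after three attempts.
done = run_until_goal(agent, "make tests pass", lambda a: len(a.memory) >= 3)
print(done, len(agent.memory))
```

A Ralph loop, by contrast, would construct a fresh `Agent()` on every iteration, so `memory` would never grow past one entry.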

If it helps to picture it

Imagine the goal is "make a soup that passes the taste test."

By hand

You stand at the stove the whole time — taste, adjust, taste again. Full control, but you can't leave the kitchen and it's all your attention.

Ralph loop

You hire a new cook for each attempt. Each reads the notes on the counter, makes one pot, then leaves. They never remember their own attempts — only what got written down.

/goal

One cook who remembers every batch, learns as they go — and isn't allowed to leave until the soup passes the taste test. Then they hang up the apron on their own.

Three takes on the same loop

The shape is identical — work, check, repeat — but the three differ in what does the checking, how often it runs, what happens at the limit, and whether the run survives a crash.

| | Codex² | Hermes⁵ | Claude Code⁶ |
|---|---|---|---|
| Origin | OpenAI · Codex CLI³ | Nous Research · modelled on Codex | Anthropic · Claude Code |
| The checker | Validator model after every step | Judge model after every turn | A Stop hook gating every turn-end |
| Verdict | "goal met?" → go / stop | {"done":bool,"reason":…} | condition met? → block-stop / allow-stop |
| Default budget | Until done / blocked / out of budget | max_turns: 20 | Until the condition holds (no fixed cap) |
| At the limit | Surfaces and stops | ⏸ pauses → /goal resume | Keeps going until met or cleared |
| Persistence | Survives restarts, reboots, TUI exit³ | SessionDB.state_meta | Session-scoped; auto-clears when met |
| Tweak mid-run | Restate the goal | /subgoal appends criteria | One evolving session (full context) |
| Control | agent-owned loop | pause·resume·clear·status | /goal · /goal clear |
| Memory across tries | Full context | Full context | Full context |
| Built for | Marathon runs (a 14-hour example)⁴ | Bounded, supervised iteration | Guarding a single rich-context session |

Each column header carries that tool's primary source; a cell cites a different source only where it differs. The "Memory across tries" row is a summary: all three /goal implementations keep one continuous memory, and the Ralph loop is the odd one out, starting fresh every time.¹

Codex /goal

the originator · agent owns the loop

A small, fast validator model runs after every step and answers one question: has the goal been met? If not, the agent keeps going (plan, act, test, review) and only surfaces when it finishes, hits a constraint, or runs out of budget.² You set a finish line and walk away.

/goal implement the driver until `make test` passes without leaving TODOs

Strengths

  • First to ship the pattern: /goal started here³
  • Agent owns the full plan → act → test → review loop²
  • Survives process restarts, reboots, even closing the TUI³
  • Built for marathons: one overnight run was still working ~14 hours later⁴
  • Structured syntax: "do X until Y without Z"²

Trade-offs · Claude's reading

  • Long unattended runs burn a lot of tokens
  • No default pause — it can drift far before you look
  • Verification is only as good as the validator model
  • The per-step verdict is opaque — no written reason

Hermes /goal

guardrails · judge + budget

Same instinct, more rails. After every turn a judge model reads the last response and returns strict JSON, {"done": bool, "reason": "…"}, and is deliberately conservative: it marks a goal done only when the work is clearly delivered. A turn budget forces a natural check-in, and you can pause, resume, or add criteria without resetting.⁵

/goal    fix all lint errors
/subgoal and keep the test suite green
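As a rough picture of the judge-plus-budget idea, here is a minimal Python sketch. It assumes only what the docs summary above states (a JSON verdict with `done` and `reason`, a default of 20 turns, a pause at the cap). The function names `parse_verdict` and `run_goal`, the fake judge, and the returned status strings are hypothetical and are not Hermes's actual code or output.

```python
import json
from typing import Callable, Tuple


def parse_verdict(raw: str) -> Tuple[bool, str]:
    """Parse a judge reply; anything malformed counts as 'not done'."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "unparseable verdict"
    if not isinstance(data, dict):
        return False, "unexpected verdict shape"
    done = data.get("done") is True  # conservative: only a literal true counts
    return done, str(data.get("reason", ""))


def run_goal(work_turn: Callable[[int], str],
             judge: Callable[[str], str],
             max_turns: int = 20) -> str:
    """Work, ask the judge, repeat; pause when the turn budget runs out."""
    for turn in range(1, max_turns + 1):
        response = work_turn(turn)
        done, reason = parse_verdict(judge(response))
        if done:
            return f"achieved in {turn} turns: {reason}"
    return f"paused: {max_turns}/{max_turns} turns used"


# Hypothetical stand-ins for the agent and the judge model.
def fake_turn(turn: int) -> str:
    return f"{max(0, 2 - turn)} lint errors left"


def fake_judge(response: str) -> str:
    left = int(response.split()[0])
    return json.dumps({"done": left == 0, "reason": response})


print(run_goal(fake_turn, fake_judge))
```

Treating a malformed verdict as "not done" is one way to read "conservative"; the real judge may handle that case differently.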

Strengths · all from the docs⁵

  • Auditable reason every turn, not just yes/no
  • Conservative by design: errs toward continuing
  • Turn budget (max_turns: 20) gives a built-in check-in
  • /subgoal bolts on criteria mid-loop, no reset
  • pause·resume·clear·status: a hand on the wheel
  • Cheap verdicts: route to a small goal_judge model

Trade-offs

  • Default 20-turn cap → long tasks need repeated /goal resume⁵
  • New goal mid-run is rejected unless you /stop first⁵
  • Judge sees only the last ~4 KB, so it can miss context⁵
  • A follower of the design, not the originator⁵
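The "last ~4 KB" trade-off is easy to see with a small sketch. The helper below is hypothetical: the name `tail_bytes` and the exact 4096-byte limit are assumptions, since the docs summary only says roughly 4 KB. It keeps the tail of a response and shows how an early error message can fall outside the window.

```python
def tail_bytes(text: str, limit: int = 4096) -> str:
    """Return at most the last `limit` bytes of text as valid UTF-8."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # A cut can land mid-character; errors="ignore" drops the broken fragment.
    return raw[-limit:].decode("utf-8", errors="ignore")


# An early failure followed by a long run of routine output.
log = "ERROR: migration failed\n" + "ok\n" * 2000
window = tail_bytes(log)
print("ERROR" in log, "ERROR" in window)
```

A judge that reads only `window` would see nothing but "ok" lines, which is the kind of missed context the trade-off describes.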

Claude Code /goal

minimal · a gate, not a loop

This one is Claude's own, so it checked the facts twice and wrote its own trade-off list with a straight face. It is the simplest mechanism of the three. Running /goal <condition> installs a session-scoped Stop hook. Every time the agent tries to end its turn, the hook checks whether your condition holds. If not, it blocks the stop and the same agent keeps going with full context. The moment the condition is met, the hook clears itself and the agent is finally allowed to finish. There is no separate loop: "done" is just a gate the agent has to pass through.⁶

/goal the test suite passes and the page loads with no console errors
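The gate idea can be modelled in a few lines. This is a sketch of the behaviour described above, not Claude Code's hook interface: the class `GoalGate`, its methods, and the plain Python callable used as the condition are all assumptions made for illustration (in the real tool an evaluator judges a plain-English condition).

```python
from typing import Callable, Optional


class GoalGate:
    """Hypothetical session-scoped gate: blocks 'stop' until the condition holds."""

    def __init__(self) -> None:
        self.condition: Optional[Callable[[], bool]] = None

    def set_goal(self, condition: Callable[[], bool]) -> None:
        self.condition = condition

    def clear(self) -> None:
        self.condition = None

    def on_stop_attempt(self) -> bool:
        """Return True if the agent may stop, False if the stop is blocked."""
        if self.condition is None:
            return True   # no goal installed: stopping is allowed
        if self.condition():
            self.clear()  # goal met: the gate removes itself
            return True
        return False      # goal not met: block the stop, keep working


state = {"failing_tests": 2}
gate = GoalGate()
gate.set_goal(lambda: state["failing_tests"] == 0)

turns = 0
while True:
    turns += 1
    # One turn of pretend work: fix one failing test.
    state["failing_tests"] = max(0, state["failing_tests"] - 1)
    if gate.on_stop_attempt():
        break

print(turns, gate.condition is None)
```

Notice there is no loop inside `GoalGate` itself. The repetition comes from the agent trying to stop and being refused, which is what "a gate, not a loop" means here.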

Strengths · observed⁶

  • Dead simple: a gate on "stop," not a separate engine
  • Keeps the full session context the whole time
  • Runs at every turn-end, so it can't quietly finish early
  • Auto-clears the instant the goal is met, with no cleanup
  • Plain-English condition; /goal clear to bail anytime

Trade-offs · Claude's reading

  • No fixed turn budget — you clear it if it's stuck looping
  • Only as good as the evaluator's read of "done"
  • One long session means context can grow large
  • Less built for multi-hour unattended marathons than Codex
  • No subgoals or structured resume verbs like Hermes

How each loop actually runs

Codex: validator after every step²

  /goal ship driver until tests pass
  step 1   plan → act → test → review   validator: goal met? no
  step 2   plan → act → test → review   validator: goal met? no
  …        (runs for hours · survives a reboot)
  step N   validator: goal met? YES ✓ stop

Hermes: judge after every turn, with a budget⁵

  /goal fix all lint errors
  turn 1   …work…   judge {done:false,"3 left"}      ↻ Continuing toward goal
  turn 2   …work…   judge {done:false,"1 left"}      ↻ Continuing toward goal
  turn 3   …work…   judge {done:true,"lint clean"}   ✓ Goal achieved (3 turns)
  …or at the cap: ⏸ Goal paused — 20/20 turns used

Claude Code: a Stop hook gates each turn-end⁶

  /goal tests pass · no console errors
  turn 1   …work…                            try to stop → hook: met? no      ↻ keep going
  turn 2   …work… (full memory of turn 1)    try to stop → hook: met? no      ↻ keep going
  turn 3   …work…                            try to stop → hook: met? YES ✓   hook clears itself · agent stops

When to reach for which

Pick Codex /goal when

  • You want a true set-and-forget marathon — hours, overnight
  • The end state is machine-verifiable (tests pass, build green)
  • It must survive a crash or reboot mid-run
  • You don't need to inspect every decision along the way

Pick Hermes /goal when

  • You want bounded iteration with a natural check-in
  • You want to read why the judge called it done or not
  • You'll refine acceptance criteria as you go (/subgoal)
  • You want to pause, resume, and keep a hand on the wheel

Pick Claude Code /goal when

  • You're already working in a Claude Code session
  • You can state the finish line in one plain sentence
  • You want it to keep full context and not quit early
  • You want the simplest mental model: a gate, no extra config

All three are the Ralph loop grown up: they refuse to quit halfway, and they remember what they've tried. The difference is how much rope they give the agent. Codex bets on unattended endurance: own the loop for hours, survive reboots. Hermes adds guardrails (a readable judge, a turn budget, subgoals, pause/resume) for supervised iteration. Claude Code keeps it minimal: a Stop hook that won't let one full-context session end until your condition holds, then clears itself. Same instinct, three settings on the same dial.

References

Each superscript number in the text links here. These back the factual claims about how each tool works; the trade-off lists marked "Claude's reading" and the analogies are interpretation, not sourced. Note: the secondary write-ups below are third-party explainers, not official vendor docs, and they disagree on who personally authored Codex's /goal, so Claude left individual attribution out.

  [1] Geoffrey Huntley, “Ralph Wiggum as a ‘software engineer’” (May 2025). The origin of the Ralph loop: rerun the agent on the same prompt with a fresh context each iteration, keeping state in files, until a completion condition is met.
  [2] Apidog, “The /goal command: Codex, Claude Code & autonomous agents.” Codex mechanics: a fast validator model checks “has the goal been met?” after every step; the agent owns the plan→act→test→review loop and surfaces only when finished, blocked, or out of budget; structured form "do X until Y without Z".
  [3] DevToolPicks, “OpenAI just added /goal to Codex CLI.” OpenAI shipped /goal to Codex CLI; state is tied to the Codex session and survives process restarts, reboots, and TUI exits.
  [4] MindStudio, “Codex /goal ran a device-driver project for 14 hours.” Andrew Chen (a16z) left Codex /goal on a Mac eGPU device-driver project overnight; ~14 hours later it was still working.
  [5] Nous Research, Hermes Agent docs, “Persistent Goals.” Judge model returns strict JSON {done, reason} after each turn (reading ~4 KB), conservative verdicts, default max_turns: 20 then ⏸ pause; /subgoal and pause·resume·clear·status; cheap goal_judge model; state in SessionDB.state_meta; new goal rejected mid-run unless you /stop; explicitly modelled on Codex's /goal.
  [6] Claude Code /goal: first-party observation (no public docs page). Running /goal <condition> installs a session-scoped Stop hook that blocks the agent from ending its turn until the stated condition holds, then clears itself automatically. (This very article was written under such a goal hook. Claude was not allowed to stop until it was done.)