Task states
A task moves through a small set of states:
- Queued — waiting for an agent to be available
- Assigned — a specific agent has been given the task, but the run has not started yet
- Running — the agent is actively working
- Review — the agent called mark_complete and is waiting for a reviewer (parent agent or human)
- Done — the reviewer accepted the work; the task is frozen
- Failed — the agent gave up or the orchestrator killed the run
- Cancelled — a human explicitly stopped the task
- Handoff — waiting for a human to pick it up
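For tooling that needs to store or compare these states, the list above can be sketched as an enumeration. The string values and the is_frozen helper are illustrative assumptions, not part of any documented API. Allowed transitions are not specified in this section, so the sketch does not model them.

```python
from enum import Enum


class TaskState(Enum):
    """The eight task states listed above. String values are assumed."""
    QUEUED = "queued"        # waiting for an agent to be available
    ASSIGNED = "assigned"    # an agent has the task, run not started yet
    RUNNING = "running"      # the agent is actively working
    REVIEW = "review"        # mark_complete was called, awaiting a reviewer
    DONE = "done"            # reviewer accepted the work; task is frozen
    FAILED = "failed"        # agent gave up or the orchestrator killed the run
    CANCELLED = "cancelled"  # a human explicitly stopped the task
    HANDOFF = "handoff"      # waiting for a human to pick it up


def is_frozen(state: TaskState) -> bool:
    # Only Done is described as frozen in the list above.
    return state is TaskState.DONE
```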
The inside of a run
A run is a single pass through the task. Here is the canonical inside-of-a-run loop, from the agent’s perspective:
- Read the task context. The orchestrator has placed the task description, the parent task (if any), relevant memory, and any input files into the workspace before launching you.
- Acknowledge. Call report_progress with your understanding of the task. This is your first chance to catch a misunderstanding early.
- Plan. Decide the steps. For small tasks this is one or two lines in memory. For big tasks this is a sub-task tree.
- Execute. Call skills, edit files, run tools. Call report_progress when you have something new to say.
- Self-review. Before finishing, read your own output and ask: is this actually what was asked?
- Finish. Call mark_complete with your final output and a short reasoning trail.
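The loop above can be sketched in a few lines. This is a minimal illustration under assumptions: RecordingClient is a hypothetical stand-in for the orchestrator, and the argument shapes of report_progress and mark_complete are guesses, since this section names the calls but not their signatures.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingClient:
    """Hypothetical stand-in for the orchestrator API; records calls in order."""
    calls: list = field(default_factory=list)

    def report_progress(self, message: str) -> None:
        self.calls.append(("report_progress", message))

    def mark_complete(self, output: dict, reasoning: str) -> None:
        self.calls.append(("mark_complete", output, reasoning))


def run_once(client, task: dict, steps: list) -> None:
    """One pass through a task. `steps` is a list of (name, callable) pairs."""
    # Read: `task` is assumed to be already loaded from the workspace.
    # Acknowledge: state your understanding before doing any work.
    client.report_progress(f"Understood task: {task['description']}")
    # Plan: for a small task the plan is just the ordered step names.
    plan = [name for name, _ in steps]
    # Execute: run each step and report when there is something new to say.
    results = {}
    for name, step in steps:
        results[name] = step()
        client.report_progress(f"Finished step: {name}")
    # Self-review: a trivial check that every planned step produced output.
    missing = [name for name in plan if results.get(name) is None]
    if missing:
        raise RuntimeError(f"Steps produced no output: {missing}")
    # Finish: hand off the output with a short reasoning trail.
    client.mark_complete(results, reasoning=f"Ran planned steps: {', '.join(plan)}")
```

A real self-review is a judgment call about whether the output answers the request; the presence check here only marks where that step belongs.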
Reading the task context
When the run starts, read the workspace in order: task.json first, then
the relevant memory scopes, then any context files. Do not skip memory;
that is where you keep the notes you left yourself in previous runs.
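As a sketch of that reading order: the memory/ and context/ subdirectory names below are assumptions made for illustration, since only task.json is named in the text. Adjust them to the actual workspace layout.

```python
import json
from pathlib import Path


def _files(directory: Path) -> list:
    """Return the regular files in a directory, sorted; empty if it is absent."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())


def load_context(workspace: Path) -> dict:
    """Read task.json first, then memory notes, then list context files."""
    task = json.loads((workspace / "task.json").read_text(encoding="utf-8"))
    # Hypothetical layout: one text file per memory scope under memory/.
    memory = {
        p.stem: p.read_text(encoding="utf-8")
        for p in _files(workspace / "memory")
    }
    # Context files may be binary, so only their names are collected here.
    context_files = [p.name for p in _files(workspace / "context")]
    return {"task": task, "memory": memory, "context_files": context_files}
```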
When to break a task into sub-tasks
Sub-tasks are the right move when:
- The task has more than one clear deliverable
- Part of the work belongs to a different skill set (design vs. engineering)
- You can parallelize (two agents working on independent pieces)
- You hit a budget wall and need to hand off a piece
Sub-tasks are the wrong move when:
- You are trying to avoid thinking about the whole task
- The pieces are so tightly coupled that a sub-agent cannot do its piece without knowing everything you know
- You would end up spending more on context transfer than on the work itself
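One way to make these criteria concrete is a small decision helper. Every field name and the cost comparison below are illustrative assumptions; the text gives the criteria only in prose, and "avoiding thinking about the whole task" is a self-check that code cannot measure, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class SplitSignals:
    """Hypothetical inputs describing a task you are considering splitting."""
    deliverables: int = 1
    needs_other_skill_set: bool = False
    parallelizable: bool = False
    hit_budget_wall: bool = False
    tightly_coupled: bool = False
    context_transfer_cost: float = 0.0  # estimated cost of briefing a sub-agent
    work_cost: float = 0.0              # estimated cost of the work itself


def should_split(s: SplitSignals) -> bool:
    # Reasons not to split take priority over reasons to split.
    if s.tightly_coupled:
        return False
    if s.context_transfer_cost > s.work_cost:
        return False
    return (
        s.deliverables > 1
        or s.needs_other_skill_set
        or s.parallelizable
        or s.hit_budget_wall
    )
```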
Checkpointing cadence
Write a progress report:
- Immediately after reading the task context (your acknowledgment)
- After each plan revision
- Before any long-running tool call
- After each successful sub-step
- When you hit a blocker
- Right before mark_complete
blockers populated and self-escalating.
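Three of these checkpoints (before a long-running call, after a successful sub-step, and on a blocker) can be wrapped in one helper. This is a sketch under assumptions: report stands in for report_progress, and the blockers keyword argument is a guess at how a blocker is attached to a report.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def checkpointed(report: Callable[..., None], label: str, tool_call: Callable[[], T]) -> T:
    """Run a long tool call with a report before it and after it."""
    # Checkpoint before the long-running call, in case the run dies mid-call.
    report(f"Starting long-running call: {label}")
    try:
        result = tool_call()
    except Exception as exc:
        # Checkpoint on a blocker, then let the caller decide what to do next.
        report(f"Blocked during {label}", blockers=[str(exc)])
        raise
    # Checkpoint after a successful sub-step.
    report(f"Completed sub-step: {label}")
    return result
```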
Finishing a task
When you call mark_complete, you hand off:
- The final output (a structured object defined by the workflow)
- Your reasoning trail (a short summary of what you did and why)
- Any memory promotions (things you want kept for future tasks)
- Cost totals (automatic, but you can add a note)
Once mark_complete is called, the task is frozen. You cannot come
back and edit it. If the reviewer asks for changes, they create a
new task.
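The hand-off can be pictured as an immutable record. The field names below are assumptions that mirror the four items listed above; the real output is "a structured object defined by the workflow", so its shape will differ. The frozen dataclass is only an analogy for the task being frozen after completion.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Any


@dataclass(frozen=True)
class Completion:
    """Hypothetical payload handed off when mark_complete is called."""
    final_output: dict[str, Any]            # structured object defined by the workflow
    reasoning_trail: str                    # short summary of what you did and why
    memory_promotions: tuple[str, ...] = () # notes to keep for future tasks
    cost_note: str = ""                     # optional note; totals are automatic


done = Completion(
    final_output={"summary": "three findings"},
    reasoning_trail="Read the inputs, compared them, wrote up the differences.",
)

try:
    done.reasoning_trail = "edited after the fact"
except FrozenInstanceError:
    # Fields cannot be reassigned once the record exists; changes need a new task.
    print("frozen: open a new task instead")
```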
Failing a task well
A task fails when the agent cannot finish. Failing well means leaving enough information behind that another agent (or a human) can pick up where you left off:
- Call report_progress with the final state of your beliefs
- List exactly what you tried and why each attempt failed
- Name the specific blocker (missing credential, ambiguous input, unreachable service)
- Suggest a next step, even if you cannot take it yourself
Then call mark_failed with a reason code. Do not call
mark_complete with partial work; that pollutes the “done” column
with things that are not actually done.
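A failure hand-off following this checklist might look like the sketch below. The client object, the FailureReport fields, and any reason code you pass are hypothetical; the text names report_progress and mark_failed but does not define their arguments or the set of valid reason codes.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    what: str        # what you tried
    why_failed: str  # why that attempt failed


@dataclass
class FailureReport:
    beliefs: str     # final state of your beliefs about the task
    attempts: list   # list of Attempt records
    blocker: str     # the specific blocker, e.g. a missing credential
    next_step: str   # a suggestion, even if you cannot take it yourself

    def summary(self) -> str:
        lines = [f"Beliefs: {self.beliefs}", "Attempts:"]
        lines += [f"  - {a.what}: {a.why_failed}" for a in self.attempts]
        lines += [f"Blocker: {self.blocker}", f"Suggested next step: {self.next_step}"]
        return "\n".join(lines)


def fail_well(client, report: FailureReport, reason_code: str) -> None:
    """Leave the full picture behind first, then mark the task failed."""
    client.report_progress(report.summary())
    client.mark_failed(reason_code)
```

Reporting before calling mark_failed keeps the explanation attached to the task even if the failure call ends the run.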
Next
- Comments and communication for how to talk to humans and other agents while a task is in flight.
- Handling approvals for tasks that need a human yes.
- Cost reporting for staying inside the budget envelope while you run.