Lesson 03 · planning

Let the agent manage its own progress

"The agent can track its own progress, and I can see it." Let the model write its own task list, then use a small mechanism to remind it to keep the list updated.

⏱ ~10 min · 📝 3 interactive widgets · 🧑‍💻 Based on shareAI-lab · s03_todo_write.py

structured self-planning

Claude Code often has to perform several steps in sequence: grep to find references → read a few files → change the code → run tests. If you let the model "advance by feel", it does well for the first few steps, starts forgetting things in the middle, and finally gives up halfway.

The solution in s03 is to give it a manifest tool: the model calls the todo tool to write its task items, and TodoManager validates the structure, persists it, and returns the current view. This has two advantages:

  • The model is forced to make "what to do" explicit; just writing it out helps it clarify its thinking.
  • Humans can see what it is thinking. The debugging experience is ten times better.
# TODO view, each item is structured
[ ] #1: grep "TODO" across src/
[>] #2: read src/app.py and list comments # in progress
[ ] #3: generate summary markdown
[ ] #4: write to TODO_LIST.md

(0/4 completed)
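The view above can be produced from structured items with a few lines of formatting. Here is a minimal sketch; the function name, the item shape (`text`/`status` keys), and the marker mapping are assumptions, not the actual s03 code:

```python
# Hypothetical renderer for the structured TODO view shown above.
STATUS_MARK = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}

def render_todos(items):
    """Render a list of {"text", "status"} dicts into the text view."""
    lines = []
    for i, item in enumerate(items, 1):
        suffix = "  # in progress" if item["status"] == "in_progress" else ""
        lines.append(f'{STATUS_MARK[item["status"]]} #{i}: {item["text"]}{suffix}')
    done = sum(1 for it in items if it["status"] == "completed")
    lines.append(f"\n({done}/{len(items)} completed)")
    return "\n".join(lines)
```

Because the state is structured rather than free text, the same items can also drive the progress counter and the widgets below.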

A hard rule: there is only one in_progress at a time

There is a check in TodoManager.update():

if in_progress_count > 1:
    raise ValueError("Only one task can be in_progress at a time")

It may seem harsh, but it actually helps the model. If you allow 3 tasks to be "in progress" at once, it will bounce between them and never finish any of them. Forcing single-task advancement means it must complete each item before opening the next.
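In context, the check sits inside the manager's update path: replace the whole list, but refuse any list that violates the invariant. A minimal sketch (the class shape and method signature are assumptions; only the raised check comes from the source):

```python
class TodoManager:
    """Sketch of a manager that enforces at most one in_progress item."""

    def __init__(self):
        self.items = []

    def update(self, items):
        # Count items claiming to be in progress before accepting the list.
        in_progress_count = sum(
            1 for it in items if it["status"] == "in_progress"
        )
        if in_progress_count > 1:
            raise ValueError("Only one task can be in_progress at a time")
        self.items = items  # persist the validated list
        return self.items   # return the current view
```

Raising instead of silently dropping items matters: the error goes back to the model as a tool result, so it learns immediately that the payload was malformed.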

The widget below lets you play the model: submit various todo payloads and see which pass validation and which are rejected.

Nag reminder: no updates for 3 consecutive rounds? Poke it

Even with the todo tool available, the model will still occasionally "forget" to update the list: it has done a lot of work, but in_progress is still stuck on item 2. s03's trick is a very simple counter:

rounds_since_todo = 0
while True:
    response = LLM(messages, tools)
    ...
    # Did the model call the todo tool this round?
    used_todo = any(b.name == "todo" for b in tool_uses)
    rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
    if rounds_since_todo >= 3:
        # Inject a reminder into the next round's user message
        results.append({"type": "text", "text": "<reminder>Update your todos.</reminder>"})

When the count reaches 3, a reminder is included in the next round's user message. On seeing it, the model instinctively updates the todo. This uses engineering means to turn a soft constraint ("please keep it updated") into a recurring stimulus.

What's the name of this routine?

In agent-design circles, this is called structured self-planning with soft nudges: give the model a structured state it must write to, supplemented by opportunistic reminders. Claude Code uses a similar pattern in real code, but more restrained (lower frequency, neutral wording).

Why not just write "update the todo at every step" into the system prompt? You can, but the model's obedience to general instructions in the system prompt decreases as the conversation lengthens. Splitting the instruction into reminders that are constantly re-injected is much more stable.
Interactive

Widget 1 · Kanban · watch the todo evolve turn by turn

Click Step to see how the model pushes tasks from pending to in_progress to completed. Note that in each round there is always only one in_progress.

[ ] pending
[>] in_progress
[x] completed
Preparing to start…
Interactive

Widget 2 · Validation · Which of the 5 todo payloads passed?

When the model calls the todo tool, it passes an items array. TodoManager runs a series of checks: text is non-empty, status is valid, at most one in_progress, and total count ≤ 20. Click each payload to predict whether it passes or is rejected.
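The four checks the widget exercises can be sketched as a single validation function. This is an illustration of the rules as listed, not the actual s03 source; the function name and error messages are assumptions:

```python
# Hypothetical validator implementing the four rules Widget 2 describes.
VALID_STATUSES = {"pending", "in_progress", "completed"}

def validate_items(items):
    """Raise ValueError on the first rule violation; return True if all pass."""
    if len(items) > 20:
        raise ValueError("too many items (max 20)")
    in_progress = 0
    for it in items:
        if not it.get("text", "").strip():
            raise ValueError("item text must not be empty")
        if it.get("status") not in VALID_STATUSES:
            raise ValueError(f"invalid status: {it.get('status')!r}")
        in_progress += it["status"] == "in_progress"
    if in_progress > 1:
        raise ValueError("only one task can be in_progress at a time")
    return True
```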

Correct: 0 / 5
Interactive

Widget 3 · Nag Counter · what happens if the todo isn't updated for 3 consecutive rounds?

Click Next Turn to see whether the counter triggers reminder injection. Each round randomly chooses "call todo" or "don't call"; real model behavior fluctuates the same way.

rounds_since_todo: 0