The context is full, learn to cut
"The agent can forget strategically and keep working forever." Strategic forgetting = engineering capabilities.
Why compact?
If the agent runs for a long time, messages[] keeps growing: each read_file returns thousands of tokens, each bash call returns hundreds, and every round of dialogue also carries the model's thinking text. After 50 rounds the context can swell past 100K tokens. Two consequences:
- Hitting the model's limit: the call fails once the context window is full, and even before that, cost grows linearly with context size on every API call.
- Attention dilution: the task at hand is drowned in irrelevant tool_results from 30 rounds ago, and the model starts to lose focus.
The idea of s06: let the agent actively forget unimportant content while retaining key state. A three-layer mechanism, from light to heavy.
Layer 1 · micro_compact (runs silently every round)
The cheapest tier. It runs before every LLM call and replaces all tool_results except the most recent 3 with placeholders:
```python
# From round 10 onward, most tool_results become:
{
    "type": "tool_result",
    "tool_use_id": "toolu_01A",
    "content": "[Previous: used bash]"  # reduced from thousands of characters to dozens
}
```
One special case: the result of read_file is never compressed. Why? Because a read's output is reference material; if you compress it away, the model has to read the file again, which costs more than keeping it.
```python
PRESERVE_RESULT_TOOLS = {"read_file"}  # never compressed
```
Watch micro_compact eat old results, turn by turn
The steps below simulate 10 rounds of interaction, running micro_compact once before each round. Watch the older tool_result entries in messages[] turn into [Previous: ...] while the last 3 remain intact.
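Such a pass can be sketched in a few lines. This is a minimal illustration, not s06's actual code: the function name `micro_compact`, the `KEEP_RECENT` constant, and the message-walking logic are all assumptions layered on the Anthropic-style message shape shown above.

```python
# Hypothetical sketch of a micro_compact pass: keep the newest 3 tool_results
# verbatim, replace older ones with short placeholders. read_file results
# are reference material, so they are never compressed.
KEEP_RECENT = 3
PRESERVE_RESULT_TOOLS = {"read_file"}

def micro_compact(messages):
    # Map tool_use_id -> tool name, gathered from assistant turns.
    tool_names = {}
    for m in messages:
        if m["role"] == "assistant" and isinstance(m["content"], list):
            for b in m["content"]:
                if b.get("type") == "tool_use":
                    tool_names[b["id"]] = b["name"]
    # Collect (message index, block index, tool name) for every tool_result.
    results = []
    for i, m in enumerate(messages):
        if m["role"] == "user" and isinstance(m["content"], list):
            for j, b in enumerate(m["content"]):
                if b.get("type") == "tool_result":
                    results.append((i, j, tool_names.get(b["tool_use_id"], "tool")))
    # Everything older than the newest KEEP_RECENT gets a placeholder.
    for i, j, name in results[:-KEEP_RECENT]:
        if name in PRESERVE_RESULT_TOOLS:
            continue
        messages[i]["content"][j]["content"] = f"[Previous: used {name}]"
    return messages
```

Running this before every LLM call keeps the loop idempotent: already-compressed placeholders are tiny strings, so compressing them again is a no-op in practice.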
Layer 2 · auto_compact (triggered when threshold is exceeded)
Even with micro_compact running every round, the context will still outgrow any fixed budget eventually. s06 sets a threshold (50,000 tokens by default):
- Estimate the token count with `len(str(messages)) // 4` (rough but sufficient).
- If the threshold is exceeded, write the complete transcript to `.transcripts/transcript_TIMESTAMP.jsonl` as a backup.
- Ask the LLM to write a summary of the entire conversation.
- Replace `messages` with a single `"[compressed] SUMMARY..."` message.
The cost is obvious: specific tool outputs and the conversational texture are lost, and only an outline remains. But the agent can keep working, and that is the core benefit.
Layer 3 · Let the model call the compact tool itself
auto_compact is triggered by the harness; the model never knows it happened. Layer 3 is the reverse: give the model a compact tool and let it actively request compression, for example when it decides the earlier exploration is no longer useful and a new phase should begin.
The model calls:

```python
tool_use("compact", focus="keep the API design decisions")
```
Triggering works the same as in auto_compact, but the tool accepts a focus parameter that tells the summarizer what to emphasize. In practice this is very useful: the model knows which small tasks are already finished, which is more accurate than the harness's heuristic.
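A tool definition for this might look like the following. The schema shape follows the Anthropic tool-use convention, but the exact description text and the `handle_compact` helper are assumptions, not s06's code:

```python
# Hypothetical schema for the manual compact tool, plus a handler that
# threads the focus hint into the summarization prompt.
COMPACT_TOOL = {
    "name": "compact",
    "description": (
        "Compress the conversation history into a summary. Call this when "
        "earlier exploration is no longer needed and a new phase begins."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "focus": {
                "type": "string",
                "description": "What the summary should emphasize, "
                               "e.g. 'keep the API design decisions'.",
            }
        },
        "required": [],
    },
}

def handle_compact(messages, summarize, focus=None):
    # Same path as auto_compact, but the focus hint steers the summarizer.
    prompt = "Summarize the conversation."
    if focus:
        prompt += f" Focus: {focus}"
    return [{"role": "user", "content": f"[compressed] {summarize(messages, prompt)}"}]
```

Making `focus` optional matters: the model can still call compact with no argument when it simply wants a fresh start.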
Which layer is appropriate? A quick quiz
For each of the scenarios below, decide which trigger (micro, auto, or manual) is the most reasonable.