Domain knowledge loaded on demand
"Don't put everything in the system prompt. Load on demand."
The pitfall of the "everything in the system prompt" approach
You have 20 skills, each documented in detail: pdf-processing (how to read PDFs), code-review (a review checklist), git-workflow (commonly used git routines)... The intuitive approach is to put them all into the system prompt so that the model can consult them at any time.
Result:
- Every call burns 15-30K input tokens, even when the question needs no skill at all.
- The model's attention is diluted: compliance with rules stated in a long system prompt drops.
- Editing any single skill changes the system prompt, which invalidates the cache for all existing conversations.
s05 avoids this by splitting skills into two layers.
Two-tier architecture
Layer 1 · Cheap: Only each skill's name and a one-sentence description go in the system prompt (about 100 tokens per skill). 20 skills = 2K tokens, which is acceptable.
# Skill list in system prompt
Skills available:
- pdf: Process PDF files. Extract text, tables, metadata.
- code-review: Systematic code review checklist.
- git-workflow: Common git branching and rebase patterns.
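A minimal sketch of how the Layer 1 list above could be generated, assuming the skills are held in a plain dict mapping name to one-line description. The `SKILLS` dict and `build_skill_index` are hypothetical names for illustration, not taken from s05.

```python
# Hypothetical in-memory registry: skill name -> one-sentence description.
SKILLS = {
    "pdf": "Process PDF files. Extract text, tables, metadata.",
    "code-review": "Systematic code review checklist.",
    "git-workflow": "Common git branching and rebase patterns.",
}


def build_skill_index(skills: dict[str, str]) -> str:
    """Render only names and descriptions for the system prompt (Layer 1)."""
    lines = ["Skills available:"]
    for name, description in skills.items():
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


skill_index = build_skill_index(SKILLS)
print(skill_index)
```

Because the index contains no skill bodies, editing a body leaves this part of the system prompt unchanged.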
Layer 2 · On demand: When the model needs a particular skill, it calls load_skill(name="pdf"), and the complete skill body (perhaps 5-10K tokens) is inserted into the context via tool_result. The bodies of unused skills are never loaded.
# tool_result returns the complete skill
<skill name="pdf">
Step 1: Use pdfplumber for extraction...
Step 2: Handle OCR fallback when needed...
Step 3: Structure output as Markdown table...
</skill>
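The handler behind load_skill might look roughly like the sketch below. It assumes skill bodies are already available in a dict (`SKILL_BODIES` is a hypothetical name) and that whatever string the function returns is sent back to the model as the tool_result content. The error-message behaviour is an assumption, not something the source describes.

```python
# Hypothetical store of full skill bodies; in practice these might be read from files.
SKILL_BODIES = {
    "pdf": (
        "Step 1: Use pdfplumber for extraction...\n"
        "Step 2: Handle OCR fallback when needed...\n"
        "Step 3: Structure output as Markdown table..."
    ),
}


def load_skill(name: str) -> str:
    """Return one skill's full body wrapped in <skill> tags (Layer 2)."""
    body = SKILL_BODIES.get(name)
    if body is None:
        # Assumption: report the problem as text so the model can pick another name.
        known = ", ".join(sorted(SKILL_BODIES))
        return f"Error: unknown skill '{name}'. Available: {known}"
    return f'<skill name="{name}">\n{body}\n</skill>'


result = load_skill("pdf")
print(result)
```

Only the requested body enters the context; every other entry in the store costs nothing for this call.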
Compare token costs
Consider a realistic scenario. Suppose you have 20 skills whose bodies average 3000 tokens each. The user asks a question such as "Fix the bug in the login interface", which probably does not require any skill.
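The arithmetic for this scenario can be sketched as follows, using only the figures quoted in the text (20 skills, 3000-token bodies, about 100 tokens per Layer 1 entry). These are back-of-the-envelope estimates, not measurements, and `prompt_tokens` is a hypothetical helper.

```python
# Figures taken from the text; skill-related input tokens per call, ignoring caching.
NUM_SKILLS = 20
BODY_TOKENS = 3000   # average full skill body
INDEX_TOKENS = 100   # name + one-sentence description per skill


def prompt_tokens(skills_loaded: int, two_tier: bool) -> int:
    """Estimate skill-related input tokens for one call."""
    if not two_tier:
        # Every body sits in the system prompt, whether it is used or not.
        return NUM_SKILLS * BODY_TOKENS
    # The Layer 1 index is always present; Layer 2 bodies only when loaded.
    return NUM_SKILLS * INDEX_TOKENS + skills_loaded * BODY_TOKENS


full = prompt_tokens(0, two_tier=False)
tiered_none = prompt_tokens(0, two_tier=True)
tiered_one = prompt_tokens(1, two_tier=True)
print(full, tiered_none, tiered_one)  # 60000 2000 5000
```

Under these assumptions, the login-bug question costs about 2K skill tokens with two tiers versus 60K with everything in the system prompt, and loading one skill still comes to only about 5K.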
SKILL.md format
Skill files use YAML frontmatter + body:
---
name: pdf
description: Process PDF files. Extract text, tables, metadata.
tags: document,parsing
---
Step 1: Use pdfplumber for extraction. Handle multi-column layouts...
Step 2: For scanned PDFs, fall back to OCR via tesseract...
The frontmatter (name/description/tags) feeds Layer 1, and the body feeds Layer 2. The format is borrowed from static site generators (Jekyll, Hugo), so anyone familiar with them can read it at a glance.
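To show how one file can feed both layers, here is a rough sketch of splitting a SKILL.md string into frontmatter and body. It is a simplification that handles only flat `key: value` lines rather than full YAML (a real implementation would more likely use a YAML library). `parse_skill_md` is a hypothetical name.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter fields, body).

    Simplification: only flat `key: value` lines are understood, not full YAML.
    """
    parts = text.split("---", 2)
    if not text.startswith("---") or len(parts) < 3:
        # No well-formed frontmatter: treat everything as body.
        return {}, text.strip()
    _, front, body = parts
    meta = {}
    for line in front.strip().splitlines():
        # Split on the first colon only, so values may contain colons.
        key, _sep, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()


SKILL_MD = """---
name: pdf
description: Process PDF files. Extract text, tables, metadata.
tags: document,parsing
---
Step 1: Use pdfplumber for extraction.
Step 2: For scanned PDFs, fall back to OCR via tesseract.
"""

meta, body = parse_skill_md(SKILL_MD)
layer1_line = f"- {meta['name']}: {meta['description']}"  # goes in the system prompt
print(layer1_line)
print(body)  # returned later by load_skill
```

The frontmatter fields produce the one-line Layer 1 entry, while the body stays on disk until the model asks for it.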