Lesson 05 · planning

Domain knowledge loaded on demand

"Don't put everything in the system prompt. Load on demand."

⏱ ~10 min · 📝 3 interactive widgets · 🧑‍💻 Based on shareAI-lab · s05_skill_loading.py

The pitfall of "full system prompt"

You have 20 skills, each written out in detail: pdf-processing (how to read PDFs), code-review (a review checklist), git-workflow (common git routines)... The intuitive approach: put them all in the system prompt so the model can consult them at any time.

Result:

  • Every call burns 15-30K input tokens, even when the question needs no skills at all.
  • The model's attention gets diluted: compliance with rules buried in a long system prompt drops.
  • Change one skill and the prompt cache for every historical conversation is invalidated.

s05 avoids this by splitting skills into two layers.

Two-tier architecture

Layer 1 · Cheap: only the skill's name and a one-sentence description go into the system prompt (about 100 tokens each). 20 skills ≈ 2K tokens, which is acceptable.

# Skill list in system prompt
Skills available:
  - pdf: Process PDF files. Extract text, tables, metadata.
  - code-review: Systematic code review checklist.
  - git-workflow: Common git branching and rebase patterns.

Layer 2 · On demand: when the model needs a skill, it calls load_skill(name="pdf"), and the complete skill body (perhaps 5-10K tokens) is inserted into the context via the tool_result. Unused skills cost zero tokens.

# tool_result returns the complete skill
<skill name="pdf">
  Step 1: Use pdfplumber for extraction...
  Step 2: Handle OCR fallback when needed...
  Step 3: Structure output as Markdown table...
</skill>
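The two layers above can be sketched in a few lines of Python. This is a minimal illustration, not the s05 source; the `SKILLS` dict and the function names are assumptions:

```python
# Each skill has a cheap one-line description (Layer 1)
# and a full body that is only loaded on demand (Layer 2).
SKILLS = {
    "pdf": {
        "description": "Process PDF files. Extract text, tables, metadata.",
        "body": "Step 1: Use pdfplumber for extraction...\n"
                "Step 2: Handle OCR fallback when needed...",
    },
    "code-review": {
        "description": "Systematic code review checklist.",
        "body": "Step 1: Read the diff top to bottom...",
    },
}

def skill_index() -> str:
    """Layer 1: the ~100-tokens-per-skill list for the system prompt."""
    lines = [f"  - {name}: {meta['description']}" for name, meta in SKILLS.items()]
    return "Skills available:\n" + "\n".join(lines)

def load_skill(name: str) -> str:
    """Layer 2: the full body, returned as a tool_result only when asked for."""
    if name not in SKILLS:
        return f"<error>unknown skill: {name}</error>"
    return f'<skill name="{name}">\n{SKILLS[name]["body"]}\n</skill>'
```

skill_index() is pasted into every system prompt; load_skill is exposed as a tool, so a skill body costs tokens only in the conversations that actually request it.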

Comparing token costs

Test it with a realistic scenario. Suppose you have 20 skills with an average body of 3,000 tokens. The user asks a question (say, "Fix the bug in the login screen") that probably needs no skill at all.
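The arithmetic is easy to check. A sketch under the numbers above (20 skills, 3,000-token bodies, ~100-token descriptions), assuming a skill body is loaded about once every 5 conversations:

```python
N_SKILLS, BODY_TOKENS, DESC_TOKENS = 20, 3000, 100

def full_prompt_cost(conversations: int) -> int:
    # All 20 bodies ride along on every call, used or not.
    return conversations * N_SKILLS * BODY_TOKENS

def two_tier_cost(conversations: int, load_every: int = 5) -> int:
    # Every call pays for the cheap index; one body loads every `load_every` calls.
    index = conversations * N_SKILLS * DESC_TOKENS
    loads = (conversations // load_every) * BODY_TOKENS
    return index + loads

print(full_prompt_cost(1), two_tier_cost(1))    # → 60000 2000
print(full_prompt_cost(20), two_tier_cost(20))  # → 1200000 52000
```

For the skill-free question above, the two-tier prompt costs 2K tokens instead of 60K; over 20 conversations the saving is still roughly 95%.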

SKILL.md format

Skill files use YAML frontmatter + body:

---
name: pdf
description: Process PDF files. Extract text, tables, metadata.
tags: document,parsing
---

Step 1: Use pdfplumber for extraction. Handle multi-column layouts...
Step 2: For scanned PDFs, fall back to OCR via tesseract...

The frontmatter feeds Layer 1 (name/description/tags); the body is Layer 2. The format is borrowed from static site generators (Jekyll, Hugo), so anyone familiar with them can read it at a glance.
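A minimal parser for this split. The sketch below uses plain string handling rather than a YAML library, which is enough for flat `key: value` frontmatter; the function name is an assumption:

```python
def parse_skill_md(text: str) -> tuple[dict, str]:
    """Split a SKILL.md into frontmatter metadata (Layer 1) and body (Layer 2)."""
    # Frontmatter sits between the first two '---' delimiter lines.
    _, front, body = text.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, body.strip()

doc = """---
name: pdf
description: Process PDF files. Extract text, tables, metadata.
tags: document,parsing
---
Step 1: Use pdfplumber for extraction. Handle multi-column layouts...
"""
meta, body = parse_skill_md(doc)
# meta["name"] == "pdf"; only `body` ever goes out through load_skill.
```

meta feeds the Layer 1 index; body is what load_skill returns. For nested frontmatter you would swap the loop for yaml.safe_load.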

Interactive

Widget 1 · Token Economy · Comparison of two architectures

Left: Full system prompt. Right: Two-tier architecture. Look at the accumulated tokens after 20 conversations.

Full system prompt
System prompt: 60,000 tokens
(20 skills × 3,000 tokens each, all stuffed in)
× conversations: 1

Total: 60,000 tokens
Two-tier architecture
System prompt: 2,000 tokens
(20 descriptions × ~100 tokens each)
+ skill bodies loaded on demand: 0 tokens
(one load every 5 conversations)

Total: 2,000 tokens

Widget 2 · Frontmatter Parser · Extract skill metadata

Paste a SKILL.md and run s05's YAML frontmatter parsing logic to see what Layer 1 and Layer 2 each receive.

SKILL.md (editable)
Layer 1 · what goes into the system prompt
Layer 2 · the tool_result returned by load_skill

Widget 3 · Discoverability · A skill can only be found if its description is well written.

The Layer 1 description is the model's only basis for picking a skill. Here are 3 pairs to compare; pick the better-written one in each. Some phrasings guarantee the model will never find the skill.

Correct: 0 / 3