S05 · Skill Loading — Learn Claude Code

The pitfall of "full system prompt"

Suppose you have 20 skills, each written up in detail: pdf-processing (how to read PDFs), code-review (a review checklist), git-workflow (common git routines)... The intuitive approach is to put them all into the system prompt so that the model can consult them at any time.

Result:

Every call burns 15-30K input tokens, even when the question needs no skill at all.
The model's attention is diluted: the longer the system prompt, the less reliably the model follows the rules in it.
Editing any one skill changes the system prompt, which invalidates the cache for every existing conversation.

s05 takes a different approach: it splits skills into two layers.

Two-tier architecture

Layer 1 · Cheap: Only each skill's name and a one-sentence description go into the system prompt (about 100 tokens each). 20 skills ≈ 2K tokens, which is acceptable.

# Skill list in system prompt
Skills available:
  - pdf: Process PDF files. Extract text, tables, metadata.
  - code-review: Systematic code review checklist.
  - git-workflow: Common git branching and rebase patterns.
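
The Layer 1 text can be generated rather than written by hand. The following is a minimal sketch, assuming the skill metadata is already available as a name-to-description mapping; `SKILLS` and `render_skill_list` are hypothetical names used only for illustration.

```python
# Hypothetical in-memory registry; the entries mirror the skill list above.
SKILLS = {
    "pdf": "Process PDF files. Extract text, tables, metadata.",
    "code-review": "Systematic code review checklist.",
    "git-workflow": "Common git branching and rebase patterns.",
}


def render_skill_list(skills: dict[str, str]) -> str:
    """Return the Layer 1 text: one '  - name: description' line per skill."""
    lines = ["Skills available:"]
    for name, description in skills.items():
        lines.append(f"  - {name}: {description}")
    return "\n".join(lines)


# This string would be appended to the system prompt.
print(render_skill_list(SKILLS))
```

Because only this short list lives in the system prompt, adding a skill costs roughly one extra line per call rather than a full skill body.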

Layer 2 · On demand: When the model needs a skill, it calls load_skill(name="pdf"), and the complete skill body (perhaps 5-10K tokens) is inserted into the context as a tool_result. The bodies of unused skills are never loaded, so they cost no tokens.

# tool_result returns the complete skill
<skill name="pdf">
  Step 1: Use pdfplumber for extraction...
  Step 2: Handle OCR fallback when needed...
  Step 3: Structure output as Markdown table...
</skill>
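
On the tool side, the handler for load_skill only needs to look up a body and wrap it. This is a rough sketch under stated assumptions: `SKILL_BODIES` is a hypothetical in-memory store, and the error message format is invented for illustration, not taken from s05.

```python
# Hypothetical store of full skill bodies, keyed by skill name.
SKILL_BODIES = {
    "pdf": (
        "Step 1: Use pdfplumber for extraction...\n"
        "Step 2: Handle OCR fallback when needed...\n"
        "Step 3: Structure output as Markdown table..."
    ),
}


def load_skill(name: str) -> str:
    """Return the text to place in the tool_result for a load_skill call."""
    body = SKILL_BODIES.get(name)
    if body is None:
        # Assumed behavior: report the valid names so the model can retry.
        known = ", ".join(sorted(SKILL_BODIES))
        return f"Error: unknown skill '{name}'. Available: {known}"
    return f'<skill name="{name}">\n{body}\n</skill>'


print(load_skill("pdf"))
```

Returning an error string instead of raising keeps the failure inside the conversation, where the model can see it and pick a valid skill name.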

Compare token costs

Consider a realistic scenario. Suppose you have 20 skills and each body averages 3000 tokens. The user asks a question such as "Fix the bug in the login interface", which probably does not require any skill.
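
The arithmetic for this scenario can be sketched in a few lines. The cost model is a simplifying assumption: each loaded skill body is counted once, and prompt caching is ignored. The function names are hypothetical.

```python
NUM_SKILLS = 20
BODY_TOKENS = 3000        # average skill body, from the scenario above
DESCRIPTION_TOKENS = 100  # approximate Layer 1 cost per skill


def full_prompt_cost(calls: int) -> int:
    """Every call carries all skill bodies in the system prompt."""
    return NUM_SKILLS * BODY_TOKENS * calls


def two_tier_cost(calls: int, skills_loaded: int) -> int:
    """Every call carries the short list; bodies are paid only when loaded."""
    return NUM_SKILLS * DESCRIPTION_TOKENS * calls + skills_loaded * BODY_TOKENS


full = full_prompt_cost(1)    # 60000
tiered = two_tier_cost(1, 0)  # 2000: the login-bug question loads no skill
savings = 1 - tiered / full
print(f"full: {full}, two-tier: {tiered}, saved: {savings:.1%}")
# prints "full: 60000, two-tier: 2000, saved: 96.7%"
```

Even when one skill is loaded, the two-tier cost for a single call is 5000 tokens, still far below the 60000 of the full system prompt.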

SKILL.md format

Skill files use YAML frontmatter + body:

---
name: pdf
description: Process PDF files. Extract text, tables, metadata.
tags: document,parsing
---

Step 1: Use pdfplumber for extraction. Handle multi-column layouts...
Step 2: For scanned PDFs, fall back to OCR via tesseract...

The frontmatter feeds Layer 1 (name/description/tags), and the body feeds Layer 2. The format is borrowed from static site generators (Jekyll, Hugo), so anyone familiar with them can read it at a glance.
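
Splitting a skill file into its two layers takes little code. The sketch below is a deliberately minimal parser that handles only flat `key: value` lines, not full YAML; a real implementation would more likely use a YAML library. `parse_skill_md` is a hypothetical name.

```python
def parse_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md text into (frontmatter dict, body).

    Handles only flat 'key: value' lines, not full YAML.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    try:
        # Index of the closing '---' that ends the frontmatter.
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---' line") from None
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


SAMPLE = """---
name: pdf
description: Process PDF files. Extract text, tables, metadata.
tags: document,parsing
---

Step 1: Use pdfplumber for extraction. Handle multi-column layouts...
Step 2: For scanned PDFs, fall back to OCR via tesseract...
"""

meta, body = parse_skill_md(SAMPLE)
print(meta["name"], "|", meta["description"])  # Layer 1 material
print(body)                                    # Layer 2 material
```

The `meta` dictionary supplies the one-line entry for the system prompt, while `body` is what a load_skill call would return.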

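Token cost: everything in the system prompt

System prompt: 60000 tokens
(all 20 skills × 3000 tokens each)
× number of conversations: 1

Total: 60000 tokens

Token cost: two-tier architecture

System prompt: 2000 tokens
(20 descriptions × ~100 tokens each)
+ skill bodies loaded on demand: 0 tokens
(triggered once every 5 conversations)

Total: 2000 tokens

Number of conversations N: 1

Savings: about 97%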


