Kreuzberg MCP: Installation & Live Demo

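Why use it

Key features

Supports 97+ formats: PDF, DOCX, XLSX, PPTX, images, HTML, EPUB, RTF
Rust core: fast, with a far smaller memory footprint than Python-based tools
Built-in OCR (Tesseract / PaddleOCR) for scanned documents
Preserves structure: tables become Markdown, and headings and lists are kept
Extracts metadata: author, creation date, word count, language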

Installation

Choose your client

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json · Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "kreuzberg": {
      "command": "uvx",
      "args": [
        "kreuzberg-mcp"
      ]
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Save the file, then restart the app.
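If the config file already lists other servers, pasting the snippet over it would drop them. The sketch below merges the kreuzberg entry into an existing file instead. It assumes the file is plain JSON with a top-level mcpServers map, as in the snippet above; the function name add_kreuzberg is made up for illustration.

```python
import json
from pathlib import Path

# Server entry copied from the config snippet above.
KREUZBERG_ENTRY = {"command": "uvx", "args": ["kreuzberg-mcp"]}


def add_kreuzberg(config_path: Path) -> dict:
    """Merge the kreuzberg entry into config_path, keeping any other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # tolerate an empty file
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    servers["kreuzberg"] = dict(KREUZBERG_ENTRY)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path for your platform, for example add_kreuzberg(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json") on macOS, then restart the app.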

Global: ~/.cursor/mcp.json · Project: .cursor/mcp.json

{
  "mcpServers": {
    "kreuzberg": {
      "command": "uvx",
      "args": [
        "kreuzberg-mcp"
      ]
    }
  }
}

Cursor uses the same mcpServers format as Claude Desktop. Project-level config takes precedence over the global one.

VS Code → Cline → MCP Servers → Edit

{
  "mcpServers": {
    "kreuzberg": {
      "command": "uvx",
      "args": [
        "kreuzberg-mcp"
      ]
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then choose "Edit Configuration".

~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "kreuzberg": {
      "command": "uvx",
      "args": [
        "kreuzberg-mcp"
      ]
    }
  }
}

Same format as Claude Desktop. Restart Windsurf to apply the change.

~/.continue/config.json

{
  "mcpServers": [
    {
      "name": "kreuzberg",
      "command": "uvx",
      "args": [
        "kreuzberg-mcp"
      ]
    }
  ]
}

Continue uses an array of server objects rather than a map.
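The difference between the two shapes is mechanical, so a map-style config can be converted rather than retyped. A minimal sketch, assuming the only change is that each map key becomes a name field on its object (the helper name to_continue_servers is hypothetical):

```python
def to_continue_servers(mcp_servers: dict) -> list:
    """Convert a {"name": {...}} server map into a list of named server objects."""
    return [{"name": name, **spec} for name, spec in mcp_servers.items()]


# The map form used by Claude Desktop, Cursor, Cline and Windsurf above.
claude_style = {"kreuzberg": {"command": "uvx", "args": ["kreuzberg-mcp"]}}
continue_style = to_continue_servers(claude_style)
```

Here continue_style holds one object with name, command and args, matching the Continue snippet above.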

~/.config/zed/settings.json

{
  "context_servers": {
    "kreuzberg": {
      "command": {
        "path": "uvx",
        "args": [
          "kreuzberg-mcp"
        ]
      }
    }
  }
}

Add the entry under context_servers. Zed hot-reloads the file on save.

claude mcp add kreuzberg -- uvx kreuzberg-mcp

A single command does it. Verify with claude mcp list; remove with claude mcp remove.

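Use cases

Kreuzberg in practice

Extract tables from a messy PDF as clean Markdown

👤 Analysts working with report PDFs ⏱ ~10 min beginner

When to use: You have a PDF whose tables pdftotext mangled, and you don't want to retype them by hand.

Prerequisites

MCP installed: run uvx kreuzberg-mcp, or add it via claude mcp add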

Steps

Extract

Use kreuzberg to extract /docs/2025-annual-report.pdf. Give me the tables as markdown and the body text separately.

→ Clean Markdown tables with headers preserved

Verify

For the "Revenue by Segment" table, reconcile the column totals. Flag any OCR misreads.

→ Numeric reconciliation with problem cells flagged

Result: Markdown tables you can paste straight into a document, with no further cleanup.

Gotchas

OCR on scanned PDFs can misread a 6 as an 8: check the OCR confidence output and manually re-verify low-confidence cells

Combine with: filesystem
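The reconciliation in the Verify step can also be done outside the chat, which helps when a misread digit needs an independent check. The sketch below parses a Markdown table and compares each column's stated total with the sum of the other rows. The sample table is invented for illustration, and the sketch assumes a simple pipe table with a row labelled "Total".

```python
def parse_markdown_table(md: str) -> list[dict]:
    """Parse a simple pipe table into a list of row dicts keyed by header."""
    lines = [ln.strip() for ln in md.strip().splitlines() if ln.strip()]
    rows = [[cell.strip() for cell in ln.strip("|").split("|")] for ln in lines]
    header, body = rows[0], rows[2:]  # rows[1] is the |---| separator row
    return [dict(zip(header, row)) for row in body]


def to_number(cell: str) -> float:
    return float(cell.replace(",", "").replace("$", ""))


def check_totals(rows, label_col, total_label="Total", tol=0.005):
    """Return {column: (stated, computed)} for columns whose total is off."""
    total_row = next(r for r in rows if r[label_col] == total_label)
    items = [r for r in rows if r is not total_row]
    mismatches = {}
    for col in total_row:
        if col == label_col:
            continue
        computed = sum(to_number(r[col]) for r in items)
        stated = to_number(total_row[col])
        if abs(computed - stated) > tol:
            mismatches[col] = (stated, computed)
    return mismatches


# Invented sample: the Q2 total does not match the rows above it.
sample = """
| Segment | Q1 | Q2 |
|---|---|---|
| Cloud | 120 | 136 |
| Devices | 80 | 84 |
| Total | 200 | 228 |
"""
mismatches = check_totals(parse_markdown_table(sample), label_col="Segment")
```

A column that shows up in mismatches is a candidate for a misread digit in one of its cells or in the total itself, so it still needs a look at the source PDF.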

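Batch-ingest a folder of mixed-format documents for downstream indexing

👤 Engineers building RAG pipelines ⏱ ~30 min intermediate

When to use: A client hands you an archive of PDFs, Word documents, and PowerPoint decks, and you need clean text for embedding.

Prerequisites

Filesystem MCP scoped to that folder: start the filesystem MCP with the ingest directory as its root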

Steps

Inventory

List every file under /ingest/. For each, call kreuzberg.detect_format and report.

→ A list of each file's format

Batch extract

For each file, extract text + metadata. Write cleaned .txt next to the original and a manifest.json with metadata.

→ All files processed; the manifest has one record per file

Quality check

List every file where extraction returned <100 chars — those are likely scanned or corrupt. Re-run with OCR forced.

→ Low-content files identified and retried

Result: A folder of clean text files with a metadata manifest, ready for embedding.

Gotchas

Encrypted PDFs: Kreuzberg returns an error. Decrypt with qpdf first, or ask for an unencrypted copy

Combine with: filesystem · memory
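The quality-check step above can be reproduced locally once the .txt files are on disk. A minimal sketch, assuming the extracted text was written as UTF-8 .txt files somewhere under the ingest folder; the 100-character threshold comes from the prompt above, and the function name is hypothetical.

```python
from pathlib import Path

MIN_CHARS = 100  # threshold taken from the quality-check prompt above


def low_content_files(folder: Path, min_chars: int = MIN_CHARS) -> list[Path]:
    """Return extracted .txt files whose stripped text is shorter than min_chars."""
    flagged = []
    for txt in sorted(folder.rglob("*.txt")):
        text = txt.read_text(encoding="utf-8", errors="replace")
        if len(text.strip()) < min_chars:
            flagged.append(txt)
    return flagged
```

The returned paths are the candidates to re-run with OCR forced. Whitespace is stripped before counting so that a page of blank lines still counts as empty.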

Combinations

Pair with other MCPs for 10x leverage

kreuzberg + filesystem

Walk a folder and extract every document in place

For each PDF under /docs, extract text and save as .md next to it.

kreuzberg + memory

Load extracted content into a knowledge graph

Extract /contracts/*.pdf and store key terms in memory for cross-doc querying.

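Tools

What this MCP exposes

Tool	Input parameters	When to call	Cost
extract_text	path: str, ocr?: bool	The main extraction call	free
extract_metadata	path: str	When you only need metadata, not the body text	free
extract_tables	path: str	Table-focused extraction tasks	free
detect_format	path: str	To confirm the format before extracting	free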
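MCP clients invoke these tools with a JSON-RPC tools/call request. Your client normally builds that request for you; the sketch below only shows the shape of the message, using the tool and parameter names from the table above. The initialization handshake and transport are omitted, and the helper name tool_call is made up.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids


def tool_call(name: str, **arguments) -> str:
    """Build an MCP tools/call request as a JSON string."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Parameter names (path, ocr) are taken from the table above.
message = tool_call("extract_text", path="/docs/2025-annual-report.pdf", ocr=True)
```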

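Cost & limits

What it costs to run

API quota: Unlimited; runs locally
Tokens per call: Proportional to document size; a 20-page PDF yields roughly 8k tokens of output
Pricing: Free (open source)
Tip: Run extract_metadata on large files first and filter out irrelevant ones before full extraction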
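The 20-page figure works out to roughly 400 tokens per page, which gives a quick way to budget before extracting. The sketch below extrapolates linearly from that single data point, which is an assumption: dense tables or sparse slides will shift the real number.

```python
# About 400 tokens per page, derived from the 20-page / 8k-token figure above.
TOKENS_PER_PAGE = 8_000 / 20


def estimate_tokens(pages: int) -> int:
    """Rough output-token estimate for a document with the given page count."""
    return round(pages * TOKENS_PER_PAGE)


def fits_budget(pages: int, budget_tokens: int) -> bool:
    """True if the estimated output fits within budget_tokens."""
    return estimate_tokens(pages) <= budget_tokens
```

For example, a 150-page report would be estimated at about 60k tokens, so it is worth filtering or splitting before sending it to a model with a smaller context budget.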

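Security

Permissions, secrets, blast radius

Credential storage: No credentials needed in local mode

Data egress: None; all processing happens locally

Malformed PDFs can trigger parser edge cases. If you process untrusted uploads, sandbox the MCP.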

Troubleshooting

Common errors and fixes

ModuleNotFoundError: tesseract

Install the Tesseract system binary: brew install tesseract / apt install tesseract-ocr

Verify: `tesseract --version`

Empty output on PDF

Likely an image-only PDF. Re-run with ocr=true.

Verify: Check output.metadata.has_text_layer
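When scripting around the tool, the same check can decide whether to retry with OCR. The sketch below assumes the extraction result has been decoded into a dict with a text field and a metadata.has_text_layer flag; only has_text_layer is named above, so treat the rest of the shape as hypothetical.

```python
def needs_ocr(result: dict) -> bool:
    """Decide whether an extraction result should be retried with ocr=true."""
    metadata = result.get("metadata") or {}
    # An explicit "no text layer" flag is the strongest signal.
    if metadata.get("has_text_layer") is False:
        return True
    # Otherwise fall back to checking whether any text came back at all.
    return not (result.get("text") or "").strip()
```

If needs_ocr returns True, call extract_text again with ocr set to true, as described above.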

XLSX tables come out jumbled

Pass the sheet name explicitly: the tool accepts a sheet parameter

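Alternatives

Kreuzberg vs. other options

Alternative	When to use it instead	Trade-offs
markdownify-mcp	You want a lighter Node.js converter and don't need OCR	Fewer supported formats; table structure is not preserved
Unstructured.io	You need enterprise-grade PDF parsing and can accept cloud costs	Paid; cloud-hosted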
