Extract tables from a messy PDF into clean markdown
When to use: You have a PDF with tables that pdftotext mangles and you don't want to retype them.
Prerequisites
- MCP installed: run uvx kreuzberg-mcp, or add it via claude mcp add
Flow
- Extract: Use kreuzberg to extract /docs/2025-annual-report.pdf. Give me the tables as markdown and the body text separately. → Clean markdown tables with preserved headers
- Verify: For the "Revenue by Segment" table, reconcile the column totals. Flag any OCR misreads. → Arithmetic check with flagged cells
Result: Markdown tables you can paste into a doc without rework.
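To double-check the Verify step yourself, the column totals can also be reconciled locally. The following is a minimal sketch, not part of kreuzberg: it assumes the extracted table is a pipe-delimited markdown table whose first column holds row labels and whose totals row is labeled "Total". The function names and the sample figures are made up for illustration.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_markdown_table(md: str) -> list[list[str]]:
    """Split a pipe-delimited markdown table into rows of cell strings."""
    rows = []
    for line in md.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header separator row, e.g. |---|---:|
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    return rows


def to_number(cell: str) -> Optional[Decimal]:
    """Parse cells like '1,234.5' or '(200)'; return None if not numeric."""
    text = cell.replace(",", "").replace("$", "").strip()
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = Decimal(text)
    except InvalidOperation:
        return None
    return -value if negative else value


def reconcile_totals(md: str, total_label: str = "Total"):
    """Return (column header, summed value, stated total) for each mismatch."""
    header, *body = parse_markdown_table(md)
    total_row = next(r for r in body if r[0].lower() == total_label.lower())
    data_rows = [r for r in body if r is not total_row]
    mismatches = []
    for col in range(1, len(header)):
        stated = to_number(total_row[col])
        values = [to_number(r[col]) for r in data_rows]
        if stated is None or any(v is None for v in values):
            continue  # non-numeric column, nothing to reconcile
        summed = sum(values, Decimal(0))
        if summed != stated:
            mismatches.append((header[col], summed, stated))
    return mismatches


# Invented sample data, not taken from any real report.
sample = """
| Segment | Q1 | Q2 |
|---|---:|---:|
| Cloud | 1,200 | 1,368 |
| Devices | 800 | 900 |
| Total | 2,000 | 2,288 |
"""
for column, summed, stated in reconcile_totals(sample):
    print(f"{column}: rows sum to {summed}, table states {stated}")
```

A mismatch only says that some cell in the column (or the total itself) is wrong; it does not say which one, so flagged columns still need a look at the source PDF.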
Pitfalls
- Scanned PDF: OCR can mistake one digit for another, such as 6 for 8. Use the OCR confidence output and manually re-check low-confidence cells.
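When a column fails to reconcile, a 6/8 misread is one candidate explanation that can be tested arithmetically. The sketch below is an illustrative heuristic, not a kreuzberg feature and not a substitute for the OCR confidence output: it assumes whole-number cells and a trusted stated total, and it lists every single-digit 6↔8 swap that would close the gap. The function name and sample values are hypothetical.

```python
def digit_swap_candidates(cells, stated_total, confusable=("6", "8")):
    """Return (row index, original digits, corrected digits) for each
    single-digit swap between the two confusable digits that makes the
    column sum equal the stated total."""
    a, b = confusable
    swap = {a: b, b: a}
    values = [int(c.replace(",", "")) for c in cells]
    gap = stated_total - sum(values)
    fixes = []
    for i, cell in enumerate(cells):
        digits = cell.replace(",", "")
        for pos, ch in enumerate(digits):
            if ch not in swap:
                continue
            candidate = digits[:pos] + swap[ch] + digits[pos + 1:]
            # Keep the swap only if it accounts for the whole discrepancy.
            if int(candidate) - values[i] == gap:
                fixes.append((i, digits, candidate))
    return fixes


# Invented sample: the rows sum to 2268 but the stated total is 2288.
print(digit_swap_candidates(["1,368", "900"], 2288))
```

Any candidate it returns is only a hypothesis to confirm against the scanned page; several different swaps can close the same gap, and the stated total itself may be the misread cell.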