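Autoresearch (Karpathy-style) (Claude Skill): Use Cases, Installation & Live Demo

Why use it

Key features

Goal + verifier abstraction: works for any measurable goal
Budget controls (max iterations, max tokens, time)
Keep/discard log, with automatic rollback on regression
Pluggable verifiers: metrics, test suites, LLM judges
Markdown trace of every iteration for inspection
Free and open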
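The goal + verifier idea with a keep/discard log can be sketched in a few lines. This is a minimal illustration, not the skill's actual implementation: `run_loop`, `propose`, and `verify` are hypothetical names, and the toy goal (move a number toward 3.0) stands in for a real prompt or code change.

```python
import random
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def run_loop(
    initial: T,
    propose: Callable[[T], T],
    verify: Callable[[T], float],
    max_iter: int,
) -> Tuple[T, float, List[str]]:
    """Keep a candidate only when the verifier scores it above the current best."""
    best, best_score = initial, verify(initial)
    log: List[str] = []
    for i in range(1, max_iter + 1):  # max_iter is the iteration budget
        candidate = propose(best)
        score = verify(candidate)
        if score > best_score:
            best, best_score = candidate, score
            log.append(f"iter {i}: keep ({score:.4f})")
        else:
            # No gain or a regression: discard the candidate.
            # `best` is left untouched, which is the rollback.
            log.append(f"iter {i}: discard ({score:.4f})")
    return best, best_score, log


# Toy goal (hypothetical): maximize -(x - 3)^2, i.e. move x toward 3.0.
rng = random.Random(0)
best, best_score, log = run_loop(
    initial=0.0,
    propose=lambda x: x + rng.uniform(-1.0, 1.0),
    verify=lambda x: -((x - 3.0) ** 2),
    max_iter=200,
)
```

Because a candidate is kept only on a strict improvement, the best score never drops below the baseline, whatever the proposer suggests.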

Live demo

See it in action


Installation

Choose your client

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json · Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "autoresearch-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/uditgoenka/autoresearch",
        "~/.claude/skills/autoresearch"
      ],
      "_inferred": true
    }
  }
}

Open Claude Desktop → Settings → Developer → Edit Config. Save, then restart the app.

~/.cursor/mcp.json · .cursor/mcp.json

{
  "mcpServers": {
    "autoresearch-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/uditgoenka/autoresearch",
        "~/.claude/skills/autoresearch"
      ],
      "_inferred": true
    }
  }
}

Cursor uses the same mcpServers schema as Claude Desktop. Project-level config takes precedence over the global config.

VS Code → Cline → MCP Servers → Edit

{
  "mcpServers": {
    "autoresearch-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/uditgoenka/autoresearch",
        "~/.claude/skills/autoresearch"
      ],
      "_inferred": true
    }
  }
}

Click the MCP Servers icon in the Cline sidebar, then select "Edit Configuration".

~/.codeium/windsurf/mcp_config.json

{
  "mcpServers": {
    "autoresearch-skill": {
      "command": "git",
      "args": [
        "clone",
        "https://github.com/uditgoenka/autoresearch",
        "~/.claude/skills/autoresearch"
      ],
      "_inferred": true
    }
  }
}

Same format as Claude Desktop. Restart Windsurf to apply.

~/.continue/config.json

{
  "mcpServers": [
    {
      "name": "autoresearch-skill",
      "command": "git",
      "args": [
        "clone",
        "https://github.com/uditgoenka/autoresearch",
        "~/.claude/skills/autoresearch"
      ]
    }
  ]
}

Continue uses an array of server objects rather than a map.

~/.config/zed/settings.json

{
  "context_servers": {
    "autoresearch-skill": {
      "command": {
        "path": "git",
        "args": [
          "clone",
          "https://github.com/uditgoenka/autoresearch",
          "~/.claude/skills/autoresearch"
        ]
      }
    }
  }
}

Add the entry under context_servers. Zed hot-reloads on save.

claude mcp add autoresearch-skill -- git clone https://github.com/uditgoenka/autoresearch ~/.claude/skills/autoresearch

A one-line command. Verify with claude mcp list; remove with claude mcp remove.

Use cases

Practical uses: Autoresearch (Karpathy-style)

Iteratively tune a system prompt against a benchmark

👤 AI engineers tuning prompts ⏱ ~90 min advanced

When to use: you have a prompt, a benchmark, and the patience for a loop.

Prerequisites

Skill installed: git clone https://github.com/uditgoenka/autoresearch ~/.claude/skills/autoresearch
A benchmark with a scoring function: /bench/run.sh, which prints a score to stdout
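A benchmark that prints its score to stdout can be wrapped as a verifier along these lines. This is a sketch under assumptions: `score_from_command` is a hypothetical helper, and it assumes the score is the last non-empty line of stdout, which the page does not specify. The demo command is a stand-in for /bench/run.sh.

```python
import subprocess
import sys
from typing import List


def score_from_command(cmd: List[str], timeout_s: float = 60.0) -> float:
    """Run a benchmark command and parse its score from stdout.

    Assumption: the score is the last non-empty line printed to stdout.
    """
    result = subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,  # a non-zero exit code raises CalledProcessError
    )
    lines = [ln.strip() for ln in result.stdout.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("benchmark printed nothing to stdout")
    return float(lines[-1])


# Stand-in for ["/bench/run.sh"]: prints a log line, then the score.
demo_cmd = [sys.executable, "-c", "print('warming up'); print(0.8125)"]
demo_score = score_from_command(demo_cmd)
```

Failing loudly on a non-zero exit code or empty output keeps a broken benchmark from being mistaken for a low score.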

Flow

Set the goal

Use autoresearch. Goal: maximize score from /bench/run.sh on prompt at /prompts/system.md. Budget 30 iterations.

→ Loop starts; first proposal completed
Observe the trace

Show me iterations 5–10 with deltas.

→ Trace with a score per iteration; kept/discarded marked
Stop early

If 3 consecutive iterations fail to improve > 1%, stop and report best.

→ Convergence guard triggered; best prompt reported
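The early-stop rule in the last step can be expressed as a small convergence guard. A minimal sketch: `should_stop` is a hypothetical name, "improve" is read here as beating the best score so far by more than 1%, and scores are assumed to be positive so that a relative gain is meaningful.

```python
from typing import Sequence


def should_stop(
    scores: Sequence[float],
    patience: int = 3,
    min_rel_gain: float = 0.01,
) -> bool:
    """Return True once `patience` consecutive iterations fail to beat
    the best score so far by more than `min_rel_gain` (1% by default).

    Assumption: scores are positive and higher is better.
    """
    if not scores:
        return False
    best = scores[0]  # baseline
    stale = 0
    for score in scores[1:]:
        if score > best * (1 + min_rel_gain):
            best = score
            stale = 0  # a real improvement resets the counter
        else:
            stale += 1
            if stale >= patience:
                return True
    return False


plateau = should_stop([1.0, 1.05, 1.051, 1.052, 1.053])
recovering = should_stop([1.0, 1.05, 1.051, 1.052, 1.2])
```

In the first trace the three iterations after 1.05 each gain less than 1%, so the guard fires; in the second, the jump to 1.2 resets the counter before it reaches three.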

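Result: a better prompt, with a trace that explains why.

Pitfalls

The verifier is gameable, so the score rises without any gain in quality. Add a sanity-check verifier (an LLM judge or a held-out set).

Combine with: filesystem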

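Squeeze 20% more performance out of a hot function with automated iteration

👤 Backend developers with profiling data ⏱ ~120 min advanced

When to use: you know which function is slow and want Claude to find a faster equivalent.

Flow

Define

Goal: minimize wall-time of /bench/perf.sh which exercises foo(). Constraint: tests must keep passing.

→ Loop starts; baseline captured
Iterate

Run 20 iterations. Show the top 3 improvements at the end.

→ 3 candidate refactorings with measured speedups

Result: a concrete, verified speedup.

Pitfalls

An iteration introduces a subtle correctness bug that the tests miss. Add property-based tests as a verifier alongside the unit tests.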
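To illustrate the pitfall, here is a randomized differential check written with the standard library only, as a lightweight stand-in for a property-based testing library. Everything in it is hypothetical: `foo_reference` (sort and deduplicate) plays the role of the original foo(), and `foo_buggy` is an "optimized" rewrite that passes a hand-written unit test but removes only adjacent duplicates.

```python
import random
from typing import Callable, List, Optional


def foo_reference(xs: List[int]) -> List[int]:
    """Hypothetical original foo(): sorted list of unique values."""
    return sorted(set(xs))


def foo_buggy(xs: List[int]) -> List[int]:
    """Hypothetical rewrite with a subtle bug: drops only adjacent duplicates."""
    out: List[int] = []
    for x in xs:
        if not out or out[-1] != x:
            out.append(x)
    return sorted(out)


def find_counterexample(
    reference: Callable[[List[int]], List[int]],
    candidate: Callable[[List[int]], List[int]],
    trials: int = 500,
    seed: int = 0,
) -> Optional[List[int]]:
    """Property: candidate(xs) == reference(xs) for random inputs.

    Returns the first input that violates the property, or None.
    """
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        data = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        if reference(list(data)) != candidate(list(data)):
            return data
    return None


unit_test_passes = foo_buggy([3, 3, 1]) == [1, 3]  # the unit test misses the bug
counterexample = find_counterexample(foo_reference, foo_buggy)
```

The unit test passes because its duplicates happen to be adjacent, while the random inputs quickly produce a list with separated duplicates that exposes the difference.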

Combine with: github

Auto-iterate landing page copy against a CTR judge

👤 Marketers running content tests ⏱ ~60 min intermediate

When to use: you have a CTR target (or a judge prompt that simulates one) and time to iterate.

Flow

Set up the judge

Goal: maximize judge_score on /copy/headline.md. Judge prompt: 'rate likelihood a Series-B SaaS founder clicks this headline'.

→ Judge baseline scored; loop starts
Iterate

Run 15 iterations; keep top 3 distinct candidates.

→ Top 3 distinct headlines

Result: 3 candidate headlines for human review.

Pitfalls

The judge has strong stylistic preferences unrelated to click likelihood. Anchor the judge with a rubric file that states explicit criteria.
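One way to anchor a judge is to have it rate each rubric criterion separately and combine the ratings with fixed weights, so that anything outside the rubric cannot move the score. The sketch below assumes this approach; the criteria names, the weights, and `rubric_score` are all hypothetical and do not come from the skill.

```python
from typing import Dict

# Hypothetical rubric: criterion -> weight. In practice this would live in a rubric file.
RUBRIC: Dict[str, float] = {
    "clarity": 0.4,
    "relevance_to_audience": 0.4,
    "specific_benefit": 0.2,
}


def rubric_score(ratings: Dict[str, float], rubric: Dict[str, float] = RUBRIC) -> float:
    """Weighted average of per-criterion ratings.

    Ratings for criteria that are not in the rubric are ignored, so a
    judge's stylistic preferences cannot change the result.
    """
    missing = set(rubric) - set(ratings)
    if missing:
        raise ValueError(f"judge did not rate: {sorted(missing)}")
    total_weight = sum(rubric.values())
    return sum(rubric[name] * ratings[name] for name in rubric) / total_weight


# Ratings on a 1-5 scale; "sounds_punchy" is off-rubric and has no effect.
score = rubric_score(
    {"clarity": 5, "relevance_to_audience": 3, "specific_benefit": 4, "sounds_punchy": 1}
)
```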

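Combinations

Combine with other MCPs for 10x efficiency

autoresearch-skill + filesystem

Persist iteration traces for inspection

autoresearch-skill + github

Open a PR with the winning candidate

Tools

What this MCP exposes

Tool	Inputs	When to call	Cost
loop	goal, verifier, max_iter, budget_tokens?	Closed-loop optimization	Variable, bounded by budget
trace	loop_id?	Inspecting a run	0
rollback	to_iteration	The loop has gone in the wrong direction	0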

Cost and limits

Operating cost

API quota: depends on the LLM
Tokens per call: heavy; a full loop can use 100k+ tokens
Price: free; LLM costs are paid by the user
Tip: always set max_iter and budget_tokens. An open-ended loop burns money.
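The tip above amounts to checking two limits before every iteration. A minimal sketch, assuming a hypothetical `Budget` class and a made-up cost of 4,000 tokens per iteration; the field names mirror the max_iter and budget_tokens inputs but are not taken from the skill's code.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """Tracks iteration and token spend against hard limits."""

    max_iter: int
    budget_tokens: int
    iterations: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one finished iteration and the tokens it consumed."""
        self.iterations += 1
        self.tokens += tokens_used

    def exhausted(self) -> bool:
        """True when either limit has been reached."""
        return self.iterations >= self.max_iter or self.tokens >= self.budget_tokens


budget = Budget(max_iter=30, budget_tokens=100_000)
while not budget.exhausted():
    # Hypothetical cost: each iteration uses 4,000 tokens.
    budget.charge(tokens_used=4_000)
```

With these numbers the token limit is hit first: the loop stops after 25 iterations, before the 30-iteration cap.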

Security

Permissions, secrets, blast radius

Credential storage: none

Data egress: depends on the LLM provider

Loops can be expensive. Never run one without a budget.

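Troubleshooting

Common errors and fixes

Loop stuck: the same proposal every iteration

Raise the proposer's exploration temperature, or seed it with diverse candidates.

Verifier fails inconsistently

A flaky verifier invalidates the loop. Fix the seed and repeat verification N=3 times per iteration.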
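Fixing the seed and repeating verification could look like the sketch below. It assumes the verifier accepts a random generator so its noise can be controlled, and it takes the median of the N=3 runs; the median is a choice made here, and `stable_score` and `noisy_verify` are hypothetical names.

```python
import random
import statistics
from typing import Callable


def stable_score(
    verify: Callable[[random.Random], float],
    repeats: int = 3,
    seed: int = 1234,
) -> float:
    """Run the verifier `repeats` times with seeds derived from a fixed
    base seed and return the median score."""
    scores = [verify(random.Random(seed + i)) for i in range(repeats)]
    return statistics.median(scores)


def noisy_verify(rng: random.Random) -> float:
    """Hypothetical flaky verifier: true score 0.7 plus measurement noise."""
    return 0.7 + rng.gauss(0.0, 0.05)


first = stable_score(noisy_verify)
second = stable_score(noisy_verify)
```

Because the seeds are fixed, two evaluations of the same candidate return the same number, so a keep/discard decision reflects the candidate rather than the noise.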

Budget exhausted before convergence

Inspect the trace. If gains are still monotonic, raise the budget; otherwise the verifier or the proposer is the bottleneck.
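The "are gains still coming?" check can be approximated from the per-iteration scores in the trace. A rough sketch, assuming higher scores are better; `still_improving` and the window of 5 iterations are illustrative choices, not part of the skill.

```python
from itertools import accumulate
from typing import Sequence


def still_improving(scores: Sequence[float], window: int = 5) -> bool:
    """True if the running best score rose within the last `window` iterations."""
    running_best = list(accumulate(scores, max))
    if len(running_best) <= window:
        return True  # too few iterations to call it a plateau
    return running_best[-1] > running_best[-1 - window]


raise_budget = still_improving([1, 2, 3, 4, 5, 6, 7, 8])
look_elsewhere = still_improving([1, 5, 5, 5, 5, 5, 5, 5])
```

A True result suggests the loop simply ran out of budget; a False result points at the verifier or the proposer.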

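Alternatives

Autoresearch (Karpathy-style) compared with others

Alternative	When to use	Trade-offs
wanshuiyin/Auto-claude-code-research-in-sleep (ARIS)	When you specifically want overnight, asynchronous ML research loops	ARIS focuses on ML, whereas autoresearch is general-purpose
Manual A/B with scripted iterations	When the goal is small and one-off	The skill removes the orchestration overhead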
