/packages/ai-chat-worker/CLAUDE.md
CLAUDE.md at /packages/ai-chat-worker/CLAUDE.md
Path: packages/ai-chat-worker/CLAUDE.md
ai-chat-worker
Standalone Cloudflare Worker sub-package for the AI chat API. Independent of the Astro docs site.
Tech Stack
- Cloudflare Workers — runtime
- Cloudflare KV — rate limiting storage
- Anthropic Messages API — Claude Haiku via raw
fetch(no SDK) - TypeScript — strict mode,
@cloudflare/workers-types
Commands
pnpm dev— local dev server (requires.dev.varswithANTHROPIC_API_KEY)pnpm run deploy— deploy to Cloudflare Workers via Wranglerpnpm typecheck— TypeScript type checking
Architecture
src/
├── index.ts # Worker entry — routing, validation, CORS
├── audit-log.ts # Audit logging + IP hashing via Web Crypto
├── claude.ts # Claude API call + docs context fetching/caching
├── cors.ts # CORS headers (exposes Retry-After)
├── input-screen.ts # Prompt injection input screening
├── rate-limit.ts # Per-IP rate limiting via KV
└── types.ts # Env, request/response, Claude API types
Request Flow
- CORS preflight →
cors.ts - Method + path check → 405/404
- Hash client IP (SHA-256 via Web Crypto) →
audit-log.ts - JSON parse + message validation → 400 (audit logged as
invalid_input) - Prompt injection screening → 400 (rejects obvious injection attempts before they consume rate limit quota; audit logged)
- Rate limit check → 429 (audit logged as
rate_limit) - Fetch
llms-full.txtfrom docs site (cached in-memory, best-effort) →claude.ts - Call Claude API with hardened system prompt (XML-tagged context + explicit guardrails) → response (audit logged)
Key Design Decisions
- No SDK — raw
fetchto Anthropic API keeps bundle small for Workers - Fail-open rate limiting — KV errors allow requests through (chat availability > strict enforcement)
- Best-effort caching — module-level cache for
llms-full.txtis ephemeral (Workers isolates are recycled) - Rate limit after validation — invalid requests don’t consume the caller’s quota
- Fire-and-forget audit logging — every interaction logged to KV (
audit:prefix) with 7-day TTL; IPs stored as SHA-256 hashes for privacy - Prompt hardening — system prompt uses XML tags for context separation and explicit guardrails against off-topic / injection attempts
- Input screening — regex pre-filter catches common prompt injection patterns before reaching the Claude API
Configuration
See README.md for full setup instructions (vars, secrets, KV namespace).
Conventions
- All responses include CORS headers (including error responses)
- Error responses use
{ error: string }format - Rate limit uses
cf-connecting-ipfor client IP - History capped at 50 messages to limit API cost
- KV keys use bucket pattern:
rate:min:{ip}:{bucket}/rate:day:{ip}:{bucket}with TTL = 2x window - Audit log keys:
audit:{date}:{timestamp-ms}:{random}with 7-day TTL