Add SSE MCP server, comprehensive docs, and OpenCode integration
- Implement SSE mode for MCP server (mcp/skills.py)
- Add MCP service to docker-compose.yml on port 3000
- Add uvicorn dependency to mcp/requirements.txt
- Create SETUP.md, USAGE.md, OPENCODE-MCP.md
- Update README with quick links and MCP section
- Remove semantic cache references throughout
- Add cross-platform Python MCP setup script to template repo
This commit is contained in: parent 95805dfc86, commit e346d356e5
7 changed files with 1085 additions and 83 deletions
118 OPENCODE-MCP.md (new file)

@@ -0,0 +1,118 @@
# OpenCode MCP Configuration

OpenCode (an open-source alternative to Cursor/Claude) supports MCP servers. This guide shows how to connect it to your AI Skills API MCP server running on `helm`.

## Prerequisites

- AI Skills API stack running on `helm` (includes the MCP server on port 3000)
- OpenCode installed on your local machine

## MCP Server Endpoint

Your MCP server is accessible at:

```
http://helm:3000
```

It exposes two endpoints:

- `GET /sse` - Server-Sent Events (for client connection)
- `POST /messages` - JSON-RPC messages
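As a rough sketch of what travels over `POST /messages`: clients send JSON-RPC 2.0 payloads. The exact envelope depends on the MCP protocol version your server implements, and the `tools/call` method name and argument shape here are assumptions drawn from the MCP spec, not from this repo's code:

```python
import json

def jsonrpc_request(method, params, req_id=1):
    """Build a JSON-RPC 2.0 request body like those POSTed to /messages."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params,
    })

# Hypothetical tool invocation; the tool name matches the server's tool list,
# but "tools/call" and the "arguments" key are assumptions from the MCP spec.
body = jsonrpc_request("tools/call", {
    "name": "search_skills",
    "arguments": {"query": "docker compose"},
})
print(body)
```

In normal use, OpenCode builds and sends these messages for you; this is only to illustrate what the `/messages` endpoint receives.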
## OpenCode Configuration

OpenCode reads MCP server config from its settings. You need to add an MCP server with the SSE URL.

### Configuration JSON

Add this to your OpenCode MCP configuration (location varies by install):

```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```

**Note**: Use `"url"`, not `"command"`, since the server is remote and uses SSE transport.

### Where to Put This

OpenCode typically reads MCP config from:

- `~/.config/opencode/mcp.json`
- or the app settings UI (Preferences → MCP → Add Server → Manual)

If using a file, create/edit `~/.config/opencode/mcp.json`:

```bash
mkdir -p ~/.config/opencode
cat > ~/.config/opencode/mcp.json << 'EOF'
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
EOF
```
### Test Connection

1. Restart OpenCode (if running)
2. Open the MCP servers panel/tool
3. You should see the "skills" server listed as connected
4. Available tools will include:
   - `search_skills`
   - `get_skill`
   - `list_skills`
   - `get_context`
   - `get_conventions`
   - `get_snippets`
   - `get_memory`
   - `add_memory`
   - `create_skill`
## Troubleshooting

### "Cannot connect to MCP server"

- Ensure the stack is up: `docker compose -f /path/to/ai-skills-api/docker-compose.yml ps`
- Check MCP service logs: `docker compose logs mcp`
- Verify `helm` resolves: `ping helm`, or use the IP address instead
- If using an IP, change the config to `"url": "http://192.168.x.x:3000"`

### "Connection refused" or timeout

- Ensure port 3000 is exposed: `netstat -tuln | grep 3000` on helm
- Check the firewall: helm should accept connections on 3000 from your network

### Tools not appearing

- Wait 10-20 seconds after OpenCode starts for the MCP connection to establish
- Check OpenCode logs for MCP connection errors
- Verify the skills service is healthy: `docker compose ps` (mcp should be "Up" and healthy)
## Using the Tools

Once connected, you can invoke MCP tools from OpenCode:

- `get_context(project="/home/user/myapp")` → fetches relevant skills/conventions
- `search_skills(query="docker compose")` → finds matching skills
- `create_skill(...)` → adds a new skill to the database
- `add_memory(project, key, content)` → stores learnings

These calls travel over the network to `helm:3000`, and the MCP server forwards requests to the Skills API (`helm:8675` internally).
## Security Note

The MCP server is exposed on your home network without authentication (it relies on network trust). If you need auth, we can add a reverse proxy or API key layer.

## One-Line Setup Script

If you're setting up on a new machine, run this from the `agentic-templates` repo:

```bash
./setup-opencode-mcp.sh
```

It will detect your OpenCode config location and add the MCP server automatically.
108 README.md
@@ -2,22 +2,14 @@
 Local infrastructure for AI context management. Reduce token consumption by 60-80% through smart RAG, conversation compression, and reusable skills.
 
-## Quick Start
-
-```bash
-# Copy config file (optional, uses defaults if missing)
-cp config.yaml.example config.yaml  # customize if needed
-
-# Run with Docker
-docker compose up -d
-
-# Or run locally
-pip install -r requirements.txt
-uvicorn main:app --reload
-```
-
-API available at `http://helm:8675`
-Docs at `http://helm:8675/docs`
+**API available at**: `http://helm:8675`
+**Interactive docs**: `http://helm:8675/docs`
+
+## Quick Links
+
+- **[Setup Guide](SETUP.md)** - One-time deployment on your server
+- **[Usage Guide](USAGE.md)** - How to integrate with your agents
+- **[Template Repository](https://git.bouncypixel.com/helm/agentic-templates)** - Starter kit for new projects
 
 ## Key Features
@@ -27,24 +19,22 @@ Docs at `http://helm:8675/docs`
 - **Simple API**: RESTful JSON API + MCP server for Claude Desktop
 - **Zero-friction auth**: Optional API key (set-and-forget)
 
-## Configuration
-
-Create `config.yaml` (optional) to customize:
-
-```yaml
-port: 8675
-rag:
-  max_skills: 3
-  max_conventions: 2
-  max_snippets: 2
-compression:
-  enabled: true
-  strategy: "extractive"  # or "ollama" for phi-3-mini
-auth:
-  enabled: false  # set to true and change api_key
-```
-
-Or use environment variables (see `config.py` for full list).
+## Quick Start (5 minutes)
+
+```bash
+# 1. Deploy the service on helm (see SETUP.md for details)
+docker compose up -d
+
+# 2. Clone the template repo for your agent project
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
+cp .env.example .env
+docker compose up -d
+
+# 3. Your agent is now running with context management
+```
+
+See **[SETUP.md](SETUP.md)** for complete deployment instructions and **[USAGE.md](USAGE.md)** for integration patterns.
 
 ## Endpoints
@@ -113,19 +103,21 @@ async def query_llm(prompt, conversation_history, project=None):
 **Expected savings**: 60-80% token reduction vs. sending everything.
 
+See **[USAGE.md](USAGE.md)** for complete integration patterns, examples, and best practices.
+
 ## Template Repository
 
 Want to get started quickly? Use the agent template:
 
 ```bash
-# Clone the template (on your Forgejo)
-git clone git.bouncypixel.com:helm/ai-agent-template.git
-cd ai-agent-template
+# Clone the template
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
 cp .env.example .env
 docker compose up -d
 ```
 
-The template includes a working agent integration and docker-compose setup.
+The template includes a working agent integration and docker-compose setup. See [USAGE.md](USAGE.md) for integration patterns.
 
 ## How It Works (Architecture)
@@ -166,7 +158,7 @@ If you use Claude Desktop, add to your config:
 Available tools:
 - `search_skills`, `get_skill`, `list_skills`
 - `get_context`, `get_conventions`, `get_snippets`
-- `check_cache` (deprecated), `get_memory`, `add_memory`, `create_skill`
+- `get_memory`, `add_memory`, `create_skill`
 
 ## Migration from v1
|
||||||
|
|
||||||
MIT
|
MIT
|
||||||
|
|
||||||
## Example Usage
|
For detailed usage examples and API reference, see [USAGE.md](USAGE.md) and the interactive docs at `http://helm:8675/docs` when the service is running.
|
||||||
|
|
||||||
### Create a skill
|
|
||||||
```bash
|
|
||||||
curl -X POST http://helm:8675/skills \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{
|
|
||||||
"id": "homelab-docker-compose",
|
|
||||||
"name": "Docker Compose Standard",
|
|
||||||
"category": "homelab",
|
|
||||||
"content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
|
|
||||||
"tags": ["docker", "compose", "infrastructure"]
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
|
|
||||||
### Get context bundle
|
|
||||||
```bash
|
|
||||||
curl "http://helm:8675/context?project=/home/server/apps/media-server&skills=homelab-docker-compose,react-v2"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Check cache
|
|
||||||
```bash
|
|
||||||
curl -X POST http://helm:8675/cache/lookup \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{
|
|
||||||
"prompt": "How do I configure traefik?",
|
|
||||||
"model": "claude-3-opus"
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
|
|
||||||
## Integration Pattern
|
|
||||||
|
|
||||||
In your agent's system prompt or pre-request hook:
|
|
||||||
|
|
||||||
1. Call `GET /context?project={current_project}&skills={skill_ids}`
|
|
||||||
2. Inject returned content into the prompt
|
|
||||||
3. Before sending to LLM, check `POST /cache/lookup`
|
|
||||||
4. After receiving response, optionally `POST /cache/store`
|
|
||||||
|
|
||||||
This avoids re-sending your standards every request and caches repeated queries.
|
|
||||||
|
|
||||||
## Database
|
|
||||||
|
|
||||||
SQLite database `ai.db` with tables:
|
|
||||||
- `skills` - Reusable patterns and instructions
|
|
||||||
- `snippets` - Code snippets
|
|
||||||
- `conventions` - Project-specific conventions
|
|
||||||
- `cache` - LRU cache of LLM responses
|
|
||||||
- `memory` - Project memory/notes
|
|
||||||
|
|
|
||||||
394 SETUP.md (new file)

@@ -0,0 +1,394 @@
# Setup Guide: AI Skills API

This guide covers exactly how to deploy the AI Skills API on your home server (`helm`) and set up new agent projects.

## Prerequisites

- Docker & Docker Compose installed on `helm`
- Access to `helm` from your development machine (SSH or local)
- Optional: Claude Desktop with MCP support

## Server Setup (One-Time)

Deploy the AI Skills API service on your home server.

### 1. Clone the Repository

```bash
# On helm (or a machine accessible to docker)
cd /opt
git clone ssh://git@helm:222/helm/ai-skills-api.git
cd ai-skills-api
```
### 2. Build and Start Services

```bash
# Build and start all services (API + Ollama + MCP)
docker compose up -d --build

# Check it's running
docker compose ps
# Should show: api, ollama, mcp (all "Up")
```

### 3. Verify Deployment

```bash
# Health check (from helm)
curl http://localhost:8675/health

# Expected response: {"status":"healthy"}
```
### 4. Configure Optional Settings

Edit `config.yaml` (defaults are used if the file is missing):

```yaml
port: 8675
rag:
  max_skills: 3
  max_conventions: 2
  max_snippets: 2
compression:
  enabled: true
  strategy: "extractive"  # or "ollama" for phi-3-mini
auth:
  enabled: false  # set to true to require API key
  api_key: "your-secret-key-here"
```

Restart after changes:

```bash
docker compose restart
```
### 5. (Optional) Enable API Authentication

If you want auth across your network:

1. Edit `config.yaml`:

   ```yaml
   auth:
     enabled: true
     api_key: "generate-a-strong-random-key"
   ```

2. Restart:

   ```bash
   docker compose restart
   ```

3. Test:

   ```bash
   curl http://helm:8675/health  # Should work (no auth)
   curl http://helm:8675/skills  # Should fail with 401 if auth is enabled
   curl -H "X-API-Key: your-secret-key-here" http://helm:8675/skills  # Should work
   ```

**Note**: The API is accessible only on your home network (`helm:8675`). No public exposure by default.
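With auth enabled, every client needs to attach the key. A minimal helper (a sketch; it assumes the `X-API-Key` header shown above and an `API_KEY` environment variable, which matches the agent `.env` used later in this guide):

```python
import os

def auth_headers(api_key=None):
    """Return request headers, including X-API-Key only when a key is available."""
    key = api_key or os.environ.get("API_KEY")
    return {"X-API-Key": key} if key else {}

# With auth disabled (no key configured) this stays empty,
# so the same client code works in both modes.
print(auth_headers("your-secret-key-here"))
```

Pass the result as `headers=` to whatever HTTP client your agent uses (e.g. `httpx.get(url, headers=auth_headers())`).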
### Access from Other Machines

Your agents running on other machines can access the API at `http://helm:8675`.

```bash
# From any machine on your network
curl http://helm:8675/health
```

If DNS isn't set up, `helm` should still resolve via your local network or a hosts-file entry; otherwise use the server's IP address directly.
## MCP Server for Claude Desktop / OpenCode
|
||||||
|
|
||||||
|
The stack includes an MCP server that exposes your skills to Claude Desktop or OpenCode via the Model Context Protocol.
|
||||||
|
|
||||||
|
### What's Running
|
||||||
|
|
||||||
|
- **MCP Server**: SSE mode on `http://helm:3000`
|
||||||
|
- Automatically proxies requests to the Skills API (`http://api:8080` internally)
|
||||||
|
- Same Docker network, no extra configuration needed
|
||||||
|
|
||||||
|
### Configure Claude Desktop
|
||||||
|
|
||||||
|
Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"mcpServers": {
|
||||||
|
"skills": {
|
||||||
|
"url": "http://helm:3000"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Restart Claude. You should see `skills` server connected with tools like `search_skills`, `get_context`, etc.
|
||||||
|
|
||||||
|
### Configure OpenCode
|
||||||
|
|
||||||
|
See [OPENCODE-MCP.md](OPENCODE-MCP.md) for detailed instructions. In short:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run the setup script from the agentic-templates repo:
|
||||||
|
cd ~/projects/agentic-templates
|
||||||
|
./setup-opencode-mcp.sh
|
||||||
|
|
||||||
|
# Or manually create ~/.config/opencode/mcp.json:
|
||||||
|
{
|
||||||
|
"mcpServers": {
|
||||||
|
"skills": {
|
||||||
|
"url": "http://helm:3000"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Test MCP Connection
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Should hang (SSE stream) if connected
|
||||||
|
curl http://helm:3000/sse
|
||||||
|
|
||||||
|
# With API key if auth enabled:
|
||||||
|
curl -H "X-API-Key: your-key" http://helm:3000/sse
|
||||||
|
```
|
||||||
|
|
||||||
|
## Project Setup (Per Project/Session)

For each new project or AI agent, you'll create an integration that uses the API.

### Option A: Use the Template Repository (Recommended)

We maintain a template repo for quick starts.

#### 1. Clone the Template

```bash
cd ~/projects  # or wherever you keep projects
git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
cd my-agent
```

Or clone directly via SSH:

```bash
git clone ssh://git@helm:222/helm/agentic-templates.git my-agent
```

#### 2. Configure Environment

Copy `.env.example` to `.env`:

```bash
cp .env.example .env
```

Edit `.env` if needed:

```env
API_URL=http://helm:8675
API_KEY=                       # Only if auth enabled
PROJECT=/path/to/your/project  # Optional, for context scoping
```
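In Python, reading these variables with sensible fallbacks might look like the following (a sketch; the variable names mirror the `.env` above, and the default values are assumptions):

```python
import os

API_URL = os.environ.get("API_URL", "http://helm:8675")
API_KEY = os.environ.get("API_KEY", "")           # empty when auth is disabled
PROJECT = os.environ.get("PROJECT", os.getcwd())  # fall back to the current directory

print(API_URL, PROJECT)
```

Docker Compose injects the `.env` values automatically; when running the agent directly, export them or use a loader such as python-dotenv.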
#### 3. Run Your Agent

```bash
# Using Docker Compose (recommended)
docker compose up -d

# Or run directly
pip install -r requirements.txt
python agent.py
```

The agent will automatically:

- Fetch relevant skills/conventions via RAG
- Store decisions in memory
- Compress the conversation when it grows large

### Option B: Manual Integration

If you want to integrate into an existing project:

1. Install the Python dependency:

   ```bash
   pip install httpx
   ```

2. Copy the integration pattern from `template/agent.py` (the `get_context`, `compress_messages`, `store_memory` functions).

3. Add these calls to your agent's workflow:

   - Before each LLM call: `context = await get_context(query, project)`
   - Inject the context into the system prompt
   - After each response: `await store_memory(project, key, content)`
   - When the conversation exceeds 10 messages: `compressed = await compress_messages(conversation)`

See `USAGE.md` for detailed integration patterns.
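The "inject the context into the system prompt" step can be as simple as flattening the RAG response into a markdown section. A sketch (the field names follow the `/context/rag` response documented in USAGE.md; the helper itself is hypothetical, not the template's actual code):

```python
def format_context(context):
    """Flatten a /context/rag response into a system-prompt section."""
    sections = [("skills", "Relevant Skills"),
                ("conventions", "Project Conventions"),
                ("snippets", "Code Snippets")]
    lines = []
    for key, title in sections:
        items = context.get(key, [])
        if not items:
            continue
        lines.append(f"## {title}")
        for item in items:
            lines.append(f"### {item['name']}")
            lines.append(item["content"])
    return "\n".join(lines)

ctx = {"skills": [{"name": "Docker Compose Standard",
                   "content": "Always use docker-compose v3.8+."}]}
print(format_context(ctx))
```

Prepend the result to your system prompt before each LLM call; empty sections are skipped so the prompt stays small when nothing relevant matches.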
## Seeding Skills and Conventions

The API comes with a seed script that adds useful skills.

### Run the Seed Script

```bash
cd /opt/ai-skills-api
python examples/seed-data.py
```

This adds:

- D&D campaign management skills
- Infrastructure/Docker skills
- Code review skills
- General best practices

### Add Custom Skills

#### Via API

```bash
curl -X POST http://helm:8675/skills \
  -H "Content-Type: application/json" \
  -d '{
    "id": "my-skill",
    "name": "My Custom Skill",
    "category": "custom",
    "content": "Specific instructions for your agent...",
    "tags": ["keyword1", "keyword2"]
  }'
```
#### Via MCP (Claude Desktop)

Use the `create_skill` tool directly in Claude.

#### Via Python

```python
import httpx

resp = httpx.post(
    "http://helm:8675/skills",
    json={
        "id": "unique-skill-id",
        "name": "Skill Name",
        "category": "category",
        "content": "Full skill instructions...",
        "tags": ["tag1", "tag2"]
    }
)
```

### Add Project Conventions

Conventions are project-specific (tied to a project path or identifier):

```bash
curl -X POST http://helm:8675/conventions \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Project Conventions",
    "project": "/home/user/myproject",
    "content": "Project-specific coding standards, workflows, etc."
  }'
```
## Testing Your Setup

### 1. Test RAG Context

```bash
curl "http://helm:8675/context/rag?query=docker+compose&project=test"
```

This should return JSON with `skills`, `conventions`, and `snippets` arrays.

### 2. Test Compression

```bash
curl -X POST http://helm:8675/compress \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hi there! How can I help?"},
      {"role": "user", "content": "Tell me about Docker."},
      {"role": "assistant", "content": "Docker is a containerization platform..."}
    ]
  }'
```

This should return compressed messages and `tokens_saved`.
### 3. Test Memory

```bash
curl -X POST http://helm:8675/memory \
  -H "Content-Type: application/json" \
  -d '{
    "project": "test",
    "key": "decision-123",
    "content": "We decided to use FastAPI for this project"
  }'

curl "http://helm:8675/memory?project=test"
```

### 4. Test from Agent Template

```bash
cd ~/projects/my-agent
docker compose up -d
docker compose logs -f agent  # Watch the agent start and interact
```
## Troubleshooting

### Service Won't Start

```bash
# Check logs
docker compose logs ai-skills-api

# Common issues:
# - Port 8675 already in use: change the port in docker-compose.yml
# - Permissions: ensure /opt/ai-skills-api is readable
```

### Ollama Not Pulling Model

The entrypoint script auto-pulls `phi3:mini` if the compression strategy is `ollama`. To force a pull:

```bash
docker compose exec ai-skills-api ollama pull phi3:mini
```

### Can't Connect from Other Machines

- Ensure `helm` is reachable on the network (`ping helm`)
- Check the Docker network: `docker network ls` (should include `ai-skills-api_default`)
- The API is bound to `0.0.0.0:8675` inside the container, so it is accessible from the host and other containers

### Auth Errors

- If you get a 401, either disable auth in `config.yaml` or set `API_KEY` in your agent's `.env`
- Verify: `curl -H "X-API-Key: your-key" http://helm:8675/skills`

### High RAG Latency (>10ms)

- The first request after startup will be slower (warming the cache)
- Subsequent queries should take <5ms
- If still slow, check the embedding model load: `docker compose logs ai-skills-api`

## Next Steps

- Read `USAGE.md` for detailed integration patterns and best practices
- Use the template repo for all new agent projects
- Add project-specific skills and conventions as you work
- Monitor logs for token savings
522 USAGE.md (new file)

@@ -0,0 +1,522 @@
# Usage Guide: AI Skills API

This guide explains how to use the AI Skills API effectively in your projects and AI agent sessions.

## Table of Contents

1. [Understanding the Integration Pattern](#understanding-the-integration-pattern)
2. [RAG Context Retrieval](#rag-context-retrieval)
3. [Conversation Compression](#conversation-compression)
4. [Project Memory](#project-memory)
5. [Session Workflow](#session-workflow)
6. [Managing Skills](#managing-skills)
7. [Token Accounting](#token-accounting)
8. [Best Practices](#best-practices)
9. [Example Implementations](#example-implementations)

---
## Understanding the Integration Pattern

The API provides three core capabilities that work together:

1. **RAG (Retrieval-Augmented Generation)**: Before each LLM call, fetch relevant skills, conventions, and snippets based on your query. This injects relevant context without sending your entire knowledge base every time.

2. **Compression**: When conversation history grows long (>10 turns), compress old messages into summaries to stay within context windows.

3. **Memory**: Store decisions, configurations, and learnings per project for future reference.

**Expected savings**: 60-80% token reduction vs. sending everything.

---
## RAG Context Retrieval

### The `/context/rag` Endpoint

This is your primary integration point. It returns only the most relevant items from your knowledge base.

**Request:**

```
GET /context/rag?query={query}&project={project}
```

**Response:**

```json
{
  "skills": [
    {
      "id": "homelab-docker-compose",
      "name": "Docker Compose Standard",
      "category": "homelab",
      "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
      "relevance_score": 0.89
    }
  ],
  "conventions": [
    {
      "id": "conv-123",
      "name": "React Project Standards",
      "project": "/home/user/my-react-app",
      "content": "Use TypeScript, React 18+, and functional components with hooks.",
      "relevance_score": 0.76
    }
  ],
  "snippets": [
    {
      "id": "snippet-456",
      "name": "FastAPI CORS setup",
      "language": "python",
      "content": "app.add_middleware(CORSMiddleware, allow_origins=[\"*\"], ...)",
      "relevance_score": 0.82
    }
  ]
}
```
### How It Works

- Skills are globally available (your general knowledge base)
- Conventions are scoped to a project path or identifier (e.g., `/home/user/project1`)
- Snippets are globally available code examples
- Relevance scores are cosine similarities (0-1); items below 0.3 are typically filtered out
- Limits are configurable (default: 3 skills, 2 conventions, 2 snippets)
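Conceptually, the relevance filter is cosine similarity over embeddings, a cutoff, and a per-type limit. A sketch of that idea (illustrative only, not the service's actual code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def filter_relevant(query_vec, items, threshold=0.3, limit=3):
    """Score (name, vector) pairs and keep the top `limit` at or above `threshold`."""
    scored = [(cosine(query_vec, vec), name) for name, vec in items]
    scored = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return scored[:limit]

# Toy 2-dimensional "embeddings" just to show the filtering behavior
items = [("docker skill", [1.0, 0.0]), ("unrelated skill", [0.0, 1.0])]
print(filter_relevant([1.0, 0.1], items))
```

The unrelated item scores ~0.1 and falls below the 0.3 cutoff, which is why low-relevance entries never reach your prompt.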
### Usage Pattern

```python
async def query_with_context(query: str, project: str = None):
    # 1. Fetch context
    context = await get_context(query, project)

    # 2. Build system prompt
    system_prompt = format_context(context)
    # system_prompt now contains:
    # ## Relevant Skills
    # ### Docker Compose Standard (relevance: 0.89)
    # Always use docker-compose v3.8+...
    # ...

    # 3. Inject into LLM call
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query}
    ]
    response = await llm.chat(messages)

    return response
```

---
## Conversation Compression

### The `/compress` Endpoint

Compresses a list of conversation messages into a shorter representation.

**Request:**

```json
{
  "messages": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."},
    ... (up to 20+ messages)
  ]
}
```

**Response:**

```json
{
  "messages": [
    {"role": "system", "content": "Summary of earlier conversation..."},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."}
  ],
  "tokens_saved": 245
}
```
### Compression Strategies

- **Extractive** (default): Uses LSA summarization to select key sentences. Fast (~100-500ms), no model required.
- **Ollama**: Uses `phi3:mini` for abstractive summaries. Better quality but slower (~2s). Requires Ollama running.

**Configure in `config.yaml`:**

```yaml
compression:
  enabled: true
  strategy: "extractive"  # or "ollama"
```
### Usage Pattern

```python
conversation = []

async def chat(query):
    # Add user message
    conversation.append({"role": "user", "content": query})

    # Call LLM (with context from RAG)
    response = await llm.chat(conversation)
    conversation.append({"role": "assistant", "content": response})

    # Compress when the conversation gets long
    if len(conversation) >= 10:
        compressed = await compress_messages(conversation)
        # Replace contents in place so the module-level list is updated
        conversation[:] = compressed["messages"]
        print(f"Saved {compressed['tokens_saved']} tokens")

    return response
```

**Important**: Keep the most recent ~4-6 turns uncompressed. The compression endpoint preserves recent messages and compresses only the older ones.
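If you prefer to control the split client-side, the same idea is plain list slicing: hand only the older head to `/compress` and keep a recent tail verbatim. A sketch (illustrative; the server already applies its own split, so this is optional):

```python
def split_for_compression(messages, keep_recent=6):
    """Return (older messages to compress, recent messages kept verbatim)."""
    if len(messages) <= keep_recent:
        return [], messages
    return messages[:-keep_recent], messages[-keep_recent:]

msgs = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
head, tail = split_for_compression(msgs)
print(len(head), len(tail))  # 4 6
```

Send `head` to `/compress`, then rebuild the conversation as the returned summary followed by `tail`.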
|
---
|
||||||
|
|
||||||
|
## Project Memory
|
||||||
|
|
||||||
|
### The `/memory` Endpoints
|
||||||
|
|
||||||
|
Store and retrieve project-specific knowledge.
|
||||||
|
|
||||||
|
**Store:**
|
||||||
|
|
||||||
|
```
|
||||||
|
POST /memory
|
||||||
|
{
|
||||||
|
"project": "my-project",
|
||||||
|
"key": "architecture-decision-2024-01-15",
|
||||||
|
"content": "We chose FastAPI over Flask for async support and automatic OpenAPI docs."
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Retrieve:**
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /memory?project=my-project
|
||||||
|
```
|
||||||
|
|
||||||
|
**Update:**
|
||||||
|
|
||||||
|
```
|
||||||
|
PUT /memory/{id}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Delete:**
|
||||||
|
|
||||||
|
```
|
||||||
|
DELETE /memory/{id}
|
||||||
|
```

### Usage Pattern

```python
# Store a decision after making it
await store_memory(
    project="/home/user/myapp",
    key="db-choice",
    content="Using PostgreSQL over MongoDB for relational data integrity"
)

# Retrieve past decisions at project start
resp = httpx.get("http://helm:8675/memory", params={"project": "/home/user/myapp"})
decisions = resp.json()["entries"]
# decisions = [{"id": "...", "key": "db-choice", "content": "...", ...}]
```

**When to use memory:**
- Architecture decisions
- Configuration choices (API keys, service URLs)
- Learned preferences ("User likes code examples")
- Debugging notes ("Issue with CORS on port 8080")

**When NOT to use memory:**
- Temporary conversation state (use compression instead)
- Large codebases (store in skills/snippets instead)
- Public documentation (should be in skills)

---

## Session Workflow

### Starting a New Session

1. **Define your project identifier** - a path or unique string:
   ```python
   PROJECT = "/home/user/myapp"  # or "my-discord-bot", "workspace-123"
   ```

2. **Load past memories** (optional but helpful):
   ```python
   memories = httpx.get("http://helm:8675/memory", params={"project": PROJECT}).json()["entries"]
   # Inject into system prompt or create context from them
   ```

3. **Begin conversation loop** - for each user query:
   - Call `GET /context/rag?query=...&project=PROJECT`
   - Inject context into the LLM prompt
   - Call the LLM
   - Store important outputs in memory if they represent decisions/learnings
   - Compress the conversation when it reaches ~10 turns

### Ending a Session

- Optionally store a session summary in memory:
  ```python
  await store_memory(PROJECT, "session-summary-2024-01-15", "Completed user auth flow, decided on JWT tokens")
  ```

- No cleanup needed - conversation state lives in your agent, not the server.

### Multi-Project Agents

If your agent works across multiple projects:

```python
# Switch project context mid-conversation
PROJECT = "/home/user/project1"  # current active project

# Each project has its own conventions and memories
context = await get_context(query, project=PROJECT)
```

---

## Managing Skills

Skills are your reusable knowledge base. Manage them via API, MCP, or the seed script.

### Categories

Group skills by category (e.g., `homelab`, `dnd`, `python`, `devops`). Categories don't affect RAG retrieval but help with organization.

### Tags

Tags are keywords used for **future search** (not currently used by RAG, but planned for enhanced filtering).

```json
{
  "tags": ["docker", "compose", "infrastructure", "production"]
}
```

### Best Practices for Skills

- **Be specific**: "Docker Compose Production Patterns" > "Docker"
- **Include examples**: Show code snippets in the content
- **Keep it concise**: 1-3 paragraphs, focused on actionable guidance
- **Use markdown**: The API preserves formatting for injection into prompts
- **Version when updating**: If a skill changes significantly, create a new `id` (e.g., `docker-compose-v2`)

### Search Skills

```
GET /skills/search?q={query}
```

Returns matching skills by name/content similarity. Useful for manual exploration but not needed in automated agents (use `/context/rag` instead).

---

## Token Accounting

### Count Tokens

```
GET /tokens/count?text={text}
```

Returns the token count (using tiktoken for GPT models, approximations for others).

**Use this to:**
- Track compression savings
- Pre-flight check prompts before sending to the LLM
- Budget token usage per session
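
For a quick pre-flight check without a network round-trip, a rough local heuristic can stand in for the endpoint (~4 characters per token for English; illustrative only, not tiktoken-accurate):

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_budget(prompt: str, budget: int = 8000) -> bool:
    """Pre-flight check before sending a prompt to the LLM."""
    return approx_tokens(prompt) <= budget
```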

### Example: Measure RAG Savings

```python
full_context = load_all_skills()  # hypothetical: all your skills text
full_tokens = count_tokens(full_context)

rag_context = get_context(query, project)  # only relevant items
rag_tokens = count_tokens(format_context(rag_context))

savings_pct = (1 - rag_tokens / full_tokens) * 100
print(f"RAG saved {savings_pct:.1f}% tokens")
```

---

## Best Practices

### 1. Always Use Project Scoping

Set the `project` parameter consistently. Even if you have one main project, use a consistent identifier:

```python
PROJECT = "/home/user/myapp"  # NOT "default" or None
context = await get_context(query, project=PROJECT)
```

This allows:
- Project-specific conventions
- Memory isolation between projects
- Future per-project analytics

### 2. Call RAG Before Every LLM Request

Even if the query seems unrelated, the cost is negligible (<5ms, ~50 tokens). The knowledge injected often improves responses.

### 3. Compress Proactively

Don't wait until the context window is full. Compress at ~10 messages:

```python
if len(conversation) >= 10:
    compressed = await compress_messages(conversation)
    conversation = compressed["messages"]
```

This keeps compression quality high (summaries are more accurate with fewer messages).

### 4. Store Learnings, Not Everything

Memory is for **decisions** and **facts you want to recall**.

Don't store:
- Every user query/response (that's what compression is for)
- Public documentation (put it in skills instead)
- Transient state (keep it in agent memory)

### 5. Version Your Skills

When a skill's guidance changes:

- **Minor update** (typo, clarification): update the existing skill's `content` in place
- **Major update** (different approach, breaking change): create a new `id` (e.g., `docker-compose-v2`) and optionally mark the old one as deprecated in its content

### 6. Use MCP in Claude Desktop

If you use Claude Desktop, add the MCP server (see `CLAUDE.md`). This gives you:
- Direct access to skills via Claude's tool calling
- No need to implement API calls manually
- The same token savings within Claude

### 7. Monitor Token Savings

Track metrics:

```python
from datetime import datetime

logs = []

def log_savings(tokens_before, tokens_after, operation):
    logs.append({
        "timestamp": datetime.now().isoformat(),
        "operation": operation,
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "savings": tokens_before - tokens_after
    })

# Periodically upload or analyze these
```

---

## Example Implementations

### Minimal Agent

```python
import asyncio
import os

import httpx

API_URL = os.getenv("API_URL", "http://helm:8675")
PROJECT = os.getenv("PROJECT", "/default")

async def get_context(query):
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": query, "project": PROJECT})
        return resp.json()

async def chat():
    conv = []
    while True:
        query = input("You: ")
        if query == "quit":
            break

        # Get context
        ctx = await get_context(query)
        system = format_context(ctx)

        # Call LLM (pseudo)
        response = call_llm(system, conv[-4:], query)

        conv.extend([{"role": "user", "content": query},
                     {"role": "assistant", "content": response}])

        print(f"Assistant: {response}")

asyncio.run(chat())
```
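
The `format_context` helper above is left undefined; a minimal sketch, assuming the RAG response carries `skills` and `memories` lists with `name`/`key` and `content` fields (these field names are assumptions; check the real schema at `/docs`):

```python
def format_context(ctx: dict) -> str:
    """Flatten a RAG response into a system-prompt preamble."""
    parts = []
    for skill in ctx.get("skills", []):
        parts.append(f"## {skill.get('name', 'skill')}\n{skill.get('content', '')}")
    for mem in ctx.get("memories", []):
        parts.append(f"Memory [{mem.get('key', '')}]: {mem.get('content', '')}")
    return "\n\n".join(parts)
```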

### Discord Bot with Context

```python
import os

import discord
from discord.ext import commands
import httpx

# The message_content intent is required to read message text in discord.py 2.x
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)

API_URL = "http://helm:8675"
PROJECT = "/home/user/discord-bot"

@bot.event
async def on_message(message):
    if message.author == bot.user:
        return

    # RAG context
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": message.content, "project": PROJECT})
        ctx = resp.json()

    # Build prompt
    system_prompt = format_context(ctx) + "\n\nYou are a helpful Discord bot."

    # Respond (using your LLM of choice)
    response = await generate_response(message.content, system_prompt)
    await message.reply(response)

    # Store in memory if it's a decision
    if "decision" in message.content.lower():
        async with httpx.AsyncClient() as client:
            await client.post(f"{API_URL}/memory", json={
                "project": PROJECT,
                "key": f"decision-{discord.utils.utcnow().timestamp()}",
                "content": response[:500]
            })

bot.run(os.getenv("DISCORD_TOKEN"))
```

---

## Need More Help?

- **Setup issues**: See `SETUP.md`
- **Template repo**: Clone `git.bouncypixel.com:helm/agentic-templates.git`
- **API reference**: Visit `http://helm:8675/docs` when the service is running
- **MCP tools**: See `CLAUDE.md` for Claude Desktop integration

**`docker-compose.yml`:**

```diff
@@ -28,3 +28,18 @@ services:
       interval: 30s
       timeout: 10s
       retries: 3
+
+  mcp:
+    build:
+      context: .
+      dockerfile: mcp/Dockerfile
+    command: python skills.py
+    ports:
+      - "3000:3000"
+    environment:
+      - SKILLS_API_URL=http://api:8080
+      - MCP_TRANSPORT=sse
+      - MCP_PORT=3000
+    depends_on:
+      - api
+    restart: unless-stopped
```

**`mcp/requirements.txt`:**

```diff
@@ -3,3 +3,4 @@ httpx==0.26.0
 python-dotenv==1.0.0
 docker==7.0.0
 psutil==5.9.7
+uvicorn[standard]==0.27.0
```

**`mcp/skills.py`:**

```diff
@@ -1,6 +1,7 @@
 from mcp.server.fastmcp import FastMCP
 import httpx
 import os
+import uvicorn

 mcp = FastMCP("skills")
```

```diff
@@ -162,4 +163,11 @@ def create_skill(


 if __name__ == "__main__":
-    mcp.run()
+    transport = os.getenv("MCP_TRANSPORT", "stdio")
+
+    if transport == "sse":
+        host = os.getenv("MCP_HOST", "0.0.0.0")
+        port = int(os.getenv("MCP_PORT", "3000"))
+        mcp.run_sse(host=host, port=port)
+    else:
+        mcp.run()
```