Add SSE MCP server, comprehensive docs, and OpenCode integration
- Implement SSE mode for MCP server (mcp/skills.py)
- Add MCP service to docker-compose.yml on port 3000
- Add uvicorn dependency to mcp/requirements.txt
- Create SETUP.md, USAGE.md, OPENCODE-MCP.md
- Update README with quick links and MCP section
- Remove semantic cache references throughout
- Add cross-platform Python MCP setup script to template repo
parent 95805dfc86
commit e346d356e5
7 changed files with 1085 additions and 83 deletions

OPENCODE-MCP.md (new file, 118 lines)
# OpenCode MCP Configuration

OpenCode (an open-source alternative to Cursor/Claude) supports MCP servers. This guide shows how to connect it to your AI Skills API MCP server running on `helm`.

## Prerequisites

- AI Skills API stack running on `helm` (includes MCP server on port 3000)
- OpenCode installed on your local machine

## MCP Server Endpoint

Your MCP server is accessible at:

```
http://helm:3000
```

It exposes two endpoints:

- `GET /sse` - Server-Sent Events (for the client connection)
- `POST /messages` - JSON-RPC messages
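
You can sanity-check the SSE endpoint from your machine before configuring anything; a connected stream stays open rather than returning immediately:

```bash
# -N disables curl's buffering so events print as they arrive; Ctrl-C to stop
curl -N http://helm:3000/sse
```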

## OpenCode Configuration

OpenCode reads MCP server config from its settings. You need to add an MCP server with the SSE URL.

### Configuration JSON

Add this to your OpenCode MCP configuration (location varies by install):

```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```

**Note**: Use `"url"`, not `"command"`, since the server is remote and uses SSE transport.

### Where to Put This

OpenCode typically reads MCP config from:

- `~/.config/opencode/mcp.json`
- or the app settings UI (Preferences → MCP → Add Server → Manual)

If using a file, create/edit `~/.config/opencode/mcp.json`:

```bash
mkdir -p ~/.config/opencode
cat > ~/.config/opencode/mcp.json << 'EOF'
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
EOF
```

### Test Connection

1. Restart OpenCode (if running)
2. Open the MCP servers panel/tool
3. You should see the "skills" server listed as connected
4. Available tools will include:
   - `search_skills`
   - `get_skill`
   - `list_skills`
   - `get_context`
   - `get_conventions`
   - `get_snippets`
   - `get_memory`
   - `add_memory`
   - `create_skill`

## Troubleshooting

### "Cannot connect to MCP server"

- Ensure the stack is up: `docker compose -f /path/to/ai-skills-api/docker-compose.yml ps`
- Check MCP service logs: `docker compose logs mcp`
- Verify `helm` resolves: `ping helm`, or use an IP address instead
- If using an IP, change the config to `"url": "http://192.168.x.x:3000"`

### "Connection refused" or timeout

- Ensure port 3000 is exposed: `netstat -tuln | grep 3000` on helm
- Check the firewall: helm should accept connections on 3000 from your network

### Tools not appearing

- Wait 10-20 seconds after OpenCode starts for the MCP connection to establish
- Check OpenCode logs for MCP connection errors
- Verify the MCP service is healthy: `docker compose ps` (mcp should be "Up" and healthy)

## Using the Tools

Once connected, you can invoke MCP tools from OpenCode:

- `get_context(project="/home/user/myapp")` → fetches relevant skills/conventions
- `search_skills(query="docker compose")` → finds matching skills
- `create_skill(...)` → adds a new skill to the database
- `add_memory(project, key, content)` → stores learnings

These calls go over the network to `helm:3000`, and the MCP server forwards requests to the Skills API (`http://api:8080` on the internal Docker network).

## Security Note

The MCP server is exposed on your home network without authentication (it relies on network trust). If you need auth, we can add a reverse proxy or API key layer.

## One-Line Setup Script

If you're setting up on a new machine, run this from the `agentic-templates` repo:

```bash
./setup-opencode-mcp.sh
```

It will detect your OpenCode config location and add the MCP server automatically.

README.md (108 lines changed)
@@ -2,22 +2,14 @@
 Local infrastructure for AI context management. Reduce token consumption by 60-80% through smart RAG, conversation compression, and reusable skills.

-## Quick Start
+**API available at**: `http://helm:8675`
+**Interactive docs**: `http://helm:8675/docs`

-```bash
-# Copy config file (optional, uses defaults if missing)
-cp config.yaml.example config.yaml  # customize if needed
-
-# Run with Docker
-docker compose up -d
-
-# Or run locally
-pip install -r requirements.txt
-uvicorn main:app --reload
-```
-
-API available at `http://helm:8675`
-Docs at `http://helm:8675/docs`
+## Quick Links
+
+- **[Setup Guide](SETUP.md)** - One-time deployment on your server
+- **[Usage Guide](USAGE.md)** - How to integrate with your agents
+- **[Template Repository](https://git.bouncypixel.com/helm/agentic-templates)** - Starter kit for new projects

 ## Key Features
@@ -27,24 +19,22 @@ Docs at `http://helm:8675/docs`
 - **Simple API**: RESTful JSON API + MCP server for Claude Desktop
 - **Zero-friction auth**: Optional API key (set-and-forget)

-## Configuration
+## Quick Start (5 minutes)

-Create `config.yaml` (optional) to customize:
-
-```yaml
-port: 8675
-rag:
-  max_skills: 3
-  max_conventions: 2
-  max_snippets: 2
-compression:
-  enabled: true
-  strategy: "extractive"  # or "ollama" for phi-3-mini
-auth:
-  enabled: false  # set to true and change api_key
-```
-
-Or use environment variables (see `config.py` for full list).
+```bash
+# 1. Deploy the service on helm (see SETUP.md for details)
+docker compose up -d
+
+# 2. Clone the template repo for your agent project
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
+cp .env.example .env
+docker compose up -d
+
+# 3. Your agent is now running with context management
+```
+
+See **[SETUP.md](SETUP.md)** for complete deployment instructions and **[USAGE.md](USAGE.md)** for integration patterns.

 ## Endpoints
@@ -113,19 +103,21 @@ async def query_llm(prompt, conversation_history, project=None):
 **Expected savings**: 60-80% token reduction vs. sending everything.

+See **[USAGE.md](USAGE.md)** for complete integration patterns, examples, and best practices.
+
 ## Template Repository

 Want to get started quickly? Use the agent template:

 ```bash
-# Clone the template (on your Forgejo)
-git clone git.bouncypixel.com:helm/ai-agent-template.git
-cd ai-agent-template
+# Clone the template
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
 cp .env.example .env
 docker compose up -d
 ```

-The template includes a working agent integration and docker-compose setup.
+The template includes a working agent integration and docker-compose setup. See [USAGE.md](USAGE.md) for integration patterns.

 ## How It Works (Architecture)
@@ -166,7 +158,7 @@ If you use Claude Desktop, add to your config:
 Available tools:
 - `search_skills`, `get_skill`, `list_skills`
 - `get_context`, `get_conventions`, `get_snippets`
-- `check_cache` (deprecated), `get_memory`, `add_memory`, `create_skill`
+- `get_memory`, `add_memory`, `create_skill`

 ## Migration from v1
@@ -186,52 +178,4 @@ If you were using the old semantic cache:
 MIT

-## Example Usage
-
-### Create a skill
-```bash
-curl -X POST http://helm:8675/skills \
-  -H "Content-Type: application/json" \
-  -d '{
-    "id": "homelab-docker-compose",
-    "name": "Docker Compose Standard",
-    "category": "homelab",
-    "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
-    "tags": ["docker", "compose", "infrastructure"]
-  }'
-```
-
-### Get context bundle
-```bash
-curl "http://helm:8675/context?project=/home/server/apps/media-server&skills=homelab-docker-compose,react-v2"
-```
-
-### Check cache
-```bash
-curl -X POST http://helm:8675/cache/lookup \
-  -H "Content-Type: application/json" \
-  -d '{
-    "prompt": "How do I configure traefik?",
-    "model": "claude-3-opus"
-  }'
-```
-
-## Integration Pattern
-
-In your agent's system prompt or pre-request hook:
-
-1. Call `GET /context?project={current_project}&skills={skill_ids}`
-2. Inject returned content into the prompt
-3. Before sending to LLM, check `POST /cache/lookup`
-4. After receiving response, optionally `POST /cache/store`
-
-This avoids re-sending your standards every request and caches repeated queries.
-
-## Database
-
-SQLite database `ai.db` with tables:
-- `skills` - Reusable patterns and instructions
-- `snippets` - Code snippets
-- `conventions` - Project-specific conventions
-- `cache` - LRU cache of LLM responses
-- `memory` - Project memory/notes
+For detailed usage examples and API reference, see [USAGE.md](USAGE.md) and the interactive docs at `http://helm:8675/docs` when the service is running.

SETUP.md (new file, 394 lines)
# Setup Guide: AI Skills API

This guide covers exactly how to deploy the AI Skills API on your home server (`helm`) and set up new agent projects.

## Prerequisites

- Docker & Docker Compose installed on `helm`
- Access to `helm` from your development machine (SSH or local)
- Optional: Claude Desktop with MCP support

## Server Setup (One-Time)

Deploy the AI Skills API service on your home server.

### 1. Clone the Repository

```bash
# On helm (or accessible to docker)
cd /opt
git clone ssh://git@helm:222/helm/ai-skills-api.git
cd ai-skills-api
```

### 2. Build and Start Services

```bash
# Build and start all services (API + Ollama + MCP)
docker compose up -d --build

# Check it's running
docker compose ps
# Should show: api, ollama, mcp (all "Up")
```

### 3. Verify Deployment

```bash
# Health check (from helm)
curl http://localhost:8675/health

# Expected response: {"status":"healthy"}
```

### 4. Configure Optional Settings

Edit `config.yaml` (the service falls back to defaults if it's missing):

```yaml
port: 8675
rag:
  max_skills: 3
  max_conventions: 2
  max_snippets: 2
compression:
  enabled: true
  strategy: "extractive"  # or "ollama" for phi-3-mini
auth:
  enabled: false  # set to true to require API key
  api_key: "your-secret-key-here"
```

Restart after changes:

```bash
docker compose restart
```

### 5. (Optional) Enable API Authentication

If you want auth across your network:

1. Edit `config.yaml`:
   ```yaml
   auth:
     enabled: true
     api_key: "generate-a-strong-random-key"
   ```

2. Restart:
   ```bash
   docker compose restart
   ```

3. Test:
   ```bash
   curl http://helm:8675/health  # Should work (no auth)
   curl http://helm:8675/skills  # Should fail with 401 if auth is enabled
   curl -H "X-API-Key: your-secret-key-here" http://helm:8675/skills  # Should work
   ```

**Note**: The API is accessible only on your home network (`helm:8675`). No public exposure by default.

### Access from Other Machines

Your agents running on other machines can access the API at `http://helm:8675`.

```bash
# From any machine on your network
curl http://helm:8675/health
```

If DNS isn't set up, `helm` should resolve via your local network or a hosts-file entry.
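
For example, a one-line hosts entry (the IP below is a placeholder; substitute your server's actual LAN address):

```bash
# Map the hostname helm to its LAN IP on this machine
echo "192.168.1.50  helm" | sudo tee -a /etc/hosts
```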

## MCP Server for Claude Desktop / OpenCode

The stack includes an MCP server that exposes your skills to Claude Desktop or OpenCode via the Model Context Protocol.

### What's Running

- **MCP Server**: SSE mode on `http://helm:3000`
- Automatically proxies requests to the Skills API (`http://api:8080` internally)
- Same Docker network, no extra configuration needed

### Configure Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```

Restart Claude. You should see the `skills` server connected, with tools like `search_skills`, `get_context`, etc.

### Configure OpenCode

See [OPENCODE-MCP.md](OPENCODE-MCP.md) for detailed instructions. In short:

```bash
# Run the setup script from the agentic-templates repo:
cd ~/projects/agentic-templates
./setup-opencode-mcp.sh
```

Or manually create `~/.config/opencode/mcp.json`:

```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```

### Test MCP Connection

```bash
# Should hang (SSE stream) if connected
curl http://helm:3000/sse

# With an API key, if auth is enabled:
curl -H "X-API-Key: your-key" http://helm:3000/sse
```

## Project Setup (Per Project/Session)

For each new project or AI agent, you'll create an integration that uses the API.

### Option A: Use the Template Repository (Recommended)

We maintain a template repo for quick starts.

#### 1. Clone the Template

```bash
cd ~/projects  # or wherever you keep projects
git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
cd my-agent
```

Or clone directly via SSH:

```bash
git clone ssh://git@helm:222/helm/agentic-templates.git my-agent
```

#### 2. Configure Environment

Copy `.env.example` to `.env`:

```bash
cp .env.example .env
```

Edit `.env` if needed:

```env
API_URL=http://helm:8675
API_KEY=          # Only if auth enabled
PROJECT=/path/to/your/project  # Optional, for context scoping
```

#### 3. Run Your Agent

```bash
# Using Docker Compose (recommended)
docker compose up -d

# Or run directly
pip install -r requirements.txt
python agent.py
```

The agent will automatically:

- Fetch relevant skills/conventions via RAG
- Store decisions in memory
- Compress the conversation when it grows large

### Option B: Manual Integration

If you want to integrate into an existing project:

1. Install the Python dependency:

   ```bash
   pip install httpx
   ```

2. Copy the integration pattern from `template/agent.py` (the `get_context`, `compress_messages`, `store_memory` functions).

3. Add these calls to your agent's workflow (a sketch of the helpers follows below):

   - Before each LLM call: `context = await get_context(query, project)`
   - Inject the context into the system prompt
   - After each response: `await store_memory(project, key, content)`
   - When the conversation exceeds ~10 messages: `compressed = await compress_messages(conversation)`

See `USAGE.md` for detailed integration patterns.
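
For reference, here's a minimal sketch of those three helpers, assuming the endpoints documented in this guide (`/context/rag`, `/memory`, `/compress`) and the `API_URL` environment variable from the template; `template/agent.py` remains the canonical version:

```python
import os
import httpx

API_URL = os.getenv("API_URL", "http://helm:8675")

async def get_context(query, project=None):
    # Fetch only the most relevant skills/conventions/snippets for this query
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag",
                                params={"query": query, "project": project})
        return resp.json()

async def store_memory(project, key, content):
    # Persist a decision or learning for future sessions
    async with httpx.AsyncClient() as client:
        await client.post(f"{API_URL}/memory",
                          json={"project": project, "key": key, "content": content})

async def compress_messages(messages):
    # Returns {"messages": [...], "tokens_saved": N} per the /compress endpoint
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{API_URL}/compress", json={"messages": messages})
        return resp.json()
```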

## Seeding Skills and Conventions

The API comes with a seed script that adds useful skills.

### Run the Seed Script

```bash
cd /opt/ai-skills-api
python examples/seed-data.py
```

This adds:

- D&D campaign management skills
- Infrastructure/Docker skills
- Code review skills
- General best practices

### Add Custom Skills

#### Via API:

```bash
curl -X POST http://helm:8675/skills \
  -H "Content-Type: application/json" \
  -d '{
    "id": "my-skill",
    "name": "My Custom Skill",
    "category": "custom",
    "content": "Specific instructions for your agent...",
    "tags": ["keyword1", "keyword2"]
  }'
```

#### Via MCP (Claude Desktop):

Use the `skills/create_skill` tool directly in Claude.

#### Via Python:

```python
import httpx

resp = httpx.post(
    "http://helm:8675/skills",
    json={
        "id": "unique-skill-id",
        "name": "Skill Name",
        "category": "category",
        "content": "Full skill instructions...",
        "tags": ["tag1", "tag2"]
    }
)
```

### Add Project Conventions

Conventions are project-specific (tied to a project path or identifier):

```bash
curl -X POST http://helm:8675/conventions \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Project Conventions",
    "project": "/home/user/myproject",
    "content": "Project-specific coding standards, workflows, etc."
  }'
```

## Testing Your Setup

### 1. Test RAG Context

```bash
curl "http://helm:8675/context/rag?query=docker+compose&project=test"
```

Should return JSON with `skills`, `conventions`, and `snippets` arrays.

### 2. Test Compression

```bash
curl -X POST http://helm:8675/compress \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hi there! How can I help?"},
      {"role": "user", "content": "Tell me about Docker."},
      {"role": "assistant", "content": "Docker is a containerization platform..."}
    ]
  }'
```

Should return compressed messages and `tokens_saved`.

### 3. Test Memory

```bash
curl -X POST http://helm:8675/memory \
  -H "Content-Type: application/json" \
  -d '{
    "project": "test",
    "key": "decision-123",
    "content": "We decided to use FastAPI for this project"
  }'

curl "http://helm:8675/memory?project=test"
```

### 4. Test from Agent Template

```bash
cd ~/projects/my-agent
docker compose up -d
docker compose logs -f agent  # Watch the agent start and interact
```

## Troubleshooting

### Service Won't Start

```bash
# Check logs
docker compose logs ai-skills-api

# Common issues:
# - Port 8675 already in use: change the port in docker-compose.yml
# - Permissions: ensure /opt/ai-skills-api is readable
```

### Ollama Not Pulling Model

The entrypoint script auto-pulls `phi3:mini` if the compression strategy is `ollama`. To force it:

```bash
docker compose exec ai-skills-api ollama pull phi3:mini
```

### Can't Connect from Other Machines

- Ensure `helm` is reachable on the network (`ping helm`)
- Check the Docker network: `docker network ls` (should have `ai-skills-api_default`)
- The API is bound to `0.0.0.0:8675` inside the container, so it's accessible from the host and other containers

### Auth Errors

- If you get a 401, either disable auth in `config.yaml` or set `API_KEY` in your agent's `.env`
- Verify: `curl -H "X-API-Key: your-key" http://helm:8675/skills`

### High RAG Latency (>10ms)

- The first request after startup will be slower (warming the cache)
- Subsequent queries should be <5ms
- If still slow, check the embedding model load: `docker compose logs ai-skills-api`

## Next Steps

- Read `USAGE.md` for detailed integration patterns and best practices
- Use the template repo for all new agent projects
- Add project-specific skills and conventions as you work
- Monitor logs for token savings

USAGE.md (new file, 522 lines)
# Usage Guide: AI Skills API

This guide explains how to use the AI Skills API effectively in your projects and AI agent sessions.

## Table of Contents

1. [Understanding the Integration Pattern](#understanding-the-integration-pattern)
2. [RAG Context Retrieval](#rag-context-retrieval)
3. [Conversation Compression](#conversation-compression)
4. [Project Memory](#project-memory)
5. [Session Workflow](#session-workflow)
6. [Managing Skills](#managing-skills)
7. [Token Accounting](#token-accounting)
8. [Best Practices](#best-practices)
9. [Example Implementations](#example-implementations)

---

## Understanding the Integration Pattern

The API provides three core capabilities that work together:

1. **RAG (Retrieval-Augmented Generation)**: Before each LLM call, fetch relevant skills, conventions, and snippets based on your query. This injects relevant context without sending your entire knowledge base every time.

2. **Compression**: When the conversation history grows long (>10 turns), compress old messages into summaries to stay within context windows.

3. **Memory**: Store decisions, configurations, and learnings per project for future reference.

**Expected savings**: 60-80% token reduction vs. sending everything.

---

## RAG Context Retrieval

### The `/context/rag` Endpoint

This is your primary integration point. It returns only the most relevant items from your knowledge base.

**Request:**

```
GET /context/rag?query={query}&project={project}
```

**Response:**

```json
{
  "skills": [
    {
      "id": "homelab-docker-compose",
      "name": "Docker Compose Standard",
      "category": "homelab",
      "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
      "relevance_score": 0.89
    }
  ],
  "conventions": [
    {
      "id": "conv-123",
      "name": "React Project Standards",
      "project": "/home/user/my-react-app",
      "content": "Use TypeScript, React 18+, and functional components with hooks.",
      "relevance_score": 0.76
    }
  ],
  "snippets": [
    {
      "id": "snippet-456",
      "name": "FastAPI CORS setup",
      "language": "python",
      "content": "app.add_middleware(CORSMiddleware, allow_origins=[\"*\"], ...)",
      "relevance_score": 0.82
    }
  ]
}
```

### How It Works

- Skills are globally available (your general knowledge base)
- Conventions are scoped to a project path or identifier (e.g., `/home/user/project1`)
- Snippets are globally available code examples
- Relevance scores are cosine similarity (0-1); items below 0.3 are typically filtered out
- Limits are configurable (default: 3 skills, 2 conventions, 2 snippets)

### Usage Pattern

```python
async def query_with_context(query: str, project: str = None):
    # 1. Fetch context
    context = await get_context(query, project)

    # 2. Build system prompt
    system_prompt = format_context(context)
    # system_prompt now contains:
    #   ## Relevant Skills
    #   ### Docker Compose Standard (relevance: 0.89)
    #   Always use docker-compose v3.8+...
    #   ...

    # 3. Inject into LLM call
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query}
    ]
    response = await llm.chat(messages)

    return response
```
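
`format_context` is referenced throughout this guide but left to your agent. A minimal sketch consistent with the commented output above might look like this (the section titles other than "Relevant Skills" are illustrative, not a fixed format):

```python
def format_context(context: dict) -> str:
    """Render a /context/rag response as markdown for a system prompt."""
    sections = [
        ("Relevant Skills", context.get("skills", [])),
        ("Project Conventions", context.get("conventions", [])),
        ("Relevant Snippets", context.get("snippets", [])),
    ]
    parts = []
    for title, items in sections:
        if not items:
            continue  # skip empty sections to save tokens
        parts.append(f"## {title}")
        for item in items:
            parts.append(f"### {item['name']} (relevance: {item['relevance_score']:.2f})")
            parts.append(item["content"])
    return "\n".join(parts)
```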

---

## Conversation Compression

### The `/compress` Endpoint

Compresses a list of conversation messages into a shorter representation.

**Request:**

```json
{
  "messages": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."},
    ... (up to 20+ messages)
  ]
}
```

**Response:**

```json
{
  "messages": [
    {"role": "system", "content": "Summary of earlier conversation..."},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."}
  ],
  "tokens_saved": 245
}
```

### Compression Strategies

- **Extractive** (default): Uses LSA summarization to select key sentences. Fast (~100-500ms), no model required.
- **Ollama**: Uses `phi3:mini` for abstractive summaries. Better quality but slower (~2s). Requires Ollama running.

**Configure in `config.yaml`:**

```yaml
compression:
  enabled: true
  strategy: "extractive"  # or "ollama"
```

### Usage Pattern

```python
conversation = []

async def chat(query):
    # Add user message
    conversation.append({"role": "user", "content": query})

    # Call LLM (with context from RAG)
    response = await llm.chat(conversation)
    conversation.append({"role": "assistant", "content": response})

    # Compress when the conversation gets long
    if len(conversation) >= 10:
        compressed = await compress_messages(conversation)
        conversation = compressed["messages"]
        print(f"Saved {compressed['tokens_saved']} tokens")

    return response
```

**Important**: Keep the most recent ~4-6 turns uncompressed. The compression endpoint preserves recent messages and compresses only the older ones.

---

## Project Memory

### The `/memory` Endpoints

Store and retrieve project-specific knowledge.

**Store:**

```
POST /memory
{
  "project": "my-project",
  "key": "architecture-decision-2024-01-15",
  "content": "We chose FastAPI over Flask for async support and automatic OpenAPI docs."
}
```

**Retrieve:**

```
GET /memory?project=my-project
```

**Update:**

```
PUT /memory/{id}
```

**Delete:**

```
DELETE /memory/{id}
```
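
The update body isn't spelled out above; a reasonable guess is that `PUT` accepts the same fields as the store call (the `abc123` id below is hypothetical):

```bash
curl -X PUT http://helm:8675/memory/abc123 \
  -H "Content-Type: application/json" \
  -d '{"content": "Updated: we later switched from FastAPI to Litestar"}'
```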

### Usage Pattern

```python
# Store a decision after making it
await store_memory(
    project="/home/user/myapp",
    key="db-choice",
    content="Using PostgreSQL over MongoDB for relational data integrity"
)

# Retrieve past decisions at project start
resp = httpx.get("http://helm:8675/memory", params={"project": "/home/user/myapp"})
decisions = resp.json()["entries"]
# decisions = [{"id": "...", "key": "db-choice", "content": "...", ...}]
```

**When to use memory:**

- Architecture decisions
- Configuration choices (API keys, service URLs)
- Learned preferences ("User likes code examples")
- Debugging notes ("Issue with CORS on port 8080")

**When NOT to use memory:**

- Temporary conversation state (use compression instead)
- Large codebases (store in skills/snippets instead)
- Public documentation (should be in skills)

---

## Session Workflow

### Starting a New Session

1. **Define your project identifier** - a path or unique string:
   ```python
   PROJECT = "/home/user/myapp"  # or "my-discord-bot", "workspace-123"
   ```

2. **Load past memories** (optional but helpful):
   ```python
   memories = httpx.get("http://helm:8675/memory", params={"project": PROJECT}).json()["entries"]
   # Inject into the system prompt or build context from them
   ```

3. **Begin the conversation loop** - for each user query:
   - Call `GET /context/rag?query=...&project=PROJECT`
   - Inject the context into the LLM prompt
   - Call the LLM
   - Store important outputs in memory if they represent decisions/learnings
   - Compress the conversation when it reaches ~10 turns

### Ending a Session

- Optionally store a session summary in memory:
  ```python
  await store_memory(PROJECT, "session-summary-2024-01-15", "Completed user auth flow, decided on JWT tokens")
  ```

- No cleanup needed - conversation state lives in your agent, not the server

### Multi-Project Agents

If your agent works across multiple projects:

```python
# Switch project context mid-conversation
PROJECT = "/home/user/project1"  # current active project

# Each project has its own conventions and memories
context = await get_context(query, project=PROJECT)
```

---

## Managing Skills

Skills are your reusable knowledge base. Manage them via the API, MCP, or the seed script.

### Categories

Group skills by category (e.g., `homelab`, `dnd`, `python`, `devops`). Categories don't affect RAG retrieval but help with organization.

### Tags

Tags are keywords reserved for **future search** (not currently used by RAG, but planned for enhanced filtering).

```json
{
  "tags": ["docker", "compose", "infrastructure", "production"]
}
```

### Best Practices for Skills

- **Be specific**: "Docker Compose Production Patterns" > "Docker"
- **Include examples**: Show code snippets in the content
- **Keep it concise**: 1-3 paragraphs, focused on actionable guidance
- **Use markdown**: The API preserves formatting for injection into prompts
- **Version when updating**: If a skill changes significantly, create a new `id` (e.g., `docker-compose-v2`)

### Search Skills

```
GET /skills/search?q={query}
```

Returns matching skills by name/content similarity. Useful for manual exploration, but not needed in automated agents (use `/context/rag` instead).
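
For example, to look for Docker-related skills from the shell (the response is assumed to mirror the skill objects shown earlier):

```bash
curl "http://helm:8675/skills/search?q=docker"
```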

---

## Token Accounting

### Count Tokens

```
GET /tokens/count?text={text}
```

Returns the token count (using tiktoken for GPT models, approximations for others).

**Use this to:**

- Track compression savings
- Pre-flight check prompts before sending them to the LLM
- Budget token usage per session

### Example: Measure RAG Savings

```python
full_context = load_all_skills()  # hypothetical: all your skills text
full_tokens = count_tokens(full_context)

rag_context = get_context(query, project)  # only relevant items
rag_tokens = count_tokens(format_context(rag_context))

savings_pct = (1 - rag_tokens / full_tokens) * 100
print(f"RAG saved {savings_pct:.1f}% tokens")
```

---

## Best Practices

### 1. Always Use Project Scoping

Set the `project` parameter consistently. Even if you have one main project, use a consistent identifier:

```python
PROJECT = "/home/user/myapp"  # NOT "default" or None
context = await get_context(query, project=PROJECT)
```

This allows:

- Project-specific conventions
- Memory isolation between projects
- Future per-project analytics

### 2. Call RAG Before Every LLM Request

Even if the query seems unrelated, the cost is negligible (<5ms, ~50 tokens). The knowledge injected often improves responses.

### 3. Compress Proactively

Don't wait until the context window is full. Compress at ~10 messages:

```python
if len(conversation) >= 10:
    compressed = await compress_messages(conversation)
    conversation = compressed["messages"]
```

This keeps the compression quality high (summaries are more accurate with fewer messages).

### 4. Store Learnings, Not Everything

Memory is for **decisions** and **facts you want to recall**.

Don't store:

- Every user query/response (that's what compression is for)
- Public documentation (put it in skills instead)
- Transient state (keep it in agent memory)

### 5. Version Your Skills

When a skill's guidance changes:

- **Minor update** (typo, clarification): update the existing skill's `content` in place
- **Major update** (different approach, breaking change): create a new `id` (e.g., `docker-compose-v2`) and optionally mark the old one as deprecated in its content

### 6. Use MCP in Claude Desktop

If you use Claude Desktop, add the MCP server (see `CLAUDE.md`). This gives you:

- Direct access to skills via Claude's tool calling
- No need to implement API calls manually
- The same token savings within Claude

### 7. Monitor Token Savings

Track metrics:

```python
from datetime import datetime

logs = []

def log_savings(tokens_before, tokens_after, operation):
    logs.append({
        "timestamp": datetime.now().isoformat(),
        "operation": operation,
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "savings": tokens_before - tokens_after
    })
    # Periodically upload or analyze these
```

---

## Example Implementations

### Minimal Agent

```python
import asyncio
import os

import httpx

API_URL = os.getenv("API_URL", "http://helm:8675")
PROJECT = os.getenv("PROJECT", "/default")

async def get_context(query):
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": query, "project": PROJECT})
        return resp.json()

async def chat():
    conv = []
    while True:
        query = input("You: ")
        if query == "quit":
            break

        # Get context
        ctx = await get_context(query)
        system = format_context(ctx)

        # Call LLM (pseudo - plug in your provider here)
        response = call_llm(system, conv[-4:], query)

        conv.extend([{"role": "user", "content": query},
                     {"role": "assistant", "content": response}])

        print(f"Assistant: {response}")

asyncio.run(chat())
```

### Discord Bot with Context

```python
import os

import discord
from discord.ext import commands
import httpx

# discord.py 2.x requires explicit intents to read message content
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)
API_URL = "http://helm:8675"
PROJECT = "/home/user/discord-bot"

@bot.event
async def on_message(message):
    if message.author == bot.user:
        return

    # RAG context
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": message.content, "project": PROJECT})
        ctx = resp.json()

    # Build prompt
    system_prompt = format_context(ctx) + "\n\nYou are a helpful Discord bot."

    # Respond (using your LLM of choice)
    response = await generate_response(message.content, system_prompt)
    await message.reply(response)

    # Store in memory if it's a decision
    if "decision" in message.content.lower():
        async with httpx.AsyncClient() as client:
            await client.post(f"{API_URL}/memory", json={
                "project": PROJECT,
                "key": f"decision-{discord.utils.utcnow().timestamp()}",
                "content": response[:500]
            })

bot.run(os.getenv("DISCORD_TOKEN"))
```

---

## Need More Help?

- **Setup issues**: See `SETUP.md`
- **Template repo**: Clone `git.bouncypixel.com:helm/agentic-templates.git`
- **API reference**: Visit `http://helm:8675/docs` when the service is running
- **MCP tools**: See `CLAUDE.md` for Claude Desktop integration

docker-compose.yml
@@ -28,3 +28,18 @@ services:
     interval: 30s
     timeout: 10s
     retries: 3
+
+  mcp:
+    build:
+      context: .
+      dockerfile: mcp/Dockerfile
+    command: python skills.py
+    ports:
+      - "3000:3000"
+    environment:
+      - SKILLS_API_URL=http://api:8080
+      - MCP_TRANSPORT=sse
+      - MCP_PORT=3000
+    depends_on:
+      - api
+    restart: unless-stopped

mcp/requirements.txt
@@ -3,3 +3,4 @@ httpx==0.26.0
 python-dotenv==1.0.0
 docker==7.0.0
 psutil==5.9.7
+uvicorn[standard]==0.27.0

mcp/skills.py
@@ -1,6 +1,7 @@
 from mcp.server.fastmcp import FastMCP
 import httpx
 import os
+import uvicorn

 mcp = FastMCP("skills")

@@ -162,4 +163,11 @@ def create_skill(

 if __name__ == "__main__":
-    mcp.run()
+    transport = os.getenv("MCP_TRANSPORT", "stdio")
+
+    if transport == "sse":
+        host = os.getenv("MCP_HOST", "0.0.0.0")
+        port = int(os.getenv("MCP_PORT", "3000"))
+        mcp.run_sse(host=host, port=port)
+    else:
+        mcp.run()