diff --git a/OPENCODE-MCP.md b/OPENCODE-MCP.md
new file mode 100644
index 0000000..f16c161
--- /dev/null
+++ b/OPENCODE-MCP.md
@@ -0,0 +1,118 @@
+# OpenCode MCP Configuration
+
+OpenCode (an open-source alternative to Cursor/Claude) supports MCP servers. This guide shows how to connect it to your AI Skills API MCP server running on `helm`.
+
+## Prerequisites
+
+- AI Skills API stack running on `helm` (includes MCP server on port 3000)
+- OpenCode installed on your local machine
+
+## MCP Server Endpoint
+
+Your MCP server is accessible at:
+```
+http://helm:3000
+```
+
+It exposes two endpoints:
+- `GET /sse` - Server-Sent Events (for client connection)
+- `POST /messages` - JSON-RPC messages
+
+## OpenCode Configuration
+
+OpenCode reads MCP server config from its settings. You need to add an MCP server with the SSE URL.
+
+### Configuration JSON
+
+Add this to your OpenCode MCP configuration (location varies by install):
+
+```json
+{
+  "mcpServers": {
+    "skills": {
+      "url": "http://helm:3000"
+    }
+  }
+}
+```
+
+**Note**: Use `"url"`, not `"command"`, since the server is remote and uses SSE transport.
+
+### Where to Put This
+
+OpenCode typically reads MCP config from:
+- `~/.config/opencode/mcp.json`
+- or in the app settings UI (Preferences → MCP → Add Server → Manual)
+
+If using a file, create/edit `~/.config/opencode/mcp.json`:
+
+```bash
+mkdir -p ~/.config/opencode
+cat > ~/.config/opencode/mcp.json << 'EOF'
+{
+  "mcpServers": {
+    "skills": {
+      "url": "http://helm:3000"
+    }
+  }
+}
+EOF
+```
+
+### Test Connection
+
+1. Restart OpenCode (if running)
+2. Open the MCP servers panel/tool
+3. You should see the "skills" server listed as connected
+4. Available tools will include:
+   - `search_skills`
+   - `get_skill`
+   - `list_skills`
+   - `get_context`
+   - `get_conventions`
+   - `get_snippets`
+   - `get_memory`
+   - `add_memory`
+   - `create_skill`
+
+## Troubleshooting
+
+### "Cannot connect to MCP server"
+- Ensure the stack is up: `docker compose -f /path/to/ai-skills-api/docker-compose.yml ps`
+- Check MCP service logs: `docker compose logs mcp`
+- Verify `helm` resolves: `ping helm`, or use an IP address instead
+- If using an IP, change the config to `"url": "http://192.168.x.x:3000"`
+
+### "Connection refused" or timeout
+- Ensure port 3000 is exposed: `netstat -tuln | grep 3000` on helm
+- Check firewall: helm should accept connections on 3000 from your network
+
+### Tools not appearing
+- Wait 10-20 seconds after OpenCode starts for the MCP connection to establish
+- Check OpenCode logs for MCP connection errors
+- Verify the skills service is healthy: `docker compose ps` (mcp should be "Up" and healthy)
+
+## Using the Tools
+
+Once connected, you can invoke MCP tools from OpenCode:
+
+- `get_context(project="/home/user/myapp")` → fetches relevant skills/conventions
+- `search_skills(query="docker compose")` → finds matching skills
+- `create_skill(...)` → adds a new skill to the database
+- `add_memory(project, key, content)` → stores learnings
+
+These calls go over the network to `helm:3000`, and the MCP server forwards requests to the Skills API (`api:8080` on the internal Docker network).
+
+## Security Note
+
+The MCP server is exposed on your home network without authentication (it relies on network trust). If you need auth, we can add a reverse proxy or API key layer.
+
+## One-Line Setup Script
+
+If you're setting up on a new machine, run this from the `agentic-templates` repo:
+
+```bash
+./setup-opencode-mcp.sh
+```
+
+It will detect your OpenCode config location and add the MCP server automatically.
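+
+## Scripted Connection Check
+
+If you'd rather test from code than curl, here is a minimal sketch using `httpx` (already used elsewhere in this stack). The script name and printed messages are illustrative, not part of the API:
+
+```python
+# check_mcp.py - confirm the MCP SSE endpoint on helm is reachable.
+import httpx
+
+MCP_SSE_URL = "http://helm:3000/sse"
+
+try:
+    # SSE streams stay open indefinitely, so stream the response and
+    # read only the first event line instead of waiting for it to end.
+    with httpx.stream("GET", MCP_SSE_URL, timeout=5.0) as resp:
+        resp.raise_for_status()
+        for line in resp.iter_lines():
+            if line:
+                print(f"Connected (HTTP {resp.status_code}); first event: {line}")
+                break
+except httpx.HTTPError as exc:
+    print(f"MCP server unreachable: {exc}")
+```
+
+If this prints a first event but OpenCode still shows the server as disconnected, the problem is on the OpenCode side (config location or a needed restart) rather than the network.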
diff --git a/README.md b/README.md
index a8fae5f..c44e536 100644
--- a/README.md
+++ b/README.md
@@ -2,22 +2,14 @@
 
 Local infrastructure for AI context management. Reduce token consumption by 60-80% through smart RAG, conversation compression, and reusable skills.
 
-## Quick Start
+**API available at**: `http://helm:8675`
+**Interactive docs**: `http://helm:8675/docs`
 
-```bash
-# Copy config file (optional, uses defaults if missing)
-cp config.yaml.example config.yaml  # customize if needed
+## Quick Links
 
-# Run with Docker
-docker compose up -d
-
-# Or run locally
-pip install -r requirements.txt
-uvicorn main:app --reload
-```
-
-API available at `http://helm:8675`
-Docs at `http://helm:8675/docs`
+- **[Setup Guide](SETUP.md)** - One-time deployment on your server
+- **[Usage Guide](USAGE.md)** - How to integrate with your agents
+- **[Template Repository](https://git.bouncypixel.com/helm/agentic-templates)** - Starter kit for new projects
 
 ## Key Features
 
@@ -27,24 +19,22 @@ Docs at `http://helm:8675/docs`
 - **Simple API**: RESTful JSON API + MCP server for Claude Desktop
 - **Zero-friction auth**: Optional API key (set-and-forget)
 
-## Configuration
+## Quick Start (5 minutes)
 
-Create `config.yaml` (optional) to customize:
+```bash
+# 1. Deploy the service on helm (see SETUP.md for details)
+docker compose up -d
 
-```yaml
-port: 8675
-rag:
-  max_skills: 3
-  max_conventions: 2
-  max_snippets: 2
-compression:
-  enabled: true
-  strategy: "extractive"  # or "ollama" for phi-3-mini
-auth:
-  enabled: false  # set to true and change api_key
+# 2. Clone the template repo for your agent project
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
+cp .env.example .env
+docker compose up -d
+
+# 3. Your agent is now running with context management
 ```
 
-Or use environment variables (see `config.py` for full list).
+See **[SETUP.md](SETUP.md)** for complete deployment instructions and **[USAGE.md](USAGE.md)** for integration patterns.
 
 ## Endpoints
 
@@ -113,19 +103,21 @@ async def query_llm(prompt, conversation_history, project=None):
 
 **Expected savings**: 60-80% token reduction vs. sending everything.
 
+See **[USAGE.md](USAGE.md)** for complete integration patterns, examples, and best practices.
+
 ## Template Repository
 
 Want to get started quickly? Use the agent template:
 
 ```bash
-# Clone the template (on your Forgejo)
-git clone git.bouncypixel.com:helm/ai-agent-template.git
-cd ai-agent-template
+# Clone the template
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
 cp .env.example .env
 docker compose up -d
 ```
 
-The template includes a working agent integration and docker-compose setup.
+The template includes a working agent integration and docker-compose setup. See [USAGE.md](USAGE.md) for integration patterns.
 
 ## How It Works (Architecture)
 
@@ -166,7 +158,7 @@ If you use Claude Desktop, add to your config:
 
 Available tools:
 - `search_skills`, `get_skill`, `list_skills`
 - `get_context`, `get_conventions`, `get_snippets`
-- `check_cache` (deprecated), `get_memory`, `add_memory`, `create_skill`
+- `get_memory`, `add_memory`, `create_skill`
 
 ## Migration from v1
 
@@ -186,52 +178,4 @@ If you were using the old semantic cache:
 
 MIT
 
-## Example Usage
-
-### Create a skill
-```bash
-curl -X POST http://helm:8675/skills \
-  -H "Content-Type: application/json" \
-  -d '{
-    "id": "homelab-docker-compose",
-    "name": "Docker Compose Standard",
-    "category": "homelab",
-    "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
-    "tags": ["docker", "compose", "infrastructure"]
-  }'
-```
-
-### Get context bundle
-```bash
-curl "http://helm:8675/context?project=/home/server/apps/media-server&skills=homelab-docker-compose,react-v2"
-```
-
-### Check cache
-```bash
-curl -X POST http://helm:8675/cache/lookup \
-  -H "Content-Type: application/json" \
-  -d '{
-    "prompt": "How do I configure traefik?",
-    "model": "claude-3-opus"
-  }'
-```
-
-## Integration Pattern
-
-In your agent's system prompt or pre-request hook:
-
-1. Call `GET /context?project={current_project}&skills={skill_ids}`
-2. Inject returned content into the prompt
-3. Before sending to LLM, check `POST /cache/lookup`
-4. After receiving response, optionally `POST /cache/store`
-
-This avoids re-sending your standards every request and caches repeated queries.
-
-## Database
-
-SQLite database `ai.db` with tables:
-- `skills` - Reusable patterns and instructions
-- `snippets` - Code snippets
-- `conventions` - Project-specific conventions
-- `cache` - LRU cache of LLM responses
-- `memory` - Project memory/notes
+For detailed usage examples and API reference, see [USAGE.md](USAGE.md) and the interactive docs at `http://helm:8675/docs` when the service is running.
diff --git a/SETUP.md b/SETUP.md
new file mode 100644
index 0000000..32dcfcc
--- /dev/null
+++ b/SETUP.md
@@ -0,0 +1,394 @@
+# Setup Guide: AI Skills API
+
+This guide covers exactly how to deploy the AI Skills API on your home server (`helm`) and set up new agent projects.
+
+## Prerequisites
+
+- Docker & Docker Compose installed on `helm`
+- Access to `helm` from your development machine (SSH or local)
+- Optional: Claude Desktop with MCP support
+
+## Server Setup (One-Time)
+
+Deploy the AI Skills API service on your home server.
+
+### 1. Clone the Repository
+
+```bash
+# On helm (or accessible to docker)
+cd /opt
+git clone ssh://git@helm:222/helm/ai-skills-api.git
+cd ai-skills-api
+```
+
+### 2. Build and Start Services
+
+```bash
+# Build and start all services (API + Ollama + MCP)
+docker compose up -d --build
+
+# Check it's running
+docker compose ps
+# Should show: api, ollama, mcp (all "Up")
+```
+
+### 3. Verify Deployment
+
+```bash
+# Health check (from helm)
+curl http://localhost:8675/health
+
+# Expected response: {"status":"healthy"}
+```
+
+### 4. Configure Optional Settings
+
+Edit `config.yaml` (optional; the service falls back to defaults if it's missing):
+
+```yaml
+port: 8675
+rag:
+  max_skills: 3
+  max_conventions: 2
+  max_snippets: 2
+compression:
+  enabled: true
+  strategy: "extractive"  # or "ollama" for phi-3-mini
+auth:
+  enabled: false  # set to true to require API key
+  api_key: "your-secret-key-here"
+```
+
+Restart after changes:
+
+```bash
+docker compose restart
+```
+
+### 5. (Optional) Enable API Authentication
+
+If you want auth across your network:
+
+1. Edit `config.yaml`:
+   ```yaml
+   auth:
+     enabled: true
+     api_key: "generate-a-strong-random-key"
+   ```
+
+2. Restart:
+   ```bash
+   docker compose restart
+   ```
+
+3. Test:
+   ```bash
+   curl http://helm:8675/health  # Should work (no auth)
+   curl http://helm:8675/skills  # Should fail with 401 if auth is enabled
+   curl -H "X-API-Key: your-secret-key-here" http://helm:8675/skills  # Should work
+   ```
+
+**Note**: The API is accessible only on your home network (`helm:8675`). No public exposure by default.
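+
+From Python, the same check looks like this — a minimal sketch; the `X-API-Key` header name matches the curl examples above:
+
+```python
+import httpx
+
+client = httpx.Client(
+    base_url="http://helm:8675",
+    headers={"X-API-Key": "your-secret-key-here"},  # omit if auth is disabled
+)
+
+print(client.get("/health").json())        # health check needs no auth
+print(client.get("/skills").status_code)   # 200 with a valid key, 401 without
+```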
+
+### Access from Other Machines
+
+Your agents running on other machines can access the API at `http://helm:8675`.
+
+```bash
+# From any machine on your network
+curl http://helm:8675/health
+```
+
+If `helm` doesn't resolve via DNS, add an entry to your hosts file or use the server's IP address instead.
+
+## MCP Server for Claude Desktop / OpenCode
+
+The stack includes an MCP server that exposes your skills to Claude Desktop or OpenCode via the Model Context Protocol.
+
+### What's Running
+
+- **MCP Server**: SSE mode on `http://helm:3000`
+- Automatically proxies requests to the Skills API (`http://api:8080` internally)
+- Same Docker network, no extra configuration needed
+
+### Configure Claude Desktop
+
+Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
+
+```json
+{
+  "mcpServers": {
+    "skills": {
+      "url": "http://helm:3000"
+    }
+  }
+}
+```
+
+Restart Claude Desktop. You should see the `skills` server connected, with tools like `search_skills`, `get_context`, etc.
+
+### Configure OpenCode
+
+See [OPENCODE-MCP.md](OPENCODE-MCP.md) for detailed instructions. In short:
+
+```bash
+# Run the setup script from the agentic-templates repo:
+cd ~/projects/agentic-templates
+./setup-opencode-mcp.sh
+
+# Or manually create ~/.config/opencode/mcp.json:
+{
+  "mcpServers": {
+    "skills": {
+      "url": "http://helm:3000"
+    }
+  }
+}
+```
+
+### Test MCP Connection
+
+```bash
+# Should hang (SSE stream) if connected
+curl http://helm:3000/sse
+
+# With API key if auth enabled:
+curl -H "X-API-Key: your-key" http://helm:3000/sse
+```
+
+## Project Setup (Per Project/Session)
+
+For each new project or AI agent, you'll create an integration that uses the API.
+
+### Option A: Use the Template Repository (Recommended)
+
+We maintain a template repo for quick starts.
+
+#### 1. Clone the Template
+
+```bash
+cd ~/projects  # or wherever you keep projects
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
+```
+
+Or clone directly via SSH:
+
+```bash
+git clone ssh://git@helm:222/helm/agentic-templates.git my-agent
+```
+
+#### 2. Configure Environment
+
+Copy `.env.example` to `.env`:
+
+```bash
+cp .env.example .env
+```
+
+Edit `.env` if needed:
+
+```env
+API_URL=http://helm:8675
+API_KEY=  # Only if auth enabled
+PROJECT=/path/to/your/project  # Optional, for context scoping
+```
+
+#### 3. Run Your Agent
+
+```bash
+# Using Docker Compose (recommended)
+docker compose up -d
+
+# Or run directly
+pip install -r requirements.txt
+python agent.py
+```
+
+The agent will automatically:
+- Fetch relevant skills/conventions via RAG
+- Store decisions in memory
+- Compress conversation when it grows large
+
+### Option B: Manual Integration
+
+If you want to integrate into an existing project:
+
+1. Install the Python dependency:
+
+```bash
+pip install httpx
+```
+
+2. Copy the integration pattern from `template/agent.py` (the `get_context`, `compress_messages`, `store_memory` functions).
+
+3. Add these calls to your agent's workflow (a sketch of the helpers follows this list):
+
+   - Before each LLM call: `context = await get_context(query, project)`
+   - Inject context into system prompt
+   - After each response: `await store_memory(project, key, content)`
+   - When conversation > 10 messages: `compressed = await compress_messages(conversation)`
+
+See `USAGE.md` for detailed integration patterns.
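+
+A minimal sketch of those helpers, assuming the endpoint shapes documented in `USAGE.md` (the exact function names in `template/agent.py` may differ):
+
+```python
+import httpx
+
+API_URL = "http://helm:8675"
+
+async def get_context(query: str, project: str) -> dict:
+    # GET /context/rag returns {"skills": [...], "conventions": [...], "snippets": [...]}
+    async with httpx.AsyncClient() as client:
+        resp = await client.get(
+            f"{API_URL}/context/rag",
+            params={"query": query, "project": project},
+        )
+        resp.raise_for_status()
+        return resp.json()
+
+async def store_memory(project: str, key: str, content: str) -> None:
+    # POST /memory stores a decision or learning for later sessions
+    async with httpx.AsyncClient() as client:
+        resp = await client.post(
+            f"{API_URL}/memory",
+            json={"project": project, "key": key, "content": content},
+        )
+        resp.raise_for_status()
+
+async def compress_messages(messages: list[dict]) -> dict:
+    # POST /compress returns {"messages": [...], "tokens_saved": N}
+    async with httpx.AsyncClient() as client:
+        resp = await client.post(f"{API_URL}/compress", json={"messages": messages})
+        resp.raise_for_status()
+        return resp.json()
+```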
+
+## Seeding Skills and Conventions
+
+The API comes with a seed script that adds useful skills.
+
+### Run the Seed Script
+
+```bash
+cd /opt/ai-skills-api
+python examples/seed-data.py
+```
+
+This adds:
+- D&D campaign management skills
+- Infrastructure/Docker skills
+- Code review skills
+- General best practices
+
+### Add Custom Skills
+
+#### Via API:
+
+```bash
+curl -X POST http://helm:8675/skills \
+  -H "Content-Type: application/json" \
+  -d '{
+    "id": "my-skill",
+    "name": "My Custom Skill",
+    "category": "custom",
+    "content": "Specific instructions for your agent...",
+    "tags": ["keyword1", "keyword2"]
+  }'
+```
+
+#### Via MCP (Claude Desktop):
+
+Use the `skills/create_skill` tool directly in Claude.
+
+#### Via Python:
+
+```python
+import httpx
+
+resp = httpx.post(
+    "http://helm:8675/skills",
+    json={
+        "id": "unique-skill-id",
+        "name": "Skill Name",
+        "category": "category",
+        "content": "Full skill instructions...",
+        "tags": ["tag1", "tag2"]
+    }
+)
+```
+
+### Add Project Conventions
+
+Conventions are project-specific (tied to a project path or identifier):
+
+```bash
+curl -X POST http://helm:8675/conventions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "name": "My Project Conventions",
+    "project": "/home/user/myproject",
+    "content": "Project-specific coding standards, workflows, etc."
+  }'
+```
+
+## Testing Your Setup
+
+### 1. Test RAG Context
+
+```bash
+curl "http://helm:8675/context/rag?query=docker+compose&project=test"
+```
+
+Should return JSON with `skills`, `conventions`, `snippets` arrays.
+
+### 2. Test Compression
+
+```bash
+curl -X POST http://helm:8675/compress \
+  -H "Content-Type: application/json" \
+  -d '{
+    "messages": [
+      {"role": "user", "content": "Hello!"},
+      {"role": "assistant", "content": "Hi there! How can I help?"},
+      {"role": "user", "content": "Tell me about Docker."},
+      {"role": "assistant", "content": "Docker is a containerization platform..."}
+    ]
+  }'
+```
+
+Should return compressed messages and `tokens_saved`.
+
+### 3. Test Memory
+
+```bash
+curl -X POST http://helm:8675/memory \
+  -H "Content-Type: application/json" \
+  -d '{
+    "project": "test",
+    "key": "decision-123",
+    "content": "We decided to use FastAPI for this project"
+  }'
+
+curl "http://helm:8675/memory?project=test"
+```
+
+### 4. Test from Agent Template
+
+```bash
+cd ~/projects/my-agent
+docker compose up -d
+docker compose logs -f agent  # Watch the agent start and interact
+```
+
+## Troubleshooting
+
+### Service Won't Start
+
+```bash
+# Check logs (service name per `docker compose ps` is "api")
+docker compose logs api
+
+# Common issues:
+# - Port 8675 already in use: change port in docker-compose.yml
+# - Permissions: ensure /opt/ai-skills-api is readable
+```
+
+### Ollama Not Pulling Model
+
+The entrypoint script auto-pulls `phi3:mini` if the compression strategy is `ollama`.
To force:
+
+```bash
+docker compose exec ollama ollama pull phi3:mini
+```
+
+### Can't Connect from Other Machines
+
+- Ensure `helm` is reachable on the network (ping `helm`)
+- Check Docker network: `docker network ls` (should have `ai-skills-api_default`)
+- The API is bound to `0.0.0.0:8675` inside the container, so it's accessible from the host and other containers
+
+### Auth Errors
+
+- If you get 401, either disable auth in `config.yaml` or set `API_KEY` in your agent's `.env`
+- Verify: `curl -H "X-API-Key: your-key" http://helm:8675/skills`
+
+### High RAG Latency (>10ms)
+
+- First request after startup will be slower (warming cache)
+- Subsequent queries should be <5ms
+- If still slow, check embedding model load: `docker compose logs api`
+
+## Next Steps
+
+- Read `USAGE.md` for detailed integration patterns and best practices
+- Use the template repo for all new agent projects
+- Add project-specific skills and conventions as you work
+- Monitor logs for token savings
diff --git a/USAGE.md b/USAGE.md
new file mode 100644
index 0000000..e903312
--- /dev/null
+++ b/USAGE.md
@@ -0,0 +1,522 @@
+# Usage Guide: AI Skills API
+
+This guide explains how to use the AI Skills API effectively in your projects and AI agent sessions.
+
+## Table of Contents
+
+1. [Understanding the Integration Pattern](#understanding-the-integration-pattern)
+2. [RAG Context Retrieval](#rag-context-retrieval)
+3. [Conversation Compression](#conversation-compression)
+4. [Project Memory](#project-memory)
+5. [Session Workflow](#session-workflow)
+6. [Managing Skills](#managing-skills)
+7. [Token Accounting](#token-accounting)
+8. [Best Practices](#best-practices)
+9. [Example Implementations](#example-implementations)
+
+---
+
+## Understanding the Integration Pattern
+
+The API provides three core capabilities that work together:
+
+1. **RAG (Retrieval-Augmented Generation)**: Before each LLM call, fetch relevant skills, conventions, and snippets based on your query. This injects relevant context without sending your entire knowledge base every time.
+
+2. **Compression**: When conversation history grows long (>10 turns), compress old messages into summaries to stay within context windows.
+
+3. **Memory**: Store decisions, configurations, and learnings per project for future reference.
+
+**Expected savings**: 60-80% token reduction vs. sending everything.
+
+---
+
+## RAG Context Retrieval
+
+### The `/context/rag` Endpoint
+
+This is your primary integration point. It returns only the most relevant items from your knowledge base.
+
+**Request:**
+
+```
+GET /context/rag?query={query}&project={project}
+```
+
+**Response:**
+
+```json
+{
+  "skills": [
+    {
+      "id": "homelab-docker-compose",
+      "name": "Docker Compose Standard",
+      "category": "homelab",
+      "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
+      "relevance_score": 0.89
+    }
+  ],
+  "conventions": [
+    {
+      "id": "conv-123",
+      "name": "React Project Standards",
+      "project": "/home/user/my-react-app",
+      "content": "Use TypeScript, React 18+, and functional components with hooks.",
+      "relevance_score": 0.76
+    }
+  ],
+  "snippets": [
+    {
+      "id": "snippet-456",
+      "name": "FastAPI CORS setup",
+      "language": "python",
+      "content": "app.add_middleware(CORSMiddleware, allow_origins=[\"*\"], ...)",
+      "relevance_score": 0.82
+    }
+  ]
+}
+```
+
+### How It Works
+
+- Skills are globally available (your general knowledge base)
+- Conventions are scoped to a project path or identifier (e.g., `/home/user/project1`)
+- Snippets are globally available code examples
+- Relevance scores are cosine similarity (0-1) - items below 0.3 are typically filtered out
+- Limits are configurable (default: 3 skills, 2 conventions, 2 snippets)
+
+### Usage Pattern
+
+```python
+async def query_with_context(query: str, project: str = None):
+    # 1. Fetch context
+    context = await get_context(query, project)
+
+    # 2. Build system prompt
+    system_prompt = format_context(context)
+    # system_prompt now contains:
+    # ## Relevant Skills
+    # ### Docker Compose Standard (relevance: 0.89)
+    # Always use docker-compose v3.8+...
+    # ...
+
+    # 3. Inject into LLM call
+    messages = [
+        {"role": "system", "content": system_prompt},
+        {"role": "user", "content": query}
+    ]
+    response = await llm.chat(messages)
+
+    return response
+```
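+
+`format_context` above is a small formatting helper. A minimal sketch, assuming the response shape shown in the example (the section headers are a matter of taste):
+
+```python
+def format_context(context: dict) -> str:
+    # Turn the /context/rag response into a markdown system prompt.
+    lines = []
+    sections = [
+        ("Relevant Skills", "skills"),
+        ("Project Conventions", "conventions"),
+        ("Code Snippets", "snippets"),
+    ]
+    for title, key in sections:
+        items = context.get(key, [])
+        if not items:
+            continue
+        lines.append(f"## {title}")
+        for item in items:
+            lines.append(f"### {item['name']} (relevance: {item['relevance_score']:.2f})")
+            lines.append(item["content"])
+    return "\n".join(lines)
+```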
+
+---
+
+## Conversation Compression
+
+### The `/compress` Endpoint
+
+Compresses a list of conversation messages into a shorter representation.
+
+**Request:**
+
+```json
+{
+  "messages": [
+    {"role": "user", "content": "Hello!"},
+    {"role": "assistant", "content": "Hi! How can I help?"},
+    {"role": "user", "content": "I need to set up Docker Compose."},
+    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."},
+    ... (up to 20+ messages)
+  ]
+}
+```
+
+**Response:**
+
+```json
+{
+  "messages": [
+    {"role": "system", "content": "Summary of earlier conversation..."},
+    {"role": "user", "content": "I need to set up Docker Compose."},
+    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."}
+  ],
+  "tokens_saved": 245
+}
+```
+
+### Compression Strategies
+
+- **Extractive** (default): Uses LSA summarization to select key sentences. Fast (~100-500ms), no model required.
+- **Ollama**: Uses `phi3:mini` for abstractive summaries. Better quality but slower (~2s). Requires Ollama running.
+
+**Configure in `config.yaml`:**
+
+```yaml
+compression:
+  enabled: true
+  strategy: "extractive"  # or "ollama"
+```
+
+### Usage Pattern
+
+```python
+conversation = []
+
+async def chat(query):
+    global conversation  # reassigned below, so it must be declared global
+
+    # Add user message
+    conversation.append({"role": "user", "content": query})
+
+    # Call LLM (with context from RAG)
+    response = await llm.chat(conversation)
+    conversation.append({"role": "assistant", "content": response})
+
+    # Compress when conversation gets long
+    if len(conversation) >= 10:
+        compressed = await compress_messages(conversation)
+        conversation = compressed["messages"]
+        print(f"Saved {compressed['tokens_saved']} tokens")
+
+    return response
+```
+
+**Important**: Keep the most recent ~4-6 turns uncompressed. The compression endpoint preserves recent messages and compresses only the older ones.
+
+---
+
+## Project Memory
+
+### The `/memory` Endpoints
+
+Store and retrieve project-specific knowledge.
+
+**Store:**
+
+```
+POST /memory
+{
+  "project": "my-project",
+  "key": "architecture-decision-2024-01-15",
+  "content": "We chose FastAPI over Flask for async support and automatic OpenAPI docs."
+}
+```
+
+**Retrieve:**
+
+```
+GET /memory?project=my-project
+```
+
+**Update:**
+
+```
+PUT /memory/{id}
+```
+
+**Delete:**
+
+```
+DELETE /memory/{id}
+```
+
+### Usage Pattern
+
+```python
+# Store a decision after making it
+await store_memory(
+    project="/home/user/myapp",
+    key="db-choice",
+    content="Using PostgreSQL over MongoDB for relational data integrity"
+)
+
+# Retrieve past decisions at project start
+resp = httpx.get("http://helm:8675/memory", params={"project": "/home/user/myapp"})
+decisions = resp.json()["entries"]
+# decisions = [{"id": "...", "key": "db-choice", "content": "...", ...}]
+```
+
+**When to use memory:**
+- Architecture decisions
+- Configuration choices (API keys, service URLs)
+- Learned preferences ("User likes code examples")
+- Debugging notes ("Issue with CORS on port 8080")
+
+**When NOT to use memory:**
+- Temporary conversation state (use compression instead)
+- Large codebases (store in skills/snippets instead)
+- Public documentation (should be in skills)
+
+---
+
+## Session Workflow
+
+### Starting a New Session
+
+1. **Define your project identifier** - a path or unique string:
+   ```python
+   PROJECT = "/home/user/myapp"  # or "my-discord-bot", "workspace-123"
+   ```
+
+2. **Load past memories** (optional but helpful):
+   ```python
+   memories = httpx.get("http://helm:8675/memory", params={"project": PROJECT}).json()["entries"]
+   # Inject into system prompt or create context from them
+   ```
+
+3. **Begin conversation loop** - for each user query:
+   - Call `GET /context/rag?query=...&project=PROJECT`
+   - Inject context into LLM prompt
+   - Call LLM
+   - Store important outputs in memory if they represent decisions/learnings
+   - Compress conversation when it reaches ~10 turns
+
+### Ending a Session
+
+- Optionally store a session summary in memory:
+  ```python
+  await store_memory(PROJECT, "session-summary-2024-01-15", "Completed user auth flow, decided on JWT tokens")
+  ```
+
+- No cleanup needed - conversation state lives in your agent, not the server
+
+### Multi-Project Agents
+
+If your agent works across multiple projects:
+
+```python
+# Switch project context mid-conversation
+PROJECT = "/home/user/project1"  # current active project
+
+# Each project has its own conventions and memories
+context = await get_context(query, project=PROJECT)
+```
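+
+Putting the session-start steps together — a minimal bootstrap sketch, assuming the `/memory` response shape shown earlier (the preamble format itself is illustrative):
+
+```python
+import httpx
+
+API_URL = "http://helm:8675"
+PROJECT = "/home/user/myapp"
+
+def load_memory_preamble() -> str:
+    # Step 2: fetch past memories and fold them into a system preamble.
+    resp = httpx.get(f"{API_URL}/memory", params={"project": PROJECT})
+    resp.raise_for_status()
+    entries = resp.json()["entries"]
+    if not entries:
+        return ""
+    notes = "\n".join(f"- {e['key']}: {e['content']}" for e in entries)
+    return f"## Project Memory\n{notes}"
+
+# Prepend this to the system prompt built from /context/rag on each turn.
+```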
+
+---
+
+## Managing Skills
+
+Skills are your reusable knowledge base. Manage them via API, MCP, or the seed script.
+
+### Categories
+
+Group skills by category (e.g., `homelab`, `dnd`, `python`, `devops`). Categories don't affect RAG retrieval but help with organization.
+
+### Tags
+
+Tags are keywords used for **future search** (not currently used by RAG, but planned for enhanced filtering).
+
+```json
+{
+  "tags": ["docker", "compose", "infrastructure", "production"]
+}
+```
+
+### Best Practices for Skills
+
+- **Be specific**: "Docker Compose Production Patterns" > "Docker"
+- **Include examples**: Show code snippets in the content
+- **Keep it concise**: 1-3 paragraphs, focus on actionable guidance
+- **Use markdown**: The API preserves formatting for injection into prompts
+- **Version when updating**: If a skill changes significantly, create a new `id` (e.g., `docker-compose-v2`)
+
+### Search Skills
+
+```
+GET /skills/search?q={query}
+```
+
+Returns matching skills by name/content similarity. Useful for manual exploration but not needed in automated agents (use `/context/rag` instead).
+
+---
+
+## Token Accounting
+
+### Count Tokens
+
+```
+GET /tokens/count?text={text}
+```
+
+Returns the token count (using tiktoken for GPT models, approximations for others).
+
+**Use this to:**
+- Track compression savings
+- Pre-flight check prompts before sending to LLM
+- Budget token usage per session
+
+### Example: Measure RAG Savings
+
+```python
+full_context = load_all_skills()  # hypothetical: all your skills text
+full_tokens = count_tokens(full_context)
+
+rag_context = get_context(query, project)  # only relevant items
+rag_tokens = count_tokens(format_context(rag_context))
+
+savings_pct = (1 - rag_tokens / full_tokens) * 100
+print(f"RAG saved {savings_pct:.1f}% tokens")
+```
+
+---
+
+## Best Practices
+
+### 1. Always Use Project Scoping
+
+Set the `project` parameter consistently. Even if you have one main project, use a consistent identifier:
+
+```python
+PROJECT = "/home/user/myapp"  # NOT "default" or None
+context = await get_context(query, project=PROJECT)
+```
+
+This allows:
+- Project-specific conventions
+- Memory isolation between projects
+- Future per-project analytics
+
+### 2. Call RAG Before Every LLM Request
+
+Even if the query seems unrelated, the cost is negligible (<5ms, ~50 tokens). The knowledge injected often improves responses.
+
+### 3. Compress Proactively
+
+Don't wait until the context window is full. Compress at ~10 messages:
+
+```python
+if len(conversation) >= 10:
+    compressed = await compress_messages(conversation)
+    conversation = compressed["messages"]
+```
+
+This keeps the compression quality high (summaries are more accurate with fewer messages).
+
+### 4. Store Learnings, Not Everything
+
+Memory is for **decisions** and **facts you want to recall**.
+
+Don't store:
+- Every user query/response (that's what compression is for)
+- Public documentation (put in skills instead)
+- Transient state (keep in agent memory)
+
+### 5. Version Your Skills
+
+When a skill's guidance changes:
+
+- **Minor update** (typo, clarification): update the existing skill's `content` in place
+- **Major update** (different approach, breaking change): create a new `id` (e.g., `docker-compose-v2`) and optionally mark the old one as deprecated in its content
+
+### 6. Use MCP in Claude Desktop
+
+If you use Claude Desktop, add the MCP server (see `CLAUDE.md`). This gives you:
+- Direct access to skills via Claude's tool calling
+- No need to implement API calls manually
+- Same token savings within Claude
+
+### 7. Monitor Token Savings
+
+Track metrics:
+
+```python
+from datetime import datetime
+
+logs = []
+
+def log_savings(tokens_before, tokens_after, operation):
+    logs.append({
+        "timestamp": datetime.now().isoformat(),
+        "operation": operation,
+        "tokens_before": tokens_before,
+        "tokens_after": tokens_after,
+        "savings": tokens_before - tokens_after
+    })
+    # Periodically upload or analyze these
+```
+
+---
+
+## Example Implementations
+
+### Minimal Agent
+
+```python
+import asyncio, httpx, os
+
+API_URL = os.getenv("API_URL", "http://helm:8675")
+PROJECT = os.getenv("PROJECT", "/default")
+
+async def get_context(query):
+    async with httpx.AsyncClient() as client:
+        resp = await client.get(f"{API_URL}/context/rag", params={"query": query, "project": PROJECT})
+        return resp.json()
+
+async def chat():
+    conv = []
+    while True:
+        query = input("You: ")
+        if query == "quit": break
+
+        # Get context
+        ctx = await get_context(query)
+        system = format_context(ctx)  # as sketched in the RAG section
+
+        # Call LLM (pseudo)
+        response = call_llm(system, conv[-4:], query)
+
+        conv.extend([{"role": "user", "content": query},
+                     {"role": "assistant", "content": response}])
+
+        print(f"Assistant: {response}")
+
+asyncio.run(chat())
+```
+
+### Discord Bot with Context
+
+```python
+import os
+
+import discord
+from discord.ext import commands
+import httpx
+
+# discord.py 2.x requires explicit intents; reading message text also
+# needs the message content intent enabled in the developer portal.
+intents = discord.Intents.default()
+intents.message_content = True
+
+bot = commands.Bot(command_prefix="!", intents=intents)
+API_URL = "http://helm:8675"
+PROJECT = "/home/user/discord-bot"
+
+@bot.event
+async def on_message(message):
+    if message.author == bot.user:
+        return
+
+    # RAG context
+    async with httpx.AsyncClient() as client:
+        resp = await client.get(f"{API_URL}/context/rag", params={"query": message.content, "project": PROJECT})
+        ctx = resp.json()
+
+    # Build prompt
+    system_prompt = format_context(ctx) + "\n\nYou are a helpful Discord bot."
+
+    # Respond (using your LLM of choice)
+    response = await generate_response(message.content, system_prompt)
+    await message.reply(response)
+
+    # Store in memory if it's a decision
+    if "decision" in message.content.lower():
+        async with httpx.AsyncClient() as client:
+            await client.post(f"{API_URL}/memory", json={
+                "project": PROJECT,
+                "key": f"decision-{discord.utils.utcnow().timestamp()}",
+                "content": response[:500]
+            })
+
+bot.run(os.getenv("DISCORD_TOKEN"))
+```
+
+---
+
+## Need More Help?
+
+- **Setup issues**: See `SETUP.md`
+- **Template repo**: Clone `git.bouncypixel.com:helm/agentic-templates.git`
+- **API reference**: Visit `http://helm:8675/docs` when the service is running
+- **MCP tools**: See `CLAUDE.md` for Claude Desktop integration
diff --git a/docker-compose.yml b/docker-compose.yml
index 8fa717a..ec46a23 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -28,3 +28,18 @@ services:
       interval: 30s
       timeout: 10s
       retries: 3
+
+  mcp:
+    build:
+      context: .
+      dockerfile: mcp/Dockerfile
+    command: python skills.py
+    ports:
+      - "3000:3000"
+    environment:
+      - SKILLS_API_URL=http://api:8080
+      - MCP_TRANSPORT=sse
+      - MCP_PORT=3000
+    depends_on:
+      - api
+    restart: unless-stopped
diff --git a/mcp/requirements.txt b/mcp/requirements.txt
index 7fe4e3d..dd3d077 100644
--- a/mcp/requirements.txt
+++ b/mcp/requirements.txt
@@ -3,3 +3,4 @@ httpx==0.26.0
 python-dotenv==1.0.0
 docker==7.0.0
 psutil==5.9.7
+uvicorn[standard]==0.27.0
diff --git a/mcp/skills.py b/mcp/skills.py
index 55b83f6..26f81ed 100644
--- a/mcp/skills.py
+++ b/mcp/skills.py
@@ -1,6 +1,7 @@
 from mcp.server.fastmcp import FastMCP
 import httpx
 import os
+import uvicorn
 
 mcp = FastMCP("skills")
 
@@ -162,4 +163,11 @@ def create_skill(
 
 if __name__ == "__main__":
-    mcp.run()
+    transport = os.getenv("MCP_TRANSPORT", "stdio")
+
+    if transport == "sse":
+        # FastMCP takes host/port from its settings; there is no run_sse() helper.
+        mcp.settings.host = os.getenv("MCP_HOST", "0.0.0.0")
+        mcp.settings.port = int(os.getenv("MCP_PORT", "3000"))
+        mcp.run(transport="sse")
+    else:
+        mcp.run()