Add SSE MCP server, comprehensive docs, and OpenCode integration

- Implement SSE mode for MCP server (mcp/skills.py)
- Add MCP service to docker-compose.yml on port 3000
- Add uvicorn dependency to mcp/requirements.txt
- Create SETUP.md, USAGE.md, OPENCODE-MCP.md
- Update README with quick links and MCP section
- Remove semantic cache references throughout
- Add cross-platform Python MCP setup script to template repo
Lukas Parsons 2026-03-22 23:59:33 -04:00
parent 95805dfc86
commit e346d356e5
7 changed files with 1085 additions and 83 deletions

OPENCODE-MCP.md (new file)

@@ -0,0 +1,118 @@
# OpenCode MCP Configuration
OpenCode (open-source alternative to Cursor/Claude) supports MCP servers. This guide shows how to connect it to your AI Skills API MCP server running on `helm`.
## Prerequisites
- AI Skills API stack running on `helm` (includes MCP server on port 3000)
- OpenCode installed on your local machine
## MCP Server Endpoint
Your MCP server is accessible at:
```
http://helm:3000
```
It exposes two endpoints:
- `GET /sse` - Server-Sent Events (for client connection)
- `POST /messages` - JSON-RPC messages
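To sanity-check the transport from a script, you can open the SSE stream and wait for the server's first event. A minimal sketch using `httpx` (MCP SSE servers conventionally announce the message-posting URL as their first event; treat the exact framing as an assumption):
```python
import httpx

def probe_mcp(base_url: str = "http://helm:3000") -> None:
    # Open the SSE stream; the server holds it open and sends events.
    with httpx.stream("GET", f"{base_url}/sse", timeout=10) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            print(line)  # expect e.g. "event: endpoint" then "data: /messages?..."
            if line.startswith("data:"):
                break  # first event received - the server is alive

probe_mcp()
```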
## OpenCode Configuration
OpenCode reads MCP server config from its settings. You need to add an MCP server with the SSE URL.
### Configuration JSON
Add this to your OpenCode MCP configuration (location varies by install):
```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```
**Note**: Use `"url"` not `"command"` since the server is remote and uses SSE transport.
### Where to Put This
OpenCode typically reads MCP config from:
- `~/.config/opencode/mcp.json`
- or in the app settings UI (Preferences → MCP → Add Server → Manual)
If using a file, create/edit `~/.config/opencode/mcp.json`:
```bash
mkdir -p ~/.config/opencode
cat > ~/.config/opencode/mcp.json << 'EOF'
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
EOF
```
### Test Connection
1. Restart OpenCode (if running)
2. Open the MCP servers panel/tool
3. You should see "skills" server listed as connected
4. Available tools will include:
- `search_skills`
- `get_skill`
- `list_skills`
- `get_context`
- `get_conventions`
- `get_snippets`
- `get_memory`
- `add_memory`
- `create_skill`
## Troubleshooting
### "Cannot connect to MCP server"
- Ensure the stack is up: `docker compose -f /path/to/ai-skills-api/docker-compose.yml ps`
- Check MCP service logs: `docker compose logs mcp`
- Verify `helm` resolves: `ping helm` or use IP address instead
- If using IP, change config to `"url": "http://192.168.x.x:3000"`
### "Connection refused" or timeout
- Ensure port 3000 is exposed: `netstat -tuln | grep 3000` on helm
- Check firewall: helm should accept connections on 3000 from your network
### Tools not appearing
- Wait 10-20 seconds after OpenCode starts for the MCP connection to establish
- Check OpenCode logs for MCP connection errors
- Verify the skills service is healthy: `docker compose ps` (mcp should be "Up" and healthy)
## Using the Tools
Once connected, you can invoke MCP tools from OpenCode:
- `get_context(project="/home/user/myapp")` → fetches relevant skills/conventions
- `search_skills(query="docker compose")` → finds matching skills
- `create_skill(...)` → adds new skill to the database
- `add_memory(project, key, content)` → stores learnings
These calls travel over the network to `helm:3000`, and the MCP server forwards requests on to the Skills API (`api:8080` on the internal Docker network).
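Under the hood, each tool invocation is a JSON-RPC message POSTed to the messages endpoint. Roughly (a sketch: `tools/call` is the standard MCP method, but a real client POSTs to the session URL announced on `/sse`, so the bare `/messages` path here is illustrative):
```python
import httpx

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "search_skills", "arguments": {"query": "docker compose"}},
}
# A real client targets the per-session URL from the /sse handshake.
httpx.post("http://helm:3000/messages", json=payload)
```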
## Security Note
The MCP server is exposed on your home network without authentication (relies on network trust). If you need auth, we can add a reverse proxy or API key layer.
## One-Line Setup Script
If you're setting up on a new machine, run this from the `agentic-templates` repo:
```bash
./setup-opencode-mcp.sh
```
It will detect your OpenCode config location and add the MCP server automatically.

README.md

@@ -2,22 +2,14 @@
Local infrastructure for AI context management. Reduce token consumption by 60-80% through smart RAG, conversation compression, and reusable skills.
-## Quick Start
-
-```bash
-# Copy config file (optional, uses defaults if missing)
-cp config.yaml.example config.yaml # customize if needed
-
-# Run with Docker
-docker compose up -d
-
-# Or run locally
-pip install -r requirements.txt
-uvicorn main:app --reload
-```
-
-API available at `http://helm:8675`
-Docs at `http://helm:8675/docs`
+**API available at**: `http://helm:8675`
+**Interactive docs**: `http://helm:8675/docs`
+
+## Quick Links
+
+- **[Setup Guide](SETUP.md)** - One-time deployment on your server
+- **[Usage Guide](USAGE.md)** - How to integrate with your agents
+- **[Template Repository](https://git.bouncypixel.com/helm/agentic-templates)** - Starter kit for new projects
## Key Features
@@ -27,24 +19,22 @@ Docs at `http://helm:8675/docs`
- **Simple API**: RESTful JSON API + MCP server for Claude Desktop
- **Zero-friction auth**: Optional API key (set-and-forget)
-## Configuration
-
-Create `config.yaml` (optional) to customize:
-
-```yaml
-port: 8675
-rag:
-  max_skills: 3
-  max_conventions: 2
-  max_snippets: 2
-compression:
-  enabled: true
-  strategy: "extractive" # or "ollama" for phi-3-mini
-auth:
-  enabled: false # set to true and change api_key
-```
-
-Or use environment variables (see `config.py` for full list).
+## Quick Start (5 minutes)
+
+```bash
+# 1. Deploy the service on helm (see SETUP.md for details)
+docker compose up -d
+
+# 2. Clone the template repo for your agent project
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
+cp .env.example .env
+docker compose up -d
+
+# 3. Your agent is now running with context management
+```
+
+See **[SETUP.md](SETUP.md)** for complete deployment instructions and **[USAGE.md](USAGE.md)** for integration patterns.
## Endpoints
@@ -113,19 +103,21 @@ async def query_llm(prompt, conversation_history, project=None):
**Expected savings**: 60-80% token reduction vs. sending everything.
+See **[USAGE.md](USAGE.md)** for complete integration patterns, examples, and best practices.

## Template Repository

Want to get started quickly? Use the agent template:

```bash
-# Clone the template (on your Forgejo)
-git clone git.bouncypixel.com:helm/ai-agent-template.git
-cd ai-agent-template
+# Clone the template
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
+cp .env.example .env
+docker compose up -d
```

-The template includes a working agent integration and docker-compose setup.
+The template includes a working agent integration and docker-compose setup. See [USAGE.md](USAGE.md) for integration patterns.
## How It Works (Architecture)
@@ -166,7 +158,7 @@ If you use Claude Desktop, add to your config:
Available tools:
- `search_skills`, `get_skill`, `list_skills`
- `get_context`, `get_conventions`, `get_snippets`
-- `check_cache` (deprecated), `get_memory`, `add_memory`, `create_skill`
+- `get_memory`, `add_memory`, `create_skill`
## Migration from v1
@@ -186,52 +178,4 @@ If you were using the old semantic cache:
MIT
-## Example Usage
-
-### Create a skill
-
-```bash
-curl -X POST http://helm:8675/skills \
-  -H "Content-Type: application/json" \
-  -d '{
-    "id": "homelab-docker-compose",
-    "name": "Docker Compose Standard",
-    "category": "homelab",
-    "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
-    "tags": ["docker", "compose", "infrastructure"]
-  }'
-```
-
-### Get context bundle
-
-```bash
-curl "http://helm:8675/context?project=/home/server/apps/media-server&skills=homelab-docker-compose,react-v2"
-```
-
-### Check cache
-
-```bash
-curl -X POST http://helm:8675/cache/lookup \
-  -H "Content-Type: application/json" \
-  -d '{
-    "prompt": "How do I configure traefik?",
-    "model": "claude-3-opus"
-  }'
-```
-
-## Integration Pattern
-
-In your agent's system prompt or pre-request hook:
-
-1. Call `GET /context?project={current_project}&skills={skill_ids}`
-2. Inject returned content into the prompt
-3. Before sending to LLM, check `POST /cache/lookup`
-4. After receiving response, optionally `POST /cache/store`
-
-This avoids re-sending your standards every request and caches repeated queries.
-
-## Database
-
-SQLite database `ai.db` with tables:
-
-- `skills` - Reusable patterns and instructions
-- `snippets` - Code snippets
-- `conventions` - Project-specific conventions
-- `cache` - LRU cache of LLM responses
-- `memory` - Project memory/notes
+For detailed usage examples and API reference, see [USAGE.md](USAGE.md) and the interactive docs at `http://helm:8675/docs` when the service is running.

SETUP.md (new file)

@@ -0,0 +1,394 @@
# Setup Guide: AI Skills API
This guide covers exactly how to deploy the AI Skills API on your home server (`helm`) and set up new agent projects.
## Prerequisites
- Docker & Docker Compose installed on `helm`
- Access to `helm` from your development machine (SSH or local)
- Optional: Claude Desktop with MCP support
## Server Setup (One-Time)
Deploy the AI Skills API service on your home server.
### 1. Clone the Repository
```bash
# On helm (or accessible to docker)
cd /opt
git clone ssh://git@helm:222/helm/ai-skills-api.git
cd ai-skills-api
```
### 2. Build and Start Services
```bash
# Build and start all services (API + Ollama + MCP)
docker compose up -d --build
# Check it's running
docker compose ps
# Should show: api, ollama, mcp (all "Up")
```
### 3. Verify Deployment
```bash
# Health check (from helm)
curl http://localhost:8675/health
# Expected response: {"status":"healthy"}
```
### 4. Configure Optional Settings
Edit `config.yaml` (optional; defaults are used if the file is missing):
```yaml
port: 8675
rag:
  max_skills: 3
  max_conventions: 2
  max_snippets: 2
compression:
  enabled: true
  strategy: "extractive" # or "ollama" for phi-3-mini
auth:
  enabled: false # set to true to require API key
  api_key: "your-secret-key-here"
```
Restart after changes:
```bash
docker compose restart
```
### 5. (Optional) Enable API Authentication
If you want auth across your network:
1. Edit `config.yaml`:
```yaml
auth:
  enabled: true
  api_key: "generate-a-strong-random-key"
```
2. Restart:
```bash
docker compose restart
```
3. Test:
```bash
curl http://helm:8675/health # Should work (no auth)
curl http://helm:8675/skills # Should fail 401 if auth enabled
curl -H "X-API-Key: your-secret-key-here" http://helm:8675/skills # Should work
```
**Note**: API is accessible only on your home network (`helm:8675`). No public exposure by default.
### Access from Other Machines
Your agents running on other machines can access the API at `http://helm:8675`.
```bash
# From any machine on your network
curl http://helm:8675/health
```
If `helm` doesn't resolve, add an entry to your hosts file or use the server's IP address instead.
## MCP Server for Claude Desktop / OpenCode
The stack includes an MCP server that exposes your skills to Claude Desktop or OpenCode via the Model Context Protocol.
### What's Running
- **MCP Server**: SSE mode on `http://helm:3000`
- Automatically proxies requests to the Skills API (`http://api:8080` internally)
- Same Docker network, no extra configuration needed
### Configure Claude Desktop
Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```
Restart Claude Desktop. You should see the `skills` server connected, with tools like `search_skills`, `get_context`, etc.
### Configure OpenCode
See [OPENCODE-MCP.md](OPENCODE-MCP.md) for detailed instructions. In short:
```bash
# Run the setup script from the agentic-templates repo:
cd ~/projects/agentic-templates
./setup-opencode-mcp.sh
```

Or manually create `~/.config/opencode/mcp.json`:

```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```
### Test MCP Connection
```bash
# Should hang (SSE stream) if connected
curl http://helm:3000/sse
# With API key if auth enabled:
curl -H "X-API-Key: your-key" http://helm:3000/sse
```
## Project Setup (Per Project/Session)
For each new project or AI agent, you'll create an integration that uses the API.
### Option A: Use the Template Repository (Recommended)
We maintain a template repo for quick starts.
#### 1. Clone the Template
```bash
cd ~/projects # or wherever you keep projects
git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
cd my-agent
```
Or clone directly via SSH:
```bash
git clone ssh://git@helm:222/helm/agentic-templates.git my-agent
```
#### 2. Configure Environment
Copy `.env.example` to `.env`:
```bash
cp .env.example .env
```
Edit `.env` if needed:
```env
API_URL=http://helm:8675
API_KEY= # Only if auth enabled
PROJECT=/path/to/your/project # Optional, for context scoping
```
#### 3. Run Your Agent
```bash
# Using Docker Compose (recommended)
docker compose up -d
# Or run directly
pip install -r requirements.txt
python agent.py
```
The agent will automatically:
- Fetch relevant skills/conventions via RAG
- Store decisions in memory
- Compress conversation when it grows large
### Option B: Manual Integration
If you want to integrate into an existing project:
1. Install the Python dependency:
```bash
pip install httpx
```
2. Copy the integration pattern from `template/agent.py` (the `get_context`, `compress_messages`, `store_memory` functions).
3. Add these calls to your agent's workflow:
- Before each LLM call: `context = await get_context(query, project)`
- Inject context into system prompt
- After each response: `await store_memory(project, key, content)`
- When conversation > 10 messages: `compressed = await compress_messages(conversation)`
See `USAGE.md` for detailed integration patterns; a minimal sketch of these helpers follows.
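For reference, here is a minimal sketch of those three helpers, built only from the endpoints documented in this repo (`/context/rag`, `/compress`, `/memory`); the actual `template/agent.py` may differ in details:
```python
import os
import httpx

API_URL = os.getenv("API_URL", "http://helm:8675")

async def get_context(query: str, project: str = None) -> dict:
    # Fetch relevant skills/conventions/snippets for a query.
    params = {"query": query}
    if project:
        params["project"] = project
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params=params)
        resp.raise_for_status()
        return resp.json()

async def compress_messages(messages: list) -> dict:
    # Compress a long conversation; returns {"messages": [...], "tokens_saved": N}.
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{API_URL}/compress", json={"messages": messages})
        resp.raise_for_status()
        return resp.json()

async def store_memory(project: str, key: str, content: str) -> None:
    # Persist a decision or learning for later sessions.
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{API_URL}/memory",
                                 json={"project": project, "key": key, "content": content})
        resp.raise_for_status()
```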
## Seeding Skills and Conventions
The API comes with a seed script that adds useful skills.
### Run the Seed Script
```bash
cd /opt/ai-skills-api
python examples/seed-data.py
```
This adds:
- D&D campaign management skills
- Infrastructure/Docker skills
- Code review skills
- General best practices
### Add Custom Skills
#### Via API:
```bash
curl -X POST http://helm:8675/skills \
  -H "Content-Type: application/json" \
  -d '{
    "id": "my-skill",
    "name": "My Custom Skill",
    "category": "custom",
    "content": "Specific instructions for your agent...",
    "tags": ["keyword1", "keyword2"]
  }'
```
#### Via MCP (Claude Desktop):
Use the `skills/create_skill` tool directly in Claude.
#### Via Python:
```python
import httpx

resp = httpx.post(
    "http://helm:8675/skills",
    json={
        "id": "unique-skill-id",
        "name": "Skill Name",
        "category": "category",
        "content": "Full skill instructions...",
        "tags": ["tag1", "tag2"]
    }
)
```
### Add Project Conventions
Conventions are project-specific (tied to a project path or identifier):
```bash
curl -X POST http://helm:8675/conventions \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Project Conventions",
    "project": "/home/user/myproject",
    "content": "Project-specific coding standards, workflows, etc."
  }'
```
## Testing Your Setup
### 1. Test RAG Context
```bash
curl "http://helm:8675/context/rag?query=docker compose&project=test"
```
Should return JSON with `skills`, `conventions`, `snippets` arrays.
### 2. Test Compression
```bash
curl -X POST http://helm:8675/compress \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hi there! How can I help?"},
      {"role": "user", "content": "Tell me about Docker."},
      {"role": "assistant", "content": "Docker is a containerization platform..."}
    ]
  }'
```
Should return compressed messages and `tokens_saved`.
### 3. Test Memory
```bash
curl -X POST http://helm:8675/memory \
  -H "Content-Type: application/json" \
  -d '{
    "project": "test",
    "key": "decision-123",
    "content": "We decided to use FastAPI for this project"
  }'

curl "http://helm:8675/memory?project=test"
```
### 4. Test from Agent Template
```bash
cd ~/projects/my-agent
docker compose up -d
docker compose logs -f agent # Watch the agent start and interact
```
## Troubleshooting
### Service Won't Start
```bash
# Check logs
docker compose logs api
# Common issues:
# - Port 8675 already in use: change port in docker-compose.yml
# - Permissions: ensure /opt/ai-skills-api is readable
```
### Ollama Not Pulling Model
The entrypoint script auto-pulls `phi3:mini` if the compression strategy is `ollama`. To pull it manually:
```bash
docker compose exec ollama ollama pull phi3:mini
```
### Can't Connect from Other Machines
- Ensure `helm` is reachable on the network (ping `helm`)
- Check Docker network: `docker network ls` (should have `ai-skills-api_default`)
- The API binds to `0.0.0.0` inside the container and is published on port 8675, so it's reachable from the host and from other containers
### Auth Errors
- If you get 401, either disable auth in `config.yaml` or set `API_KEY` in your agent's `.env`
- Verify: `curl -H "X-API-Key: your-key" http://helm:8675/skills`
### High RAG Latency (>10ms)
- First request after startup will be slower (warming cache)
- Subsequent queries should be <5ms
- If still slow, check the embedding model load: `docker compose logs api`
## Next Steps
- Read `USAGE.md` for detailed integration patterns and best practices
- Use the template repo for all new agent projects
- Add project-specific skills and conventions as you work
- Monitor logs for token savings

USAGE.md (new file)

@@ -0,0 +1,522 @@
# Usage Guide: AI Skills API
This guide explains how to use the AI Skills API effectively in your projects and AI agent sessions.
## Table of Contents
1. [Understanding the Integration Pattern](#understanding-the-integration-pattern)
2. [RAG Context Retrieval](#rag-context-retrieval)
3. [Conversation Compression](#conversation-compression)
4. [Project Memory](#project-memory)
5. [Session Workflow](#session-workflow)
6. [Managing Skills](#managing-skills)
7. [Token Accounting](#token-accounting)
8. [Best Practices](#best-practices)
9. [Example Implementations](#example-implementations)
---
## Understanding the Integration Pattern
The API provides three core capabilities that work together:
1. **RAG (Retrieval-Augmented Generation)**: Before each LLM call, fetch relevant skills, conventions, and snippets based on your query. This injects relevant context without sending your entire knowledge base every time.
2. **Compression**: When conversation history grows long (>10 turns), compress old messages into summaries to stay within context windows.
3. **Memory**: Store decisions, configurations, and learnings per project for future reference.
**Expected savings**: 60-80% token reduction vs. sending everything.
---
## RAG Context Retrieval
### The `/context/rag` Endpoint
This is your primary integration point. It returns only the most relevant items from your knowledge base.
**Request:**
```
GET /context/rag?query={query}&project={project}
```
**Response:**
```json
{
  "skills": [
    {
      "id": "homelab-docker-compose",
      "name": "Docker Compose Standard",
      "category": "homelab",
      "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
      "relevance_score": 0.89
    }
  ],
  "conventions": [
    {
      "id": "conv-123",
      "name": "React Project Standards",
      "project": "/home/user/my-react-app",
      "content": "Use TypeScript, React 18+, and functional components with hooks.",
      "relevance_score": 0.76
    }
  ],
  "snippets": [
    {
      "id": "snippet-456",
      "name": "FastAPI CORS setup",
      "language": "python",
      "content": "app.add_middleware(CORSMiddleware, allow_origins=[\"*\"], ...)",
      "relevance_score": 0.82
    }
  ]
}
```
### How It Works
- Skills are globally available (your general knowledge base)
- Conventions are scoped to a project path or identifier (e.g., `/home/user/project1`)
- Snippets are globally available code examples
- Relevance scores are cosine similarity (0-1) - items below 0.3 are typically filtered out
- Limits are configurable (default: 3 skills, 2 conventions, 2 snippets)
### Usage Pattern
```python
async def query_with_context(query: str, project: str = None):
    # 1. Fetch context
    context = await get_context(query, project)

    # 2. Build system prompt
    system_prompt = format_context(context)
    # system_prompt now contains:
    # ## Relevant Skills
    # ### Docker Compose Standard (relevance: 0.89)
    # Always use docker-compose v3.8+...
    # ...

    # 3. Inject into LLM call
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query}
    ]
    response = await llm.chat(messages)  # llm is your LLM client (pseudo)
    return response
```
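The `format_context` helper above is left to your agent. A minimal sketch that flattens a `/context/rag` response into the markdown shape shown in the comments (the section titles are this sketch's choice):
```python
def format_context(context: dict) -> str:
    # Render a /context/rag response as a markdown system-prompt block.
    sections = []
    for title, key in [("Relevant Skills", "skills"),
                       ("Project Conventions", "conventions"),
                       ("Code Snippets", "snippets")]:
        items = context.get(key, [])
        if not items:
            continue
        lines = [f"## {title}"]
        for item in items:
            lines.append(f"### {item['name']} (relevance: {item['relevance_score']:.2f})")
            lines.append(item["content"])
        sections.append("\n".join(lines))
    return "\n\n".join(sections)
```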
---
## Conversation Compression
### The `/compress` Endpoint
Compresses a list of conversation messages into a shorter representation.
**Request:**
```json
{
  "messages": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."},
    ... (up to 20+ messages)
  ]
}
```
**Response:**
```json
{
  "messages": [
    {"role": "system", "content": "Summary of earlier conversation..."},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."}
  ],
  "tokens_saved": 245
}
```
### Compression Strategies
- **Extractive** (default): Uses LSA summarization to select key sentences. Fast (~100-500ms), no model required.
- **Ollama**: Uses `phi3:mini` for abstractive summaries. Better quality but slower (~2s). Requires Ollama running.
**Configure in `config.yaml`:**
```yaml
compression:
enabled: true
strategy: "extractive" # or "ollama"
```
### Usage Pattern
```python
conversation = []

async def chat(query):
    global conversation  # reassigned after compression below

    # Add user message
    conversation.append({"role": "user", "content": query})

    # Call LLM (with context from RAG)
    response = await llm.chat(conversation)
    conversation.append({"role": "assistant", "content": response})

    # Compress when conversation gets long
    if len(conversation) >= 10:
        compressed = await compress_messages(conversation)
        conversation = compressed["messages"]
        print(f"Saved {compressed['tokens_saved']} tokens")

    return response
```
**Important**: Keep the most recent ~4-6 turns uncompressed. The compression endpoint preserves recent messages and compresses only the older ones.
---
## Project Memory
### The `/memory` Endpoints
Store and retrieve project-specific knowledge.
**Store:**
```
POST /memory
{
  "project": "my-project",
  "key": "architecture-decision-2024-01-15",
  "content": "We chose FastAPI over Flask for async support and automatic OpenAPI docs."
}
```
**Retrieve:**
```
GET /memory?project=my-project
```
**Update:**
```
PUT /memory/{id}
```
**Delete:**
```
DELETE /memory/{id}
```
### Usage Pattern
```python
# Store a decision after making it
await store_memory(
    project="/home/user/myapp",
    key="db-choice",
    content="Using PostgreSQL over MongoDB for relational data integrity"
)
# Retrieve past decisions at project start
resp = httpx.get("http://helm:8675/memory", params={"project": "/home/user/myapp"})
decisions = resp.json()["entries"]
# decisions = [{"id": "...", "key": "db-choice", "content": "...", ...}]
```
**When to use memory:**
- Architecture decisions
- Configuration choices (API keys, service URLs)
- Learned preferences ("User likes code examples")
- Debugging notes ("Issue with CORS on port 8080")
**When NOT to use memory:**
- Temporary conversation state (use compression instead)
- Large codebases (store in skills/snippets instead)
- Public documentation (should be in skills)
---
## Session Workflow
### Starting a New Session
1. **Define your project identifier** - a path or unique string:
```python
PROJECT = "/home/user/myapp" # or "my-discord-bot", "workspace-123"
```
2. **Load past memories** (optional but helpful):
```python
memories = httpx.get("http://helm:8675/memory", params={"project": PROJECT}).json()["entries"]
# Inject into system prompt or create context from them
```
3. **Begin conversation loop** - for each user query:
- Call `GET /context/rag?query=...&project=PROJECT`
- Inject context into LLM prompt
- Call LLM
- Store important outputs in memory if they represent decisions/learnings
- Compress conversation when it reaches ~10 turns
### Ending a Session
- Optionally store a session summary in memory:
```python
await store_memory(PROJECT, "session-summary-2024-01-15", "Completed user auth flow, decided on JWT tokens")
```
- No cleanup needed - conversation state lives in your agent, not the server
### Multi-Project Agents
If your agent works across multiple projects:
```python
# Switch project context mid-conversation
PROJECT = "/home/user/project1" # current active project
# Each project has its own conventions and memories
context = await get_context(query, project=PROJECT)
```
---
## Managing Skills
Skills are your reusable knowledge base. Manage them via API, MCP, or the seed script.
### Categories
Group skills by category (e.g., `homelab`, `dnd`, `python`, `devops`). Categories don't affect RAG retrieval but help with organization.
### Tags
Tags are keywords used for **future search** (not currently used by RAG, but planned for enhanced filtering).
```json
{
"tags": ["docker", "compose", "infrastructure", "production"]
}
```
### Best Practices for Skills
- **Be specific**: "Docker Compose Production Patterns" > "Docker"
- **Include examples**: Show code snippets in the content
- **Keep it concise**: 1-3 paragraphs, focus on actionable guidance
- **Use markdown**: The API preserves formatting for injection into prompts
- **Version when updating**: If a skill changes significantly, create a new `id` (e.g., `docker-compose-v2`)
### Search Skills
```
GET /skills/search?q={query}
```
Returns matching skills by name/content similarity. Useful for manual exploration but not needed in automated agents (use `/context/rag` instead).
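For quick exploration from a script (a sketch; the exact response shape isn't documented here, so print and inspect it):
```python
import httpx

hits = httpx.get("http://helm:8675/skills/search", params={"q": "docker compose"})
print(hits.json())
```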
---
## Token Accounting
### Count Tokens
```
GET /tokens/count?text={text}
```
Returns the token count (using tiktoken for GPT models, approximations for others).
**Use this to:**
- Track compression savings
- Pre-flight check prompts before sending to LLM
- Budget token usage per session
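A small helper used in the example below (a sketch; the response field name `count` is an assumption - check `http://helm:8675/docs` for the actual schema):
```python
import httpx

API_URL = "http://helm:8675"

def count_tokens(text: str) -> int:
    resp = httpx.get(f"{API_URL}/tokens/count", params={"text": text})
    resp.raise_for_status()
    return resp.json()["count"]  # field name assumed
```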
### Example: Measure RAG Savings
```python
full_context = load_all_skills() # hypothetical: all your skills text
full_tokens = count_tokens(full_context)
rag_context = get_context(query, project) # only relevant items
rag_tokens = count_tokens(format_context(rag_context))
savings_pct = (1 - rag_tokens / full_tokens) * 100
print(f"RAG saved {savings_pct:.1f}% tokens")
```
---
## Best Practices
### 1. Always Use Project Scoping
Set `project` parameter consistently. Even if you have one main project, use a consistent identifier:
```python
PROJECT = "/home/user/myapp" # NOT "default" or None
context = await get_context(query, project=PROJECT)
```
This allows:
- Project-specific conventions
- Memory isolation between projects
- Future per-project analytics
### 2. Call RAG Before Every LLM Request
Even if the query seems unrelated, the cost is negligible (<5ms, ~50 tokens). The knowledge injected often improves responses.
### 3. Compress Proactively
Don't wait until context window is full. Compress at ~10 messages:
```python
if len(conversation) >= 10:
    compressed = await compress_messages(conversation)
    conversation = compressed["messages"]
```
This keeps the compression quality high (summaries are more accurate with fewer messages).
### 4. Store Learnings, Not Everything
Memory is for **decisions** and **facts you want to recall**.
Don't store:
- Every user query/response (that's what compression is for)
- Public documentation (put in skills instead)
- Transient state (keep in agent memory)
### 5. Version Your Skills
When a skill's guidance changes:
- **Minor update** (typo, clarification): update the existing skill's `content` in place
- **Major update** (different approach, breaking change): create a new `id` (e.g., `docker-compose-v2`) and optionally mark the old one as deprecated in its content
### 6. Use MCP in Claude Desktop
If you use Claude Desktop, add the MCP server (see `CLAUDE.md`). This gives you:
- Direct access to skills via Claude's tool calling
- No need to implement API calls manually
- Same token savings within Claude
### 7. Monitor Token Savings
Track metrics:
```python
import time
from datetime import datetime

logs = []

def log_savings(tokens_before, tokens_after, operation):
    logs.append({
        "timestamp": datetime.now().isoformat(),
        "operation": operation,
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "savings": tokens_before - tokens_after
    })

# Periodically upload or analyze these
```
---
## Example Implementations
### Minimal Agent
```python
import asyncio, httpx, os

API_URL = os.getenv("API_URL", "http://helm:8675")
PROJECT = os.getenv("PROJECT", "/default")

async def get_context(query):
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": query, "project": PROJECT})
        return resp.json()

async def chat():
    conv = []
    while True:
        query = input("You: ")
        if query == "quit":
            break

        # Get context
        ctx = await get_context(query)
        system = format_context(ctx)  # see the format_context sketch above

        # Call LLM (pseudo - call_llm is your LLM client of choice)
        response = call_llm(system, conv[-4:], query)
        conv.extend([{"role": "user", "content": query},
                     {"role": "assistant", "content": response}])
        print(f"Assistant: {response}")

asyncio.run(chat())
```
### Discord Bot with Context
```python
import os

import discord
from discord.ext import commands
import httpx

# discord.py 2.x requires explicit intents; message_content is needed
# to read message text
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)

API_URL = "http://helm:8675"
PROJECT = "/home/user/discord-bot"

@bot.event
async def on_message(message):
    if message.author == bot.user:
        return

    # RAG context
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": message.content, "project": PROJECT})
        ctx = resp.json()

    # Build prompt
    system_prompt = format_context(ctx) + "\n\nYou are a helpful Discord bot."

    # Respond (generate_response is a placeholder for your LLM of choice)
    response = await generate_response(message.content, system_prompt)
    await message.reply(response)

    # Store in memory if it's a decision
    if "decision" in message.content.lower():
        async with httpx.AsyncClient() as client:
            await client.post(f"{API_URL}/memory", json={
                "project": PROJECT,
                "key": f"decision-{discord.utils.utcnow().timestamp()}",
                "content": response[:500]
            })

bot.run(os.getenv("DISCORD_TOKEN"))
```
---
## Need More Help?
- **Setup issues**: See `SETUP.md`
- **Template repo**: Clone `git.bouncypixel.com:helm/agentic-templates.git`
- **API reference**: Visit `http://helm:8675/docs` when the service is running
- **MCP tools**: See `CLAUDE.md` for Claude Desktop integration

docker-compose.yml

@@ -28,3 +28,18 @@ services:
      interval: 30s
      timeout: 10s
      retries: 3

  mcp:
    build:
      context: .
      dockerfile: mcp/Dockerfile
    command: python skills.py
    ports:
      - "3000:3000"
    environment:
      - SKILLS_API_URL=http://api:8080
      - MCP_TRANSPORT=sse
      - MCP_PORT=3000
    depends_on:
      - api
    restart: unless-stopped

mcp/requirements.txt

@@ -3,3 +3,4 @@ httpx==0.26.0
python-dotenv==1.0.0
docker==7.0.0
psutil==5.9.7
uvicorn[standard]==0.27.0

mcp/skills.py

@@ -1,6 +1,7 @@
from mcp.server.fastmcp import FastMCP
import httpx
import os
+import uvicorn

mcp = FastMCP("skills")
@@ -162,4 +163,11 @@ def create_skill(
if __name__ == "__main__":
-    mcp.run()
+    transport = os.getenv("MCP_TRANSPORT", "stdio")
+    if transport == "sse":
+        host = os.getenv("MCP_HOST", "0.0.0.0")
+        port = int(os.getenv("MCP_PORT", "3000"))
+        mcp.run_sse(host=host, port=port)
+    else:
+        mcp.run()