Add SSE MCP server, comprehensive docs, and OpenCode integration
- Implement SSE mode for MCP server (mcp/skills.py)
- Add MCP service to docker-compose.yml on port 3000
- Add uvicorn dependency to mcp/requirements.txt
- Create SETUP.md, USAGE.md, OPENCODE-MCP.md
- Update README with quick links and MCP section
- Remove semantic cache references throughout
- Add cross-platform Python MCP setup script to template repo
parent 95805dfc86
commit e346d356e5
7 changed files with 1085 additions and 83 deletions

OPENCODE-MCP.md (new file, 118 lines)
# OpenCode MCP Configuration

OpenCode (an open-source alternative to Cursor/Claude) supports MCP servers. This guide shows how to connect it to your AI Skills API MCP server running on `helm`.

## Prerequisites

- AI Skills API stack running on `helm` (includes MCP server on port 3000)
- OpenCode installed on your local machine

## MCP Server Endpoint

Your MCP server is accessible at:

```
http://helm:3000
```

It exposes two endpoints:

- `GET /sse` - Server-Sent Events (for the client connection)
- `POST /messages` - JSON-RPC messages
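
You can sanity-check the SSE endpoint from your machine before configuring anything; a connected stream stays open rather than returning immediately:

```bash
# -N disables curl's buffering so events print as they arrive; Ctrl-C to stop
curl -N http://helm:3000/sse
```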

## OpenCode Configuration

OpenCode reads MCP server config from its settings. You need to add an MCP server with the SSE URL.

### Configuration JSON

Add this to your OpenCode MCP configuration (location varies by install):

```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```

**Note**: Use `"url"`, not `"command"`, since the server is remote and uses SSE transport.

### Where to Put This

OpenCode typically reads MCP config from:

- `~/.config/opencode/mcp.json`
- or the app settings UI (Preferences → MCP → Add Server → Manual)

If using a file, create/edit `~/.config/opencode/mcp.json`:

```bash
mkdir -p ~/.config/opencode
cat > ~/.config/opencode/mcp.json << 'EOF'
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
EOF
```

### Test Connection

1. Restart OpenCode (if running)
2. Open the MCP servers panel/tool
3. You should see the "skills" server listed as connected
4. Available tools will include:
   - `search_skills`
   - `get_skill`
   - `list_skills`
   - `get_context`
   - `get_conventions`
   - `get_snippets`
   - `get_memory`
   - `add_memory`
   - `create_skill`

## Troubleshooting

### "Cannot connect to MCP server"

- Ensure the stack is up: `docker compose -f /path/to/ai-skills-api/docker-compose.yml ps`
- Check MCP service logs: `docker compose logs mcp`
- Verify `helm` resolves: `ping helm`, or use an IP address instead
- If using an IP, change the config to `"url": "http://192.168.x.x:3000"`

### "Connection refused" or timeout

- Ensure port 3000 is exposed: `netstat -tuln | grep 3000` on helm
- Check the firewall: helm should accept connections on 3000 from your network

### Tools not appearing

- Wait 10-20 seconds after OpenCode starts for the MCP connection to establish
- Check OpenCode logs for MCP connection errors
- Verify the MCP service is healthy: `docker compose ps` (mcp should be "Up" and healthy)

## Using the Tools

Once connected, you can invoke MCP tools from OpenCode:

- `get_context(project="/home/user/myapp")` → fetches relevant skills/conventions
- `search_skills(query="docker compose")` → finds matching skills
- `create_skill(...)` → adds a new skill to the database
- `add_memory(project, key, content)` → stores learnings

These calls go over the network to `helm:3000`, and the MCP server forwards requests to the Skills API (`http://api:8080` on the internal Docker network).

## Security Note

The MCP server is exposed on your home network without authentication (it relies on network trust). If you need auth, we can add a reverse proxy or API key layer.

## One-Line Setup Script

If you're setting up on a new machine, run this from the `agentic-templates` repo:

```bash
./setup-opencode-mcp.sh
```

It will detect your OpenCode config location and add the MCP server automatically.

README.md (108 lines changed)
@@ -2,22 +2,14 @@
 Local infrastructure for AI context management. Reduce token consumption by 60-80% through smart RAG, conversation compression, and reusable skills.

-## Quick Start
+**API available at**: `http://helm:8675`
+**Interactive docs**: `http://helm:8675/docs`

-```bash
-# Copy config file (optional, uses defaults if missing)
-cp config.yaml.example config.yaml  # customize if needed
-
-# Run with Docker
-docker compose up -d
-
-# Or run locally
-pip install -r requirements.txt
-uvicorn main:app --reload
-```
-
-API available at `http://helm:8675`
-Docs at `http://helm:8675/docs`
+## Quick Links
+
+- **[Setup Guide](SETUP.md)** - One-time deployment on your server
+- **[Usage Guide](USAGE.md)** - How to integrate with your agents
+- **[Template Repository](https://git.bouncypixel.com/helm/agentic-templates)** - Starter kit for new projects

 ## Key Features
@@ -27,24 +19,22 @@ Docs at `http://helm:8675/docs`
 - **Simple API**: RESTful JSON API + MCP server for Claude Desktop
 - **Zero-friction auth**: Optional API key (set-and-forget)

-## Configuration
+## Quick Start (5 minutes)

-Create `config.yaml` (optional) to customize:
-
-```yaml
-port: 8675
-rag:
-  max_skills: 3
-  max_conventions: 2
-  max_snippets: 2
-compression:
-  enabled: true
-  strategy: "extractive"  # or "ollama" for phi-3-mini
-auth:
-  enabled: false  # set to true and change api_key
-```
-
-Or use environment variables (see `config.py` for full list).
+```bash
+# 1. Deploy the service on helm (see SETUP.md for details)
+docker compose up -d
+
+# 2. Clone the template repo for your agent project
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
+cp .env.example .env
+docker compose up -d
+
+# 3. Your agent is now running with context management
+```
+
+See **[SETUP.md](SETUP.md)** for complete deployment instructions and **[USAGE.md](USAGE.md)** for integration patterns.

 ## Endpoints
@@ -113,19 +103,21 @@ async def query_llm(prompt, conversation_history, project=None):
 **Expected savings**: 60-80% token reduction vs. sending everything.

+See **[USAGE.md](USAGE.md)** for complete integration patterns, examples, and best practices.
+
 ## Template Repository

 Want to get started quickly? Use the agent template:

 ```bash
-# Clone the template (on your Forgejo)
-git clone git.bouncypixel.com:helm/ai-agent-template.git
-cd ai-agent-template
+# Clone the template
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
 cp .env.example .env
 docker compose up -d
 ```

-The template includes a working agent integration and docker-compose setup.
+The template includes a working agent integration and docker-compose setup. See [USAGE.md](USAGE.md) for integration patterns.

 ## How It Works (Architecture)
@@ -166,7 +158,7 @@ If you use Claude Desktop, add to your config:
 Available tools:
 - `search_skills`, `get_skill`, `list_skills`
 - `get_context`, `get_conventions`, `get_snippets`
-- `check_cache` (deprecated), `get_memory`, `add_memory`, `create_skill`
+- `get_memory`, `add_memory`, `create_skill`

 ## Migration from v1
@@ -186,52 +178,4 @@ If you were using the old semantic cache:
 MIT

-## Example Usage
-
-### Create a skill
-```bash
-curl -X POST http://helm:8675/skills \
-  -H "Content-Type: application/json" \
-  -d '{
-    "id": "homelab-docker-compose",
-    "name": "Docker Compose Standard",
-    "category": "homelab",
-    "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
-    "tags": ["docker", "compose", "infrastructure"]
-  }'
-```
-
-### Get context bundle
-```bash
-curl "http://helm:8675/context?project=/home/server/apps/media-server&skills=homelab-docker-compose,react-v2"
-```
-
-### Check cache
-```bash
-curl -X POST http://helm:8675/cache/lookup \
-  -H "Content-Type: application/json" \
-  -d '{
-    "prompt": "How do I configure traefik?",
-    "model": "claude-3-opus"
-  }'
-```
-
-## Integration Pattern
-
-In your agent's system prompt or pre-request hook:
-
-1. Call `GET /context?project={current_project}&skills={skill_ids}`
-2. Inject returned content into the prompt
-3. Before sending to LLM, check `POST /cache/lookup`
-4. After receiving response, optionally `POST /cache/store`
-
-This avoids re-sending your standards every request and caches repeated queries.
-
-## Database
-
-SQLite database `ai.db` with tables:
-- `skills` - Reusable patterns and instructions
-- `snippets` - Code snippets
-- `conventions` - Project-specific conventions
-- `cache` - LRU cache of LLM responses
-- `memory` - Project memory/notes
+For detailed usage examples and API reference, see [USAGE.md](USAGE.md) and the interactive docs at `http://helm:8675/docs` when the service is running.

SETUP.md (new file, 394 lines)
# Setup Guide: AI Skills API

This guide covers exactly how to deploy the AI Skills API on your home server (`helm`) and set up new agent projects.

## Prerequisites

- Docker & Docker Compose installed on `helm`
- Access to `helm` from your development machine (SSH or local)
- Optional: Claude Desktop with MCP support

## Server Setup (One-Time)

Deploy the AI Skills API service on your home server.

### 1. Clone the Repository

```bash
# On helm (or accessible to docker)
cd /opt
git clone ssh://git@helm:222/helm/ai-skills-api.git
cd ai-skills-api
```

### 2. Build and Start Services

```bash
# Build and start all services (API + Ollama + MCP)
docker compose up -d --build

# Check it's running
docker compose ps
# Should show: api, ollama, mcp (all "Up")
```

### 3. Verify Deployment

```bash
# Health check (from helm)
curl http://localhost:8675/health

# Expected response: {"status":"healthy"}
```

### 4. Configure Optional Settings

Edit `config.yaml` (the service falls back to defaults if it's missing):

```yaml
port: 8675
rag:
  max_skills: 3
  max_conventions: 2
  max_snippets: 2
compression:
  enabled: true
  strategy: "extractive"  # or "ollama" for phi-3-mini
auth:
  enabled: false  # set to true to require API key
  api_key: "your-secret-key-here"
```

Restart after changes:

```bash
docker compose restart
```

### 5. (Optional) Enable API Authentication

If you want auth across your network:

1. Edit `config.yaml`:
   ```yaml
   auth:
     enabled: true
     api_key: "generate-a-strong-random-key"
   ```

2. Restart:
   ```bash
   docker compose restart
   ```

3. Test:
   ```bash
   curl http://helm:8675/health  # Should work (no auth)
   curl http://helm:8675/skills  # Should fail with 401 if auth is enabled
   curl -H "X-API-Key: your-secret-key-here" http://helm:8675/skills  # Should work
   ```

**Note**: The API is accessible only on your home network (`helm:8675`). No public exposure by default.

### Access from Other Machines

Your agents running on other machines can access the API at `http://helm:8675`.

```bash
# From any machine on your network
curl http://helm:8675/health
```

If DNS isn't set up, `helm` should resolve via your local network or a hosts-file entry.
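
For example, a one-line hosts entry (the IP below is a placeholder; substitute your server's actual LAN address):

```bash
# Map the hostname helm to its LAN IP on this machine
echo "192.168.1.50  helm" | sudo tee -a /etc/hosts
```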

## MCP Server for Claude Desktop / OpenCode

The stack includes an MCP server that exposes your skills to Claude Desktop or OpenCode via the Model Context Protocol.

### What's Running

- **MCP Server**: SSE mode on `http://helm:3000`
- Automatically proxies requests to the Skills API (`http://api:8080` internally)
- Same Docker network, no extra configuration needed

### Configure Claude Desktop

Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):

```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```

Restart Claude. You should see the `skills` server connected, with tools like `search_skills`, `get_context`, etc.

### Configure OpenCode

See [OPENCODE-MCP.md](OPENCODE-MCP.md) for detailed instructions. In short:

```bash
# Run the setup script from the agentic-templates repo:
cd ~/projects/agentic-templates
./setup-opencode-mcp.sh
```

Or manually create `~/.config/opencode/mcp.json`:

```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```

### Test MCP Connection

```bash
# Should hang (SSE stream) if connected
curl http://helm:3000/sse

# With an API key, if auth is enabled:
curl -H "X-API-Key: your-key" http://helm:3000/sse
```

## Project Setup (Per Project/Session)

For each new project or AI agent, you'll create an integration that uses the API.

### Option A: Use the Template Repository (Recommended)

We maintain a template repo for quick starts.

#### 1. Clone the Template

```bash
cd ~/projects  # or wherever you keep projects
git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
cd my-agent
```

Or clone directly via SSH:

```bash
git clone ssh://git@helm:222/helm/agentic-templates.git my-agent
```

#### 2. Configure Environment

Copy `.env.example` to `.env`:

```bash
cp .env.example .env
```

Edit `.env` if needed:

```env
API_URL=http://helm:8675
API_KEY=          # Only if auth enabled
PROJECT=/path/to/your/project  # Optional, for context scoping
```

#### 3. Run Your Agent

```bash
# Using Docker Compose (recommended)
docker compose up -d

# Or run directly
pip install -r requirements.txt
python agent.py
```

The agent will automatically:

- Fetch relevant skills/conventions via RAG
- Store decisions in memory
- Compress the conversation when it grows large

### Option B: Manual Integration

If you want to integrate into an existing project:

1. Install the Python dependency:

   ```bash
   pip install httpx
   ```

2. Copy the integration pattern from `template/agent.py` (the `get_context`, `compress_messages`, `store_memory` functions).

3. Add these calls to your agent's workflow (a sketch of the helpers follows below):

   - Before each LLM call: `context = await get_context(query, project)`
   - Inject the context into the system prompt
   - After each response: `await store_memory(project, key, content)`
   - When the conversation exceeds ~10 messages: `compressed = await compress_messages(conversation)`

See `USAGE.md` for detailed integration patterns.
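
For reference, here's a minimal sketch of those three helpers, assuming the endpoints documented in this guide (`/context/rag`, `/memory`, `/compress`) and the `API_URL` environment variable from the template; `template/agent.py` remains the canonical version:

```python
import os
import httpx

API_URL = os.getenv("API_URL", "http://helm:8675")

async def get_context(query, project=None):
    # Fetch only the most relevant skills/conventions/snippets for this query
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag",
                                params={"query": query, "project": project})
        return resp.json()

async def store_memory(project, key, content):
    # Persist a decision or learning for future sessions
    async with httpx.AsyncClient() as client:
        await client.post(f"{API_URL}/memory",
                          json={"project": project, "key": key, "content": content})

async def compress_messages(messages):
    # Returns {"messages": [...], "tokens_saved": N} per the /compress endpoint
    async with httpx.AsyncClient() as client:
        resp = await client.post(f"{API_URL}/compress", json={"messages": messages})
        return resp.json()
```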

## Seeding Skills and Conventions

The API comes with a seed script that adds useful skills.

### Run the Seed Script

```bash
cd /opt/ai-skills-api
python examples/seed-data.py
```

This adds:

- D&D campaign management skills
- Infrastructure/Docker skills
- Code review skills
- General best practices

### Add Custom Skills

#### Via API:

```bash
curl -X POST http://helm:8675/skills \
  -H "Content-Type: application/json" \
  -d '{
    "id": "my-skill",
    "name": "My Custom Skill",
    "category": "custom",
    "content": "Specific instructions for your agent...",
    "tags": ["keyword1", "keyword2"]
  }'
```

#### Via MCP (Claude Desktop):

Use the `skills/create_skill` tool directly in Claude.

#### Via Python:

```python
import httpx

resp = httpx.post(
    "http://helm:8675/skills",
    json={
        "id": "unique-skill-id",
        "name": "Skill Name",
        "category": "category",
        "content": "Full skill instructions...",
        "tags": ["tag1", "tag2"]
    }
)
```

### Add Project Conventions

Conventions are project-specific (tied to a project path or identifier):

```bash
curl -X POST http://helm:8675/conventions \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Project Conventions",
    "project": "/home/user/myproject",
    "content": "Project-specific coding standards, workflows, etc."
  }'
```

## Testing Your Setup

### 1. Test RAG Context

```bash
curl "http://helm:8675/context/rag?query=docker+compose&project=test"
```

Should return JSON with `skills`, `conventions`, and `snippets` arrays.

### 2. Test Compression

```bash
curl -X POST http://helm:8675/compress \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hi there! How can I help?"},
      {"role": "user", "content": "Tell me about Docker."},
      {"role": "assistant", "content": "Docker is a containerization platform..."}
    ]
  }'
```

Should return compressed messages and `tokens_saved`.

### 3. Test Memory

```bash
curl -X POST http://helm:8675/memory \
  -H "Content-Type: application/json" \
  -d '{
    "project": "test",
    "key": "decision-123",
    "content": "We decided to use FastAPI for this project"
  }'

curl "http://helm:8675/memory?project=test"
```

### 4. Test from Agent Template

```bash
cd ~/projects/my-agent
docker compose up -d
docker compose logs -f agent  # Watch the agent start and interact
```

## Troubleshooting

### Service Won't Start

```bash
# Check logs
docker compose logs ai-skills-api

# Common issues:
# - Port 8675 already in use: change the port in docker-compose.yml
# - Permissions: ensure /opt/ai-skills-api is readable
```

### Ollama Not Pulling Model

The entrypoint script auto-pulls `phi3:mini` if the compression strategy is `ollama`. To force it:

```bash
docker compose exec ai-skills-api ollama pull phi3:mini
```

### Can't Connect from Other Machines

- Ensure `helm` is reachable on the network (`ping helm`)
- Check the Docker network: `docker network ls` (should have `ai-skills-api_default`)
- The API is bound to `0.0.0.0:8675` inside the container, so it's accessible from the host and other containers

### Auth Errors

- If you get a 401, either disable auth in `config.yaml` or set `API_KEY` in your agent's `.env`
- Verify: `curl -H "X-API-Key: your-key" http://helm:8675/skills`

### High RAG Latency (>10ms)

- The first request after startup will be slower (warming the cache)
- Subsequent queries should be <5ms
- If still slow, check the embedding model load: `docker compose logs ai-skills-api`

## Next Steps

- Read `USAGE.md` for detailed integration patterns and best practices
- Use the template repo for all new agent projects
- Add project-specific skills and conventions as you work
- Monitor logs for token savings

USAGE.md (new file, 522 lines)
# Usage Guide: AI Skills API

This guide explains how to use the AI Skills API effectively in your projects and AI agent sessions.

## Table of Contents

1. [Understanding the Integration Pattern](#understanding-the-integration-pattern)
2. [RAG Context Retrieval](#rag-context-retrieval)
3. [Conversation Compression](#conversation-compression)
4. [Project Memory](#project-memory)
5. [Session Workflow](#session-workflow)
6. [Managing Skills](#managing-skills)
7. [Token Accounting](#token-accounting)
8. [Best Practices](#best-practices)
9. [Example Implementations](#example-implementations)

---

## Understanding the Integration Pattern

The API provides three core capabilities that work together:

1. **RAG (Retrieval-Augmented Generation)**: Before each LLM call, fetch relevant skills, conventions, and snippets based on your query. This injects relevant context without sending your entire knowledge base every time.

2. **Compression**: When the conversation history grows long (>10 turns), compress old messages into summaries to stay within context windows.

3. **Memory**: Store decisions, configurations, and learnings per project for future reference.

**Expected savings**: 60-80% token reduction vs. sending everything.

---

## RAG Context Retrieval

### The `/context/rag` Endpoint

This is your primary integration point. It returns only the most relevant items from your knowledge base.

**Request:**

```
GET /context/rag?query={query}&project={project}
```

**Response:**

```json
{
  "skills": [
    {
      "id": "homelab-docker-compose",
      "name": "Docker Compose Standard",
      "category": "homelab",
      "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
      "relevance_score": 0.89
    }
  ],
  "conventions": [
    {
      "id": "conv-123",
      "name": "React Project Standards",
      "project": "/home/user/my-react-app",
      "content": "Use TypeScript, React 18+, and functional components with hooks.",
      "relevance_score": 0.76
    }
  ],
  "snippets": [
    {
      "id": "snippet-456",
      "name": "FastAPI CORS setup",
      "language": "python",
      "content": "app.add_middleware(CORSMiddleware, allow_origins=[\"*\"], ...)",
      "relevance_score": 0.82
    }
  ]
}
```

### How It Works

- Skills are globally available (your general knowledge base)
- Conventions are scoped to a project path or identifier (e.g., `/home/user/project1`)
- Snippets are globally available code examples
- Relevance scores are cosine similarity (0-1); items below 0.3 are typically filtered out
- Limits are configurable (default: 3 skills, 2 conventions, 2 snippets)

### Usage Pattern

```python
async def query_with_context(query: str, project: str = None):
    # 1. Fetch context
    context = await get_context(query, project)

    # 2. Build system prompt
    system_prompt = format_context(context)
    # system_prompt now contains:
    #   ## Relevant Skills
    #   ### Docker Compose Standard (relevance: 0.89)
    #   Always use docker-compose v3.8+...
    #   ...

    # 3. Inject into LLM call
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query}
    ]
    response = await llm.chat(messages)

    return response
```
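
`format_context` is referenced throughout this guide but left to your agent. A minimal sketch consistent with the commented output above might look like this (the section titles other than "Relevant Skills" are illustrative, not a fixed format):

```python
def format_context(context: dict) -> str:
    """Render a /context/rag response as markdown for a system prompt."""
    sections = [
        ("Relevant Skills", context.get("skills", [])),
        ("Project Conventions", context.get("conventions", [])),
        ("Relevant Snippets", context.get("snippets", [])),
    ]
    parts = []
    for title, items in sections:
        if not items:
            continue  # skip empty sections to save tokens
        parts.append(f"## {title}")
        for item in items:
            parts.append(f"### {item['name']} (relevance: {item['relevance_score']:.2f})")
            parts.append(item["content"])
    return "\n".join(parts)
```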

---

## Conversation Compression

### The `/compress` Endpoint

Compresses a list of conversation messages into a shorter representation.

**Request:**

```json
{
  "messages": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."},
    ... (up to 20+ messages)
  ]
}
```

**Response:**

```json
{
  "messages": [
    {"role": "system", "content": "Summary of earlier conversation..."},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."}
  ],
  "tokens_saved": 245
}
```

### Compression Strategies

- **Extractive** (default): Uses LSA summarization to select key sentences. Fast (~100-500ms), no model required.
- **Ollama**: Uses `phi3:mini` for abstractive summaries. Better quality but slower (~2s). Requires Ollama running.

**Configure in `config.yaml`:**

```yaml
compression:
  enabled: true
  strategy: "extractive"  # or "ollama"
```

### Usage Pattern

```python
conversation = []

async def chat(query):
    # Add user message
    conversation.append({"role": "user", "content": query})

    # Call LLM (with context from RAG)
    response = await llm.chat(conversation)
    conversation.append({"role": "assistant", "content": response})

    # Compress when the conversation gets long
    if len(conversation) >= 10:
        compressed = await compress_messages(conversation)
        conversation = compressed["messages"]
        print(f"Saved {compressed['tokens_saved']} tokens")

    return response
```

**Important**: Keep the most recent ~4-6 turns uncompressed. The compression endpoint preserves recent messages and compresses only the older ones.

---

## Project Memory

### The `/memory` Endpoints

Store and retrieve project-specific knowledge.

**Store:**

```
POST /memory
{
  "project": "my-project",
  "key": "architecture-decision-2024-01-15",
  "content": "We chose FastAPI over Flask for async support and automatic OpenAPI docs."
}
```

**Retrieve:**

```
GET /memory?project=my-project
```

**Update:**

```
PUT /memory/{id}
```

**Delete:**

```
DELETE /memory/{id}
```
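
The update body isn't spelled out above; a reasonable guess is that `PUT` accepts the same fields as the store call (the `abc123` id below is hypothetical):

```bash
curl -X PUT http://helm:8675/memory/abc123 \
  -H "Content-Type: application/json" \
  -d '{"content": "Updated: we later switched from FastAPI to Litestar"}'
```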

### Usage Pattern

```python
# Store a decision after making it
await store_memory(
    project="/home/user/myapp",
    key="db-choice",
    content="Using PostgreSQL over MongoDB for relational data integrity"
)

# Retrieve past decisions at project start
resp = httpx.get("http://helm:8675/memory", params={"project": "/home/user/myapp"})
decisions = resp.json()["entries"]
# decisions = [{"id": "...", "key": "db-choice", "content": "...", ...}]
```

**When to use memory:**

- Architecture decisions
- Configuration choices (API keys, service URLs)
- Learned preferences ("User likes code examples")
- Debugging notes ("Issue with CORS on port 8080")

**When NOT to use memory:**

- Temporary conversation state (use compression instead)
- Large codebases (store in skills/snippets instead)
- Public documentation (should be in skills)

---

## Session Workflow

### Starting a New Session

1. **Define your project identifier** - a path or unique string:
   ```python
   PROJECT = "/home/user/myapp"  # or "my-discord-bot", "workspace-123"
   ```

2. **Load past memories** (optional but helpful):
   ```python
   memories = httpx.get("http://helm:8675/memory", params={"project": PROJECT}).json()["entries"]
   # Inject into the system prompt or build context from them
   ```

3. **Begin the conversation loop** - for each user query:
   - Call `GET /context/rag?query=...&project=PROJECT`
   - Inject the context into the LLM prompt
   - Call the LLM
   - Store important outputs in memory if they represent decisions/learnings
   - Compress the conversation when it reaches ~10 turns

### Ending a Session

- Optionally store a session summary in memory:
  ```python
  await store_memory(PROJECT, "session-summary-2024-01-15", "Completed user auth flow, decided on JWT tokens")
  ```

- No cleanup needed - conversation state lives in your agent, not the server

### Multi-Project Agents

If your agent works across multiple projects:

```python
# Switch project context mid-conversation
PROJECT = "/home/user/project1"  # current active project

# Each project has its own conventions and memories
context = await get_context(query, project=PROJECT)
```

---

## Managing Skills

Skills are your reusable knowledge base. Manage them via the API, MCP, or the seed script.

### Categories

Group skills by category (e.g., `homelab`, `dnd`, `python`, `devops`). Categories don't affect RAG retrieval but help with organization.

### Tags

Tags are keywords reserved for **future search** (not currently used by RAG, but planned for enhanced filtering).

```json
{
  "tags": ["docker", "compose", "infrastructure", "production"]
}
```

### Best Practices for Skills

- **Be specific**: "Docker Compose Production Patterns" > "Docker"
- **Include examples**: Show code snippets in the content
- **Keep it concise**: 1-3 paragraphs, focused on actionable guidance
- **Use markdown**: The API preserves formatting for injection into prompts
- **Version when updating**: If a skill changes significantly, create a new `id` (e.g., `docker-compose-v2`)

### Search Skills

```
GET /skills/search?q={query}
```

Returns matching skills by name/content similarity. Useful for manual exploration, but not needed in automated agents (use `/context/rag` instead).
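
For example, to look for Docker-related skills from the shell (the response is assumed to mirror the skill objects shown earlier):

```bash
curl "http://helm:8675/skills/search?q=docker"
```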

---

## Token Accounting

### Count Tokens

```
GET /tokens/count?text={text}
```

Returns the token count (using tiktoken for GPT models, approximations for others).

**Use this to:**

- Track compression savings
- Pre-flight check prompts before sending them to the LLM
- Budget token usage per session

### Example: Measure RAG Savings

```python
full_context = load_all_skills()  # hypothetical: all your skills text
full_tokens = count_tokens(full_context)

rag_context = get_context(query, project)  # only relevant items
rag_tokens = count_tokens(format_context(rag_context))

savings_pct = (1 - rag_tokens / full_tokens) * 100
print(f"RAG saved {savings_pct:.1f}% tokens")
```

---

## Best Practices

### 1. Always Use Project Scoping

Set the `project` parameter consistently. Even if you have one main project, use a consistent identifier:

```python
PROJECT = "/home/user/myapp"  # NOT "default" or None
context = await get_context(query, project=PROJECT)
```

This allows:

- Project-specific conventions
- Memory isolation between projects
- Future per-project analytics

### 2. Call RAG Before Every LLM Request

Even if the query seems unrelated, the cost is negligible (<5ms, ~50 tokens). The knowledge injected often improves responses.

### 3. Compress Proactively

Don't wait until the context window is full. Compress at ~10 messages:

```python
if len(conversation) >= 10:
    compressed = await compress_messages(conversation)
    conversation = compressed["messages"]
```

This keeps the compression quality high (summaries are more accurate with fewer messages).

### 4. Store Learnings, Not Everything

Memory is for **decisions** and **facts you want to recall**.

Don't store:

- Every user query/response (that's what compression is for)
- Public documentation (put it in skills instead)
- Transient state (keep it in agent memory)

### 5. Version Your Skills

When a skill's guidance changes:

- **Minor update** (typo, clarification): update the existing skill's `content` in place
- **Major update** (different approach, breaking change): create a new `id` (e.g., `docker-compose-v2`) and optionally mark the old one as deprecated in its content

### 6. Use MCP in Claude Desktop

If you use Claude Desktop, add the MCP server (see `CLAUDE.md`). This gives you:

- Direct access to skills via Claude's tool calling
- No need to implement API calls manually
- The same token savings within Claude

### 7. Monitor Token Savings

Track metrics:

```python
from datetime import datetime

logs = []

def log_savings(tokens_before, tokens_after, operation):
    logs.append({
        "timestamp": datetime.now().isoformat(),
        "operation": operation,
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "savings": tokens_before - tokens_after
    })
    # Periodically upload or analyze these
```

---

## Example Implementations

### Minimal Agent

```python
import asyncio
import os

import httpx

API_URL = os.getenv("API_URL", "http://helm:8675")
PROJECT = os.getenv("PROJECT", "/default")

async def get_context(query):
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": query, "project": PROJECT})
        return resp.json()

async def chat():
    conv = []
    while True:
        query = input("You: ")
        if query == "quit":
            break

        # Get context
        ctx = await get_context(query)
        system = format_context(ctx)

        # Call LLM (pseudo - plug in your provider here)
        response = call_llm(system, conv[-4:], query)

        conv.extend([{"role": "user", "content": query},
                     {"role": "assistant", "content": response}])

        print(f"Assistant: {response}")

asyncio.run(chat())
```

### Discord Bot with Context

```python
import os

import discord
from discord.ext import commands
import httpx

# discord.py 2.x requires explicit intents to read message content
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)
API_URL = "http://helm:8675"
PROJECT = "/home/user/discord-bot"

@bot.event
async def on_message(message):
    if message.author == bot.user:
        return

    # RAG context
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": message.content, "project": PROJECT})
        ctx = resp.json()

    # Build prompt
    system_prompt = format_context(ctx) + "\n\nYou are a helpful Discord bot."

    # Respond (using your LLM of choice)
    response = await generate_response(message.content, system_prompt)
    await message.reply(response)

    # Store in memory if it's a decision
    if "decision" in message.content.lower():
        async with httpx.AsyncClient() as client:
            await client.post(f"{API_URL}/memory", json={
                "project": PROJECT,
                "key": f"decision-{discord.utils.utcnow().timestamp()}",
                "content": response[:500]
            })

bot.run(os.getenv("DISCORD_TOKEN"))
```

---

## Need More Help?

- **Setup issues**: See `SETUP.md`
- **Template repo**: Clone `git.bouncypixel.com:helm/agentic-templates.git`
- **API reference**: Visit `http://helm:8675/docs` when the service is running
- **MCP tools**: See `CLAUDE.md` for Claude Desktop integration

docker-compose.yml
@@ -28,3 +28,18 @@ services:
     interval: 30s
     timeout: 10s
     retries: 3
+
+  mcp:
+    build:
+      context: .
+      dockerfile: mcp/Dockerfile
+    command: python skills.py
+    ports:
+      - "3000:3000"
+    environment:
+      - SKILLS_API_URL=http://api:8080
+      - MCP_TRANSPORT=sse
+      - MCP_PORT=3000
+    depends_on:
+      - api
+    restart: unless-stopped

mcp/requirements.txt
@@ -3,3 +3,4 @@ httpx==0.26.0
 python-dotenv==1.0.0
 docker==7.0.0
 psutil==5.9.7
+uvicorn[standard]==0.27.0

mcp/skills.py
@@ -1,6 +1,7 @@
 from mcp.server.fastmcp import FastMCP
 import httpx
 import os
+import uvicorn

 mcp = FastMCP("skills")

@@ -162,4 +163,11 @@ def create_skill(

 if __name__ == "__main__":
-    mcp.run()
+    transport = os.getenv("MCP_TRANSPORT", "stdio")
+
+    if transport == "sse":
+        host = os.getenv("MCP_HOST", "0.0.0.0")
+        port = int(os.getenv("MCP_PORT", "3000"))
+        mcp.run_sse(host=host, port=port)
+    else:
+        mcp.run()