Add SSE MCP server, comprehensive docs, and OpenCode integration
- Implement SSE mode for MCP server (mcp/skills.py)
- Add MCP service to docker-compose.yml on port 3000
- Add uvicorn dependency to mcp/requirements.txt
- Create SETUP.md, USAGE.md, OPENCODE-MCP.md
- Update README with quick links and MCP section
- Remove semantic cache references throughout
- Add cross-platform Python MCP setup script to template repo
This commit is contained in: parent 95805dfc86, commit e346d356e5
7 changed files with 1085 additions and 83 deletions
118 OPENCODE-MCP.md (new file)

@@ -0,0 +1,118 @@
# OpenCode MCP Configuration

OpenCode (an open-source alternative to Cursor/Claude) supports MCP servers. This guide shows how to connect it to your AI Skills API MCP server running on `helm`.

## Prerequisites

- AI Skills API stack running on `helm` (includes the MCP server on port 3000)
- OpenCode installed on your local machine

## MCP Server Endpoint

Your MCP server is accessible at:

```
http://helm:3000
```

It exposes two endpoints:

- `GET /sse` - Server-Sent Events (for client connection)
- `POST /messages` - JSON-RPC messages
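As a rough sketch of what travels over `POST /messages`: clients send JSON-RPC 2.0 payloads. The exact envelope depends on the MCP protocol version your server implements, and the `tools/call` method name and argument shape here are assumptions drawn from the MCP spec, not from this repo's code:

```python
import json

def jsonrpc_request(method, params, req_id=1):
    """Build a JSON-RPC 2.0 request body like those POSTed to /messages."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": method,
        "params": params,
    })

# Hypothetical tool invocation; the tool name matches the server's tool list,
# but "tools/call" and the "arguments" key are assumptions from the MCP spec.
body = jsonrpc_request("tools/call", {
    "name": "search_skills",
    "arguments": {"query": "docker compose"},
})
print(body)
```

In normal use, OpenCode builds and sends these messages for you; this is only to illustrate what the `/messages` endpoint receives.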
## OpenCode Configuration

OpenCode reads MCP server config from its settings. You need to add an MCP server with the SSE URL.

### Configuration JSON

Add this to your OpenCode MCP configuration (location varies by install):

```json
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
```

**Note**: Use `"url"`, not `"command"`, since the server is remote and uses SSE transport.

### Where to Put This

OpenCode typically reads MCP config from:

- `~/.config/opencode/mcp.json`
- or the app settings UI (Preferences → MCP → Add Server → Manual)

If using a file, create/edit `~/.config/opencode/mcp.json`:

```bash
mkdir -p ~/.config/opencode
cat > ~/.config/opencode/mcp.json << 'EOF'
{
  "mcpServers": {
    "skills": {
      "url": "http://helm:3000"
    }
  }
}
EOF
```
### Test Connection

1. Restart OpenCode (if running)
2. Open the MCP servers panel/tool
3. You should see the "skills" server listed as connected
4. Available tools will include:
   - `search_skills`
   - `get_skill`
   - `list_skills`
   - `get_context`
   - `get_conventions`
   - `get_snippets`
   - `get_memory`
   - `add_memory`
   - `create_skill`
## Troubleshooting

### "Cannot connect to MCP server"

- Ensure the stack is up: `docker compose -f /path/to/ai-skills-api/docker-compose.yml ps`
- Check MCP service logs: `docker compose logs mcp`
- Verify `helm` resolves: `ping helm`, or use the IP address instead
- If using an IP, change the config to `"url": "http://192.168.x.x:3000"`

### "Connection refused" or timeout

- Ensure port 3000 is exposed: `netstat -tuln | grep 3000` on helm
- Check the firewall: helm should accept connections on 3000 from your network

### Tools not appearing

- Wait 10-20 seconds after OpenCode starts for the MCP connection to establish
- Check OpenCode logs for MCP connection errors
- Verify the skills service is healthy: `docker compose ps` (mcp should be "Up" and healthy)
## Using the Tools

Once connected, you can invoke MCP tools from OpenCode:

- `get_context(project="/home/user/myapp")` → fetches relevant skills/conventions
- `search_skills(query="docker compose")` → finds matching skills
- `create_skill(...)` → adds a new skill to the database
- `add_memory(project, key, content)` → stores learnings

These calls travel over the network to `helm:3000`, and the MCP server forwards requests to the Skills API (`helm:8675` internally).
## Security Note

The MCP server is exposed on your home network without authentication (it relies on network trust). If you need auth, we can add a reverse proxy or API key layer.

## One-Line Setup Script

If you're setting up on a new machine, run this from the `agentic-templates` repo:

```bash
./setup-opencode-mcp.sh
```

It will detect your OpenCode config location and add the MCP server automatically.
108 README.md
@@ -2,22 +2,14 @@
 Local infrastructure for AI context management. Reduce token consumption by 60-80% through smart RAG, conversation compression, and reusable skills.
 
-## Quick Start
-
-```bash
-# Copy config file (optional, uses defaults if missing)
-cp config.yaml.example config.yaml  # customize if needed
-
-# Run with Docker
-docker compose up -d
-
-# Or run locally
-pip install -r requirements.txt
-uvicorn main:app --reload
-```
-
-API available at `http://helm:8675`
-Docs at `http://helm:8675/docs`
+**API available at**: `http://helm:8675`
+**Interactive docs**: `http://helm:8675/docs`
+
+## Quick Links
+
+- **[Setup Guide](SETUP.md)** - One-time deployment on your server
+- **[Usage Guide](USAGE.md)** - How to integrate with your agents
+- **[Template Repository](https://git.bouncypixel.com/helm/agentic-templates)** - Starter kit for new projects
 
 ## Key Features
@@ -27,24 +19,22 @@ Docs at `http://helm:8675/docs`
 - **Simple API**: RESTful JSON API + MCP server for Claude Desktop
 - **Zero-friction auth**: Optional API key (set-and-forget)
 
-## Configuration
-
-Create `config.yaml` (optional) to customize:
-
-```yaml
-port: 8675
-rag:
-  max_skills: 3
-  max_conventions: 2
-  max_snippets: 2
-compression:
-  enabled: true
-  strategy: "extractive"  # or "ollama" for phi-3-mini
-auth:
-  enabled: false  # set to true and change api_key
-```
-
-Or use environment variables (see `config.py` for full list).
+## Quick Start (5 minutes)
+
+```bash
+# 1. Deploy the service on helm (see SETUP.md for details)
+docker compose up -d
+
+# 2. Clone the template repo for your agent project
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
+cp .env.example .env
+docker compose up -d
+
+# 3. Your agent is now running with context management
+```
+
+See **[SETUP.md](SETUP.md)** for complete deployment instructions and **[USAGE.md](USAGE.md)** for integration patterns.
 
 ## Endpoints
@@ -113,19 +103,21 @@ async def query_llm(prompt, conversation_history, project=None):
 **Expected savings**: 60-80% token reduction vs. sending everything.
 
+See **[USAGE.md](USAGE.md)** for complete integration patterns, examples, and best practices.
+
 ## Template Repository
 
 Want to get started quickly? Use the agent template:
 
 ```bash
-# Clone the template (on your Forgejo)
-git clone git.bouncypixel.com:helm/ai-agent-template.git
-cd ai-agent-template
+# Clone the template
+git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
+cd my-agent
 cp .env.example .env
 docker compose up -d
 ```
 
-The template includes a working agent integration and docker-compose setup.
+The template includes a working agent integration and docker-compose setup. See [USAGE.md](USAGE.md) for integration patterns.
 
 ## How It Works (Architecture)
@@ -166,7 +158,7 @@ If you use Claude Desktop, add to your config:
 Available tools:
 - `search_skills`, `get_skill`, `list_skills`
 - `get_context`, `get_conventions`, `get_snippets`
-- `check_cache` (deprecated), `get_memory`, `add_memory`, `create_skill`
+- `get_memory`, `add_memory`, `create_skill`
 
 ## Migration from v1
|
||||||
|
|
||||||
MIT
|
MIT
|
||||||
|
|
||||||
## Example Usage
|
For detailed usage examples and API reference, see [USAGE.md](USAGE.md) and the interactive docs at `http://helm:8675/docs` when the service is running.
|
||||||
|
|
||||||
### Create a skill
|
|
||||||
```bash
|
|
||||||
curl -X POST http://helm:8675/skills \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{
|
|
||||||
"id": "homelab-docker-compose",
|
|
||||||
"name": "Docker Compose Standard",
|
|
||||||
"category": "homelab",
|
|
||||||
"content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
|
|
||||||
"tags": ["docker", "compose", "infrastructure"]
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
|
|
||||||
### Get context bundle
|
|
||||||
```bash
|
|
||||||
curl "http://helm:8675/context?project=/home/server/apps/media-server&skills=homelab-docker-compose,react-v2"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Check cache
|
|
||||||
```bash
|
|
||||||
curl -X POST http://helm:8675/cache/lookup \
|
|
||||||
-H "Content-Type: application/json" \
|
|
||||||
-d '{
|
|
||||||
"prompt": "How do I configure traefik?",
|
|
||||||
"model": "claude-3-opus"
|
|
||||||
}'
|
|
||||||
```
|
|
||||||
|
|
||||||
## Integration Pattern
|
|
||||||
|
|
||||||
In your agent's system prompt or pre-request hook:
|
|
||||||
|
|
||||||
1. Call `GET /context?project={current_project}&skills={skill_ids}`
|
|
||||||
2. Inject returned content into the prompt
|
|
||||||
3. Before sending to LLM, check `POST /cache/lookup`
|
|
||||||
4. After receiving response, optionally `POST /cache/store`
|
|
||||||
|
|
||||||
This avoids re-sending your standards every request and caches repeated queries.
|
|
||||||
|
|
||||||
## Database
|
|
||||||
|
|
||||||
SQLite database `ai.db` with tables:
|
|
||||||
- `skills` - Reusable patterns and instructions
|
|
||||||
- `snippets` - Code snippets
|
|
||||||
- `conventions` - Project-specific conventions
|
|
||||||
- `cache` - LRU cache of LLM responses
|
|
||||||
- `memory` - Project memory/notes
|
|
||||||
|
|
|
||||||
394 SETUP.md (new file)

@@ -0,0 +1,394 @@
# Setup Guide: AI Skills API

This guide covers exactly how to deploy the AI Skills API on your home server (`helm`) and set up new agent projects.

## Prerequisites

- Docker & Docker Compose installed on `helm`
- Access to `helm` from your development machine (SSH or local)
- Optional: Claude Desktop with MCP support

## Server Setup (One-Time)

Deploy the AI Skills API service on your home server.

### 1. Clone the Repository

```bash
# On helm (or a machine accessible to docker)
cd /opt
git clone ssh://git@helm:222/helm/ai-skills-api.git
cd ai-skills-api
```
### 2. Build and Start Services

```bash
# Build and start all services (API + Ollama + MCP)
docker compose up -d --build

# Check it's running
docker compose ps
# Should show: api, ollama, mcp (all "Up")
```

### 3. Verify Deployment

```bash
# Health check (from helm)
curl http://localhost:8675/health

# Expected response: {"status":"healthy"}
```
### 4. Configure Optional Settings

Edit `config.yaml` (defaults are used if the file is missing):

```yaml
port: 8675
rag:
  max_skills: 3
  max_conventions: 2
  max_snippets: 2
compression:
  enabled: true
  strategy: "extractive"  # or "ollama" for phi-3-mini
auth:
  enabled: false  # set to true to require API key
  api_key: "your-secret-key-here"
```

Restart after changes:

```bash
docker compose restart
```
### 5. (Optional) Enable API Authentication

If you want auth across your network:

1. Edit `config.yaml`:

   ```yaml
   auth:
     enabled: true
     api_key: "generate-a-strong-random-key"
   ```

2. Restart:

   ```bash
   docker compose restart
   ```

3. Test:

   ```bash
   curl http://helm:8675/health  # Should work (no auth)
   curl http://helm:8675/skills  # Should fail with 401 if auth is enabled
   curl -H "X-API-Key: your-secret-key-here" http://helm:8675/skills  # Should work
   ```

**Note**: The API is accessible only on your home network (`helm:8675`). No public exposure by default.
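With auth enabled, every client needs to attach the key. A minimal helper (a sketch; it assumes the `X-API-Key` header shown above and an `API_KEY` environment variable, which matches the agent `.env` used later in this guide):

```python
import os

def auth_headers(api_key=None):
    """Return request headers, including X-API-Key only when a key is available."""
    key = api_key or os.environ.get("API_KEY")
    return {"X-API-Key": key} if key else {}

# With auth disabled (no key configured) this stays empty,
# so the same client code works in both modes.
print(auth_headers("your-secret-key-here"))
```

Pass the result as `headers=` to whatever HTTP client your agent uses (e.g. `httpx.get(url, headers=auth_headers())`).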
### Access from Other Machines

Your agents running on other machines can access the API at `http://helm:8675`.

```bash
# From any machine on your network
curl http://helm:8675/health
```

If DNS isn't set up, `helm` should still resolve via your local network or a hosts-file entry; otherwise use the server's IP address directly.
## MCP Server for Claude Desktop / OpenCode
|
||||||
|
|
||||||
|
The stack includes an MCP server that exposes your skills to Claude Desktop or OpenCode via the Model Context Protocol.
|
||||||
|
|
||||||
|
### What's Running
|
||||||
|
|
||||||
|
- **MCP Server**: SSE mode on `http://helm:3000`
|
||||||
|
- Automatically proxies requests to the Skills API (`http://api:8080` internally)
|
||||||
|
- Same Docker network, no extra configuration needed
|
||||||
|
|
||||||
|
### Configure Claude Desktop
|
||||||
|
|
||||||
|
Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"mcpServers": {
|
||||||
|
"skills": {
|
||||||
|
"url": "http://helm:3000"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Restart Claude. You should see `skills` server connected with tools like `search_skills`, `get_context`, etc.
|
||||||
|
|
||||||
|
### Configure OpenCode
|
||||||
|
|
||||||
|
See [OPENCODE-MCP.md](OPENCODE-MCP.md) for detailed instructions. In short:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Run the setup script from the agentic-templates repo:
|
||||||
|
cd ~/projects/agentic-templates
|
||||||
|
./setup-opencode-mcp.sh
|
||||||
|
|
||||||
|
# Or manually create ~/.config/opencode/mcp.json:
|
||||||
|
{
|
||||||
|
"mcpServers": {
|
||||||
|
"skills": {
|
||||||
|
"url": "http://helm:3000"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
### Test MCP Connection
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Should hang (SSE stream) if connected
|
||||||
|
curl http://helm:3000/sse
|
||||||
|
|
||||||
|
# With API key if auth enabled:
|
||||||
|
curl -H "X-API-Key: your-key" http://helm:3000/sse
|
||||||
|
```
|
||||||
|
|
||||||
|
## Project Setup (Per Project/Session)

For each new project or AI agent, you'll create an integration that uses the API.

### Option A: Use the Template Repository (Recommended)

We maintain a template repo for quick starts.

#### 1. Clone the Template

```bash
cd ~/projects  # or wherever you keep projects
git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
cd my-agent
```

Or clone directly via SSH:

```bash
git clone ssh://git@helm:222/helm/agentic-templates.git my-agent
```

#### 2. Configure Environment

Copy `.env.example` to `.env`:

```bash
cp .env.example .env
```

Edit `.env` if needed:

```env
API_URL=http://helm:8675
API_KEY=                       # Only if auth enabled
PROJECT=/path/to/your/project  # Optional, for context scoping
```
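In Python, reading these variables with sensible fallbacks might look like the following (a sketch; the variable names mirror the `.env` above, and the default values are assumptions):

```python
import os

API_URL = os.environ.get("API_URL", "http://helm:8675")
API_KEY = os.environ.get("API_KEY", "")           # empty when auth is disabled
PROJECT = os.environ.get("PROJECT", os.getcwd())  # fall back to the current directory

print(API_URL, PROJECT)
```

Docker Compose injects the `.env` values automatically; when running the agent directly, export them or use a loader such as python-dotenv.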
#### 3. Run Your Agent

```bash
# Using Docker Compose (recommended)
docker compose up -d

# Or run directly
pip install -r requirements.txt
python agent.py
```

The agent will automatically:

- Fetch relevant skills/conventions via RAG
- Store decisions in memory
- Compress the conversation when it grows large

### Option B: Manual Integration

If you want to integrate into an existing project:

1. Install the Python dependency:

   ```bash
   pip install httpx
   ```

2. Copy the integration pattern from `template/agent.py` (the `get_context`, `compress_messages`, `store_memory` functions).

3. Add these calls to your agent's workflow:

   - Before each LLM call: `context = await get_context(query, project)`
   - Inject the context into the system prompt
   - After each response: `await store_memory(project, key, content)`
   - When the conversation exceeds 10 messages: `compressed = await compress_messages(conversation)`

See `USAGE.md` for detailed integration patterns.
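The "inject the context into the system prompt" step can be as simple as flattening the RAG response into a markdown section. A sketch (the field names follow the `/context/rag` response documented in USAGE.md; the helper itself is hypothetical, not the template's actual code):

```python
def format_context(context):
    """Flatten a /context/rag response into a system-prompt section."""
    sections = [("skills", "Relevant Skills"),
                ("conventions", "Project Conventions"),
                ("snippets", "Code Snippets")]
    lines = []
    for key, title in sections:
        items = context.get(key, [])
        if not items:
            continue
        lines.append(f"## {title}")
        for item in items:
            lines.append(f"### {item['name']}")
            lines.append(item["content"])
    return "\n".join(lines)

ctx = {"skills": [{"name": "Docker Compose Standard",
                   "content": "Always use docker-compose v3.8+."}]}
print(format_context(ctx))
```

Prepend the result to your system prompt before each LLM call; empty sections are skipped so the prompt stays small when nothing relevant matches.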
## Seeding Skills and Conventions

The API comes with a seed script that adds useful skills.

### Run the Seed Script

```bash
cd /opt/ai-skills-api
python examples/seed-data.py
```

This adds:

- D&D campaign management skills
- Infrastructure/Docker skills
- Code review skills
- General best practices

### Add Custom Skills

#### Via API

```bash
curl -X POST http://helm:8675/skills \
  -H "Content-Type: application/json" \
  -d '{
    "id": "my-skill",
    "name": "My Custom Skill",
    "category": "custom",
    "content": "Specific instructions for your agent...",
    "tags": ["keyword1", "keyword2"]
  }'
```
#### Via MCP (Claude Desktop)

Use the `create_skill` tool directly in Claude.

#### Via Python

```python
import httpx

resp = httpx.post(
    "http://helm:8675/skills",
    json={
        "id": "unique-skill-id",
        "name": "Skill Name",
        "category": "category",
        "content": "Full skill instructions...",
        "tags": ["tag1", "tag2"]
    }
)
```

### Add Project Conventions

Conventions are project-specific (tied to a project path or identifier):

```bash
curl -X POST http://helm:8675/conventions \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Project Conventions",
    "project": "/home/user/myproject",
    "content": "Project-specific coding standards, workflows, etc."
  }'
```
## Testing Your Setup

### 1. Test RAG Context

```bash
curl "http://helm:8675/context/rag?query=docker+compose&project=test"
```

This should return JSON with `skills`, `conventions`, and `snippets` arrays.

### 2. Test Compression

```bash
curl -X POST http://helm:8675/compress \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"},
      {"role": "assistant", "content": "Hi there! How can I help?"},
      {"role": "user", "content": "Tell me about Docker."},
      {"role": "assistant", "content": "Docker is a containerization platform..."}
    ]
  }'
```

This should return compressed messages and `tokens_saved`.
### 3. Test Memory

```bash
curl -X POST http://helm:8675/memory \
  -H "Content-Type: application/json" \
  -d '{
    "project": "test",
    "key": "decision-123",
    "content": "We decided to use FastAPI for this project"
  }'

curl "http://helm:8675/memory?project=test"
```

### 4. Test from Agent Template

```bash
cd ~/projects/my-agent
docker compose up -d
docker compose logs -f agent  # Watch the agent start and interact
```
## Troubleshooting

### Service Won't Start

```bash
# Check logs
docker compose logs ai-skills-api

# Common issues:
# - Port 8675 already in use: change the port in docker-compose.yml
# - Permissions: ensure /opt/ai-skills-api is readable
```

### Ollama Not Pulling Model

The entrypoint script auto-pulls `phi3:mini` if the compression strategy is `ollama`. To force a pull:

```bash
docker compose exec ai-skills-api ollama pull phi3:mini
```

### Can't Connect from Other Machines

- Ensure `helm` is reachable on the network (`ping helm`)
- Check the Docker network: `docker network ls` (should include `ai-skills-api_default`)
- The API is bound to `0.0.0.0:8675` inside the container, so it is accessible from the host and other containers

### Auth Errors

- If you get a 401, either disable auth in `config.yaml` or set `API_KEY` in your agent's `.env`
- Verify: `curl -H "X-API-Key: your-key" http://helm:8675/skills`

### High RAG Latency (>10ms)

- The first request after startup will be slower (warming the cache)
- Subsequent queries should take <5ms
- If still slow, check the embedding model load: `docker compose logs ai-skills-api`

## Next Steps

- Read `USAGE.md` for detailed integration patterns and best practices
- Use the template repo for all new agent projects
- Add project-specific skills and conventions as you work
- Monitor logs for token savings
522 USAGE.md (new file)

@@ -0,0 +1,522 @@
# Usage Guide: AI Skills API

This guide explains how to use the AI Skills API effectively in your projects and AI agent sessions.

## Table of Contents

1. [Understanding the Integration Pattern](#understanding-the-integration-pattern)
2. [RAG Context Retrieval](#rag-context-retrieval)
3. [Conversation Compression](#conversation-compression)
4. [Project Memory](#project-memory)
5. [Session Workflow](#session-workflow)
6. [Managing Skills](#managing-skills)
7. [Token Accounting](#token-accounting)
8. [Best Practices](#best-practices)
9. [Example Implementations](#example-implementations)

---
## Understanding the Integration Pattern

The API provides three core capabilities that work together:

1. **RAG (Retrieval-Augmented Generation)**: Before each LLM call, fetch relevant skills, conventions, and snippets based on your query. This injects relevant context without sending your entire knowledge base every time.

2. **Compression**: When conversation history grows long (>10 turns), compress old messages into summaries to stay within context windows.

3. **Memory**: Store decisions, configurations, and learnings per project for future reference.

**Expected savings**: 60-80% token reduction vs. sending everything.

---
## RAG Context Retrieval

### The `/context/rag` Endpoint

This is your primary integration point. It returns only the most relevant items from your knowledge base.

**Request:**

```
GET /context/rag?query={query}&project={project}
```

**Response:**

```json
{
  "skills": [
    {
      "id": "homelab-docker-compose",
      "name": "Docker Compose Standard",
      "category": "homelab",
      "content": "Always use docker-compose v3.8+. Include health checks, restart policies, and resource limits.",
      "relevance_score": 0.89
    }
  ],
  "conventions": [
    {
      "id": "conv-123",
      "name": "React Project Standards",
      "project": "/home/user/my-react-app",
      "content": "Use TypeScript, React 18+, and functional components with hooks.",
      "relevance_score": 0.76
    }
  ],
  "snippets": [
    {
      "id": "snippet-456",
      "name": "FastAPI CORS setup",
      "language": "python",
      "content": "app.add_middleware(CORSMiddleware, allow_origins=[\"*\"], ...)",
      "relevance_score": 0.82
    }
  ]
}
```
### How It Works

- Skills are globally available (your general knowledge base)
- Conventions are scoped to a project path or identifier (e.g., `/home/user/project1`)
- Snippets are globally available code examples
- Relevance scores are cosine similarities (0-1); items below 0.3 are typically filtered out
- Limits are configurable (default: 3 skills, 2 conventions, 2 snippets)
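Conceptually, the relevance filter is cosine similarity over embeddings, a cutoff, and a per-type limit. A sketch of that idea (illustrative only, not the service's actual code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def filter_relevant(query_vec, items, threshold=0.3, limit=3):
    """Score (name, vector) pairs and keep the top `limit` at or above `threshold`."""
    scored = [(cosine(query_vec, vec), name) for name, vec in items]
    scored = sorted((s for s in scored if s[0] >= threshold), reverse=True)
    return scored[:limit]

# Toy 2-dimensional "embeddings" just to show the filtering behavior
items = [("docker skill", [1.0, 0.0]), ("unrelated skill", [0.0, 1.0])]
print(filter_relevant([1.0, 0.1], items))
```

The unrelated item scores ~0.1 and falls below the 0.3 cutoff, which is why low-relevance entries never reach your prompt.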
### Usage Pattern

```python
async def query_with_context(query: str, project: str = None):
    # 1. Fetch context
    context = await get_context(query, project)

    # 2. Build system prompt
    system_prompt = format_context(context)
    # system_prompt now contains:
    # ## Relevant Skills
    # ### Docker Compose Standard (relevance: 0.89)
    # Always use docker-compose v3.8+...
    # ...

    # 3. Inject into LLM call
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query}
    ]
    response = await llm.chat(messages)

    return response
```

---
## Conversation Compression

### The `/compress` Endpoint

Compresses a list of conversation messages into a shorter representation.

**Request:**

```json
{
  "messages": [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."},
    ... (up to 20+ messages)
  ]
}
```

**Response:**

```json
{
  "messages": [
    {"role": "system", "content": "Summary of earlier conversation..."},
    {"role": "user", "content": "I need to set up Docker Compose."},
    {"role": "assistant", "content": "Sure! Docker Compose uses a YAML file..."}
  ],
  "tokens_saved": 245
}
```
### Compression Strategies

- **Extractive** (default): Uses LSA summarization to select key sentences. Fast (~100-500ms), no model required.
- **Ollama**: Uses `phi3:mini` for abstractive summaries. Better quality but slower (~2s). Requires Ollama running.

**Configure in `config.yaml`:**

```yaml
compression:
  enabled: true
  strategy: "extractive"  # or "ollama"
```
### Usage Pattern

```python
conversation = []

async def chat(query):
    # Add user message
    conversation.append({"role": "user", "content": query})

    # Call LLM (with context from RAG)
    response = await llm.chat(conversation)
    conversation.append({"role": "assistant", "content": response})

    # Compress when the conversation gets long
    if len(conversation) >= 10:
        compressed = await compress_messages(conversation)
        # Replace contents in place so the module-level list is updated
        conversation[:] = compressed["messages"]
        print(f"Saved {compressed['tokens_saved']} tokens")

    return response
```

**Important**: Keep the most recent ~4-6 turns uncompressed. The compression endpoint preserves recent messages and compresses only the older ones.
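If you prefer to control the split client-side, the same idea is plain list slicing: hand only the older head to `/compress` and keep a recent tail verbatim. A sketch (illustrative; the server already applies its own split, so this is optional):

```python
def split_for_compression(messages, keep_recent=6):
    """Return (older messages to compress, recent messages kept verbatim)."""
    if len(messages) <= keep_recent:
        return [], messages
    return messages[:-keep_recent], messages[-keep_recent:]

msgs = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
head, tail = split_for_compression(msgs)
print(len(head), len(tail))  # 4 6
```

Send `head` to `/compress`, then rebuild the conversation as the returned summary followed by `tail`.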
|
---
|
||||||
|
|
||||||
|
## Project Memory
|
||||||
|
|
||||||
|
### The `/memory` Endpoints
|
||||||
|
|
||||||
|
Store and retrieve project-specific knowledge.
|
||||||
|
|
||||||
|
**Store:**
|
||||||
|
|
||||||
|
```
|
||||||
|
POST /memory
|
||||||
|
{
|
||||||
|
"project": "my-project",
|
||||||
|
"key": "architecture-decision-2024-01-15",
|
||||||
|
"content": "We chose FastAPI over Flask for async support and automatic OpenAPI docs."
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Retrieve:**
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /memory?project=my-project
|
||||||
|
```
|
||||||
|
|
||||||
|
**Update:**
|
||||||
|
|
||||||
|
```
|
||||||
|
PUT /memory/{id}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Delete:**
|
||||||
|
|
||||||
|
```
|
||||||
|
DELETE /memory/{id}
|
||||||
|
```

### Usage Pattern

```python
# Store a decision after making it
await store_memory(
    project="/home/user/myapp",
    key="db-choice",
    content="Using PostgreSQL over MongoDB for relational data integrity"
)

# Retrieve past decisions at project start
resp = httpx.get("http://helm:8675/memory", params={"project": "/home/user/myapp"})
decisions = resp.json()["entries"]
# decisions = [{"id": "...", "key": "db-choice", "content": "...", ...}]
```

**When to use memory:**
- Architecture decisions
- Configuration choices (API keys, service URLs)
- Learned preferences ("User likes code examples")
- Debugging notes ("Issue with CORS on port 8080")

**When NOT to use memory:**
- Temporary conversation state (use compression instead)
- Large codebases (store in skills/snippets instead)
- Public documentation (should be in skills)

---

## Session Workflow

### Starting a New Session

1. **Define your project identifier** - a path or unique string:
   ```python
   PROJECT = "/home/user/myapp"  # or "my-discord-bot", "workspace-123"
   ```

2. **Load past memories** (optional but helpful):
   ```python
   memories = httpx.get("http://helm:8675/memory", params={"project": PROJECT}).json()["entries"]
   # Inject into system prompt or create context from them
   ```

3. **Begin conversation loop** - for each user query:
   - Call `GET /context/rag?query=...&project=PROJECT`
   - Inject context into the LLM prompt
   - Call the LLM
   - Store important outputs in memory if they represent decisions/learnings
   - Compress the conversation when it reaches ~10 turns

### Ending a Session

- Optionally store a session summary in memory:
  ```python
  await store_memory(PROJECT, "session-summary-2024-01-15", "Completed user auth flow, decided on JWT tokens")
  ```

- No cleanup needed - conversation state lives in your agent, not the server.

### Multi-Project Agents

If your agent works across multiple projects:

```python
# Switch project context mid-conversation
PROJECT = "/home/user/project1"  # current active project

# Each project has its own conventions and memories
context = await get_context(query, project=PROJECT)
```

---

## Managing Skills

Skills are your reusable knowledge base. Manage them via API, MCP, or the seed script.

### Categories

Group skills by category (e.g., `homelab`, `dnd`, `python`, `devops`). Categories don't affect RAG retrieval but help with organization.

### Tags

Tags are keywords used for **future search** (not currently used by RAG, but planned for enhanced filtering).

```json
{
  "tags": ["docker", "compose", "infrastructure", "production"]
}
```

### Best Practices for Skills

- **Be specific**: "Docker Compose Production Patterns" > "Docker"
- **Include examples**: Show code snippets in the content
- **Keep it concise**: 1-3 paragraphs, focused on actionable guidance
- **Use markdown**: The API preserves formatting for injection into prompts
- **Version when updating**: If a skill changes significantly, create a new `id` (e.g., `docker-compose-v2`)

### Search Skills

```
GET /skills/search?q={query}
```

Returns matching skills by name/content similarity. Useful for manual exploration but not needed in automated agents (use `/context/rag` instead).

---

## Token Accounting

### Count Tokens

```
GET /tokens/count?text={text}
```

Returns the token count (using tiktoken for GPT models, approximations for others).

**Use this to:**
- Track compression savings
- Pre-flight check prompts before sending to the LLM
- Budget token usage per session
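
For a quick pre-flight check without a network round-trip, a rough local heuristic can stand in for the endpoint (~4 characters per token for English; illustrative only, not tiktoken-accurate):

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_budget(prompt: str, budget: int = 8000) -> bool:
    """Pre-flight check before sending a prompt to the LLM."""
    return approx_tokens(prompt) <= budget
```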

### Example: Measure RAG Savings

```python
full_context = load_all_skills()  # hypothetical: all your skills text
full_tokens = count_tokens(full_context)

rag_context = get_context(query, project)  # only relevant items
rag_tokens = count_tokens(format_context(rag_context))

savings_pct = (1 - rag_tokens / full_tokens) * 100
print(f"RAG saved {savings_pct:.1f}% tokens")
```

---

## Best Practices

### 1. Always Use Project Scoping

Set the `project` parameter consistently. Even if you have one main project, use a consistent identifier:

```python
PROJECT = "/home/user/myapp"  # NOT "default" or None
context = await get_context(query, project=PROJECT)
```

This allows:
- Project-specific conventions
- Memory isolation between projects
- Future per-project analytics

### 2. Call RAG Before Every LLM Request

Even if the query seems unrelated, the cost is negligible (<5ms, ~50 tokens). The knowledge injected often improves responses.

### 3. Compress Proactively

Don't wait until the context window is full. Compress at ~10 messages:

```python
if len(conversation) >= 10:
    compressed = await compress_messages(conversation)
    conversation = compressed["messages"]
```

This keeps compression quality high (summaries are more accurate with fewer messages).

### 4. Store Learnings, Not Everything

Memory is for **decisions** and **facts you want to recall**.

Don't store:
- Every user query/response (that's what compression is for)
- Public documentation (put it in skills instead)
- Transient state (keep it in agent memory)

### 5. Version Your Skills

When a skill's guidance changes:

- **Minor update** (typo, clarification): update the existing skill's `content` in place
- **Major update** (different approach, breaking change): create a new `id` (e.g., `docker-compose-v2`) and optionally mark the old one as deprecated in its content

### 6. Use MCP in Claude Desktop

If you use Claude Desktop, add the MCP server (see `CLAUDE.md`). This gives you:
- Direct access to skills via Claude's tool calling
- No need to implement API calls manually
- The same token savings within Claude

### 7. Monitor Token Savings

Track metrics:

```python
from datetime import datetime

logs = []

def log_savings(tokens_before, tokens_after, operation):
    logs.append({
        "timestamp": datetime.now().isoformat(),
        "operation": operation,
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "savings": tokens_before - tokens_after
    })

# Periodically upload or analyze these
```

---

## Example Implementations

### Minimal Agent

```python
import asyncio
import os

import httpx

API_URL = os.getenv("API_URL", "http://helm:8675")
PROJECT = os.getenv("PROJECT", "/default")

async def get_context(query):
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": query, "project": PROJECT})
        return resp.json()

async def chat():
    conv = []
    while True:
        query = input("You: ")
        if query == "quit":
            break

        # Get context
        ctx = await get_context(query)
        system = format_context(ctx)

        # Call LLM (pseudo)
        response = call_llm(system, conv[-4:], query)

        conv.extend([{"role": "user", "content": query},
                     {"role": "assistant", "content": response}])

        print(f"Assistant: {response}")

asyncio.run(chat())
```
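
The `format_context` helper above is left undefined; a minimal sketch, assuming the RAG response carries `skills` and `memories` lists with `name`/`key` and `content` fields (these field names are assumptions; check the real schema at `/docs`):

```python
def format_context(ctx: dict) -> str:
    """Flatten a RAG response into a system-prompt preamble."""
    parts = []
    for skill in ctx.get("skills", []):
        parts.append(f"## {skill.get('name', 'skill')}\n{skill.get('content', '')}")
    for mem in ctx.get("memories", []):
        parts.append(f"Memory [{mem.get('key', '')}]: {mem.get('content', '')}")
    return "\n\n".join(parts)
```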

### Discord Bot with Context

```python
import os

import discord
from discord.ext import commands
import httpx

# The message_content intent is required to read message text in discord.py 2.x
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)

API_URL = "http://helm:8675"
PROJECT = "/home/user/discord-bot"

@bot.event
async def on_message(message):
    if message.author == bot.user:
        return

    # RAG context
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"{API_URL}/context/rag", params={"query": message.content, "project": PROJECT})
        ctx = resp.json()

    # Build prompt
    system_prompt = format_context(ctx) + "\n\nYou are a helpful Discord bot."

    # Respond (using your LLM of choice)
    response = await generate_response(message.content, system_prompt)
    await message.reply(response)

    # Store in memory if it's a decision
    if "decision" in message.content.lower():
        async with httpx.AsyncClient() as client:
            await client.post(f"{API_URL}/memory", json={
                "project": PROJECT,
                "key": f"decision-{discord.utils.utcnow().timestamp()}",
                "content": response[:500]
            })

bot.run(os.getenv("DISCORD_TOKEN"))
```

---

## Need More Help?

- **Setup issues**: See `SETUP.md`
- **Template repo**: Clone `git.bouncypixel.com:helm/agentic-templates.git`
- **API reference**: Visit `http://helm:8675/docs` when the service is running
- **MCP tools**: See `CLAUDE.md` for Claude Desktop integration

**`docker-compose.yml`:**

```diff
@@ -28,3 +28,18 @@ services:
       interval: 30s
       timeout: 10s
       retries: 3
+
+  mcp:
+    build:
+      context: .
+      dockerfile: mcp/Dockerfile
+    command: python skills.py
+    ports:
+      - "3000:3000"
+    environment:
+      - SKILLS_API_URL=http://api:8080
+      - MCP_TRANSPORT=sse
+      - MCP_PORT=3000
+    depends_on:
+      - api
+    restart: unless-stopped
```

**`mcp/requirements.txt`:**

```diff
@@ -3,3 +3,4 @@ httpx==0.26.0
 python-dotenv==1.0.0
 docker==7.0.0
 psutil==5.9.7
+uvicorn[standard]==0.27.0
```

**`mcp/skills.py`:**

```diff
@@ -1,6 +1,7 @@
 from mcp.server.fastmcp import FastMCP
 import httpx
 import os
+import uvicorn

 mcp = FastMCP("skills")
```

```diff
@@ -162,4 +163,11 @@ def create_skill(


 if __name__ == "__main__":
-    mcp.run()
+    transport = os.getenv("MCP_TRANSPORT", "stdio")
+
+    if transport == "sse":
+        host = os.getenv("MCP_HOST", "0.0.0.0")
+        port = int(os.getenv("MCP_PORT", "3000"))
+        mcp.run_sse(host=host, port=port)
+    else:
+        mcp.run()
```