No description

Find a file

Lukas Parsons 9e3d5ae550 MCP: coach AI to ask for project ID at session start; add project-setup-guide seed skill		2026-03-23 00:40:13 -04:00
examples	MCP: coach AI to ask for project ID at session start; add project-setup-guide seed skill	2026-03-23 00:40:13 -04:00
mcp	MCP: coach AI to ask for project ID at session start; add project-setup-guide seed skill	2026-03-23 00:40:13 -04:00
template	Add Ollama service to docker-compose, expand seed skills with D&D and monitoring, create entrypoint for auto-model-pull	2026-03-22 22:41:49 -04:00
.env.example	Initial commit: Skills API with MCP servers	2026-03-22 21:18:23 -04:00
.gitignore	Initial commit: Skills API with MCP servers	2026-03-22 21:18:23 -04:00
CLAUDE.md	Change API port from 8080 to 8675 across all configs and docs	2026-03-22 21:54:51 -04:00
compression.py	Major refactor: remove semantic cache, add config, auth, improve RAG performance, fix tags JSON	2026-03-22 22:32:44 -04:00
config.py	Fix config loading to return proper dataclass objects instead of dicts	2026-03-22 23:21:31 -04:00
config.yaml	Add Ollama service to docker-compose, expand seed skills with D&D and monitoring, create entrypoint for auto-model-pull	2026-03-22 22:41:49 -04:00
database.py	Initial commit: Skills API with MCP servers	2026-03-22 21:18:23 -04:00
docker-compose.yml	Add SSE MCP server, comprehensive docs, and OpenCode integration	2026-03-22 23:59:33 -04:00
Dockerfile	Increase pip timeout to 100s for slow network builds	2026-03-23 00:03:54 -04:00
entrypoint.sh	Add Ollama service to docker-compose, expand seed skills with D&D and monitoring, create entrypoint for auto-model-pull	2026-03-22 22:41:49 -04:00
main.py	Fix compress endpoint to use request.messages correctly	2026-03-22 22:47:49 -04:00
models.py	Major refactor: remove semantic cache, add config, auth, improve RAG performance, fix tags JSON	2026-03-22 22:32:44 -04:00
OPENCODE-MCP.md	Add SSE MCP server, comprehensive docs, and OpenCode integration	2026-03-22 23:59:33 -04:00
rag.py	Major refactor: remove semantic cache, add config, auth, improve RAG performance, fix tags JSON	2026-03-22 22:32:44 -04:00
README.md	docs: update project scoping to use git remote for cross-machine consistency	2026-03-23 00:33:26 -04:00
requirements.txt	Major refactor: remove semantic cache, add config, auth, improve RAG performance, fix tags JSON	2026-03-22 22:32:44 -04:00
schemas.py	Fix compression endpoint request validation and message schema	2026-03-22 22:47:07 -04:00
SETUP.md	docs: update project scoping to use git remote for cross-machine consistency	2026-03-23 00:33:26 -04:00
TOKEN-SAVING-PATTERN.md	Update MCP server (remove cache tool), fix readme endpoints, add template reference	2026-03-22 22:35:02 -04:00
USAGE.md	docs: update project scoping to use git remote for cross-machine consistency	2026-03-23 00:33:26 -04:00

README.md

AI Skills API

Local infrastructure for AI context management. Reduce token consumption by 60-80% through smart RAG, conversation compression, and reusable skills.

API available at: http://helm:8675
Interactive docs: http://helm:8675/docs

Quick Links

Setup Guide - One-time deployment on your server
Usage Guide - How to integrate with your agents
Template Repository - Starter kit for new projects

Key Features

Smart RAG: Pre-computed embeddings, <5ms retrieval, returns only relevant skills/snippets
Conversation Compression: Extractive summarization or Ollama (phi-3-mini) - saves 50-75% on history
Project Memory: Store decisions and learnings per project
Simple API: RESTful JSON API + MCP server for Claude Desktop
Zero-friction auth: Optional API key (set-and-forget)

Quick Start (5 minutes)

# 1. Deploy the service on helm (see SETUP.md for details)
docker compose up -d

# 2. Clone the template repo for your agent project
git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
cd my-agent
cp .env.example .env
docker compose up -d

# 3. Your agent is now running with context management

See SETUP.md for complete deployment instructions and USAGE.md for integration patterns.

Endpoints

Endpoint	Description	Auth
`GET /health`	Health check	No
`GET /config`	Show current config	Yes
`GET /skills`	List all skills	Yes
`GET /skills/{id}`	Get skill (increments usage)	Yes
`POST /skills`	Create skill	Yes
`PUT /skills/{id}`	Update skill	Yes
`DELETE /skills/{id}`	Delete skill	Yes
`GET /skills/search?q=query`	Search skills	Yes
`GET /snippets`	List snippets	Yes
`POST /snippets`	Create snippet	Yes
`DELETE /snippets/{id}`	Delete snippet	Yes
`GET /conventions`	List conventions	Yes
`GET /conventions?project=/path`	Get project conventions	Yes
`POST /conventions`	Create convention	Yes
`DELETE /conventions/{id}`	Delete convention	Yes
`GET /memory`	List memory entries	Yes
`GET /memory?project=name`	Get project memory	Yes
`POST /memory`	Create memory entry	Yes
`PUT /memory/{id}`	Update memory	Yes
`DELETE /memory/{id}`	Delete memory	Yes
`GET /context/rag?query=...`	RAG context (smart retrieval)	Yes
`POST /compress`	Compress conversation	Yes
`GET /tokens/count?text=...`	Count tokens	Yes
`POST /admin/clear-cache`	Clear RAG cache	Yes

Note: Endpoints marked "Yes" require API key if auth is enabled (default: disabled).

Integration Pattern

import httpx

async def query_llm(prompt, conversation_history, project=None):
    # 1. Get relevant context (RAG) - biggest token saver
    context = await httpx.get(
        "http://helm:8675/context/rag",
        params={"query": prompt, "project": project}
    ).json()
    
    # Inject context into your LLM prompt
    system_prompt = f"{context['skills']}\n{context['conventions']}"
    
    # 2. Call LLM with context + conversation
    response = call_llm(system_prompt, conversation_history, prompt)
    
    # 3. Store learnings in memory
    await httpx.post(
        "http://helm:8675/memory",
        json={"project": project, "key": "decision", "content": response}
    )
    
    # 4. Periodically compress old conversation turns
    if len(conversation_history) > 10:
        await httpx.post(
            "http://helm:8675/compress",
            json={"messages": conversation_history}
        )
    
    return response

Expected savings: 60-80% token reduction vs. sending everything.

See USAGE.md for complete integration patterns, examples, and best practices.

Template Repository

Want to get started quickly? Use the agent template:

# Clone the template
git clone git.bouncypixel.com:helm/agentic-templates.git my-agent
cd my-agent
cp .env.example .env
docker compose up -d

The template includes a working agent integration and docker-compose setup. See USAGE.md for integration patterns.

How It Works (Architecture)

RAG Engine (Fast)

All skills/snippets are loaded into memory at startup with pre-computed embeddings
Queries embed once, compute cosine similarity against cached embeddings
Returns top-K most relevant items (<5ms for 1000 items)
No external API calls, no database queries per request

Compression (Configurable)

Extractive (default): Uses LSA summarization to pick key sentences - fast, no model
Ollama: Sends to local phi-3-mini for high-quality summaries (~2s)
Keeps recent turns full, replaces old with summary

Memory Store

Simple key-value per project
Stores decisions, configurations, learnings
Retrieved via /memory?project=...

Project scoping: Use a stable identifier for the project parameter (e.g., git remote URL like https://github.com/username/repo). This ensures your project's conventions and memories follow you across machines, even if file paths differ. The template agent auto-detects the git remote.

MCP Server Integration

If you use Claude Desktop, add to your config:

{
  "mcpServers": {
    "skills": {
      "command": "python",
      "args": ["/path/to/ai-skills-api/mcp/skills.py"],
      "env": {
        "SKILLS_API_URL": "http://helm:8675"
      }
    }
  }
}

Available tools:

search_skills, get_skill, list_skills
get_context, get_conventions, get_snippets
get_memory, add_memory, create_skill

Auto-coaching: The MCP server sends instructions to the AI on connection, teaching it how and when to use these tools to learn and improve over time.

Important: The AI will propose creating skills/memories when it identifies valuable patterns, but will not execute without your permission. You'll see suggestions like "I could create a skill for this pattern. Should I?" and you can approve or decline. This gives you full control while still building the knowledge base.

Migration from v1

If you were using the old semantic cache:

Deleted: Semantic cache endpoints and model
Migrate: Any stored skills/snippets remain (tags now JSON)
Upgrade: Pull new image, restart, optionally enable auth

Performance

RAG latency: ~5ms (cached embeddings)
Embedding model load: ~100MB RAM, ~2s cold start
Compression: 100-500ms (extractive) or ~2s (ollama)
Supports 1000+ skills/snippets without degradation

License

MIT

For detailed usage examples and API reference, see USAGE.md and the interactive docs at http://helm:8675/docs when the service is running.